Fingerprinting for evasion: how malware checks if it's in a sandbox

Copyright: MIT
A malware analyst and a scraper-blocking vendor are asking the same question from opposite chairs. The vendor wants to know: is this client a real browser driven by a real human, or an automated bot wearing a browser’s clothes? The malware wants to know: is this machine a real victim’s desktop, or an automated sandbox wearing a victim’s clothes? Both questions get answered by fingerprinting the environment and scoring it. The only thing that flips is who benefits from a “no.”

When an anti-bot system decides a client is automated, it blocks. When malware decides its host is automated, it does the opposite of what you’d expect from a program that wants to run: it goes quiet. It sleeps, it exits clean, it never unpacks the second stage, and the analyst’s report comes back saying the sample did nothing interesting. That dormancy is the whole game. A sandbox that watches a sample do nothing learns nothing, and a sample that learns it’s in a sandbox does nothing on purpose. This post is about the signals it reads to make that call.

The sections below walk the fingerprint surface roughly in the order malware tends to check it. First the CPU itself, where a single instruction and a stopwatch expose the hypervisor. Then the artifacts a virtual machine leaves lying around, the registry keys and device names and MAC prefixes. Then the part that looks most like bot detection: human-presence checks built on mouse geometry, uptime, and screen resolution. Then the named-and-shamed list of sandbox hostnames. Finally, what defenders do about all of it, and why this is the mirror image of the bot-detection arms race we cover elsewhere on this blog.

The CPU tells on the hypervisor

The cleanest signal lives in silicon. Intel and AMD reserve a bit in the CPUID instruction’s output specifically to announce a hypervisor: CPUID leaf 1 returns a feature bitmap in ECX, and bit 31 (CPUID.1:ECX.HV) is set when the processor is running under virtualization. It costs one instruction to read. On a bare-metal install the bit is clear; under VMware, VirtualBox, KVM, or Hyper-V it is usually set, because the hypervisor wants guest operating systems to know they’re guests.
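As a minimal sketch of the bit test itself, the Python below checks bit 31 of a leaf-1 ECX value. Python cannot execute CPUID directly, so the register value is passed in as a plain integer; the function name and the two sample values are illustrative assumptions, not captured from hardware or taken from any sample.

```python
HYPERVISOR_BIT = 31  # CPUID leaf 1, ECX bit 31


def hypervisor_bit_set(ecx: int) -> bool:
    """Return True if bit 31 of a CPUID.1:ECX value is set."""
    return bool((ecx >> HYPERVISOR_BIT) & 1)


# Made-up ECX values for illustration only.
ecx_bare_metal = 0x7FFAFBFF  # bit 31 clear
ecx_guest = 0xFFFAFBFF       # same low bits, bit 31 set

print(hypervisor_bit_set(ecx_bare_metal))  # False
print(hypervisor_bit_set(ecx_guest))       # True
```

On Linux the same bit surfaces as the `hypervisor` flag in /proc/cpuinfo, which gives a sandbox maintainer a quick way to confirm whether spoofing took effect inside the guest.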

That bit is honest by design, which makes it the first thing a serious sandbox spoofs. So malware rarely trusts it alone. The more reliable check measures time, and it leans on a quirk of how virtualization works at all. Certain instructions cannot be allowed to run unsupervised inside a guest, because their results would leak the host’s real state or let the guest escape its box. CPUID is one of them. When a guest executes CPUID, the processor performs a VM exit: it freezes the guest, hands control to the hypervisor, lets the hypervisor emulate the instruction, and resumes the guest. On bare metal that same instruction is a handful of cycles. Under a hypervisor it is a context switch.

You can time the gap. The RDTSC instruction reads the CPU’s timestamp counter, a monotonically increasing cycle count. Sandwich a VM-exiting instruction between two reads and subtract. The canonical sequence is rdtsc; cpuid; rdtsc, and the delta it produces is small on a physical CPU and conspicuously large inside a VM because of the exit-and-return overhead. The same rdtsc; cpuid; rdtsc idiom shows up in game anti-cheat code and in malware, for the identical reason.

Figure: rdtsc; cpuid; rdtsc. The same three instructions in two environments produce very different deltas.

Bare metal: T0, cpuid, T1. T1 - T0 is on the order of tens of cycles.

Inside a VM: T0, VM exit, hypervisor emulates cpuid, VM resume, T1. T1 - T0 is on the order of hundreds to thousands of cycles.

The malware never needs the cpuid result; it only times the trap.

*The timing attack doesn’t care what CPUID returns. The trapping instruction is bait; the stopwatch is the sensor, and the VM exit is what it measures.*

A single measurement is noisy, so real samples amplify it. GuLoader, a downloader that has been a reliable showcase for anti-VM tradecraft, runs the timing loop many thousands of times and sums the deltas. One analyzed variant accumulates RDTSC-bracketed CPUID executions across 0x186A0 iterations, which is 100,000 in decimal, with LFENCE barriers to stop the processor reordering the reads, and a bit test BT ECX, 1Fh to read the hypervisor bit along the way. If the accumulated overhead crosses 0x68E7780, which is 110,000,000 cycles or an average of 1,100 per iteration, the sample decides it’s virtualized and enters an infinite loop instead of running its payload. A different GuLoader build hammers RDTSC and CPUID around 11 million times. On VirtualBox that loop burned several minutes; on bare metal it finished in under a second. The sample never has to decide it’s slow. The slowness is the answer.
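The arithmetic behind those constants is worth doing once. Decoded, 0x186A0 is 100,000 iterations and 0x68E7780 is 110,000,000 cycles, so the threshold works out to an average of 1,100 cycles per iteration. The sketch below models only the accumulate-and-compare step. The per-iteration deltas are synthetic numbers chosen to illustrate the two regimes, and `looks_virtualized` is a hypothetical name, not GuLoader's code.

```python
ITERATIONS = 0x186A0   # loop count quoted above
THRESHOLD = 0x68E7780  # accumulated-cycle threshold quoted above


def looks_virtualized(deltas, threshold=THRESHOLD):
    """Sum per-iteration cycle deltas and compare the total to the threshold."""
    return sum(deltas) > threshold


print(ITERATIONS)              # 100000
print(THRESHOLD)               # 110000000
print(THRESHOLD / ITERATIONS)  # 1100.0

# Synthetic deltas (assumptions, not measurements).
bare_metal = [150] * ITERATIONS  # assumed cost of an untrapped CPUID
in_vm = [2000] * ITERATIONS      # assumed cost of a VM-exit round trip

print(looks_virtualized(bare_metal))  # False
print(looks_virtualized(in_vm))       # True
```

Summing over 100,000 iterations is the point of the design: a single noisy reading can land on either side of 1,100 cycles, but the total of many readings rarely does.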

Blitz, a 2024 family redistributed inside backdoored Standoff 2 game cheats, uses the same shape with different numbers. Its anti-sandbox routine spins a 1,000,000-iteration loop while a second thread counts floating-point instructions (FYL2XP1), with CPUID on the main thread for synchronization, and bails if a final ratio crosses a threshold of 5.0. The instruction choices vary; the principle does not. Find an operation that is cheap on hardware and expensive under emulation, do it enough times that the gap dwarfs the noise, and read the clock.

There is a subtler variant that doesn’t even need a loop. The trap flag (TF) is bit 8 of the EFLAGS register; set it, and the CPU raises a single-step debug exception after the next instruction retires. The trick is to set TF and then execute a VM-exiting instruction like CPUID, RDTSC, or IN. On bare metal the single-step exception lands on the instruction immediately after the trapping one. On some hypervisors, the emulation of the trapping instruction mishandles the pending trap, and the exception lands one instruction too late. Lampion checked exactly this: inside its exception handler it read the saved instruction pointer (EIP) from the CONTEXT record and compared it against the address of a planted NOP (0x90) to see whether the single step had been delivered where physical hardware would deliver it. One bit, one mishandled exception, one verdict. Palo Alto’s Unit 42 wrote this up in 2021, and the elegance is that it sidesteps the whole timing-noise problem.
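The comparison at the heart of the trap-flag check is small enough to model directly. The sketch below is a toy, not Lampion's handler: the addresses are hypothetical, and it relies on the fact that a single-step exception is a trap, so the saved instruction pointer is the address of the next instruction to execute.

```python
TRAP_FLAG = 1 << 8  # TF is bit 8 of EFLAGS, mask 0x100

# Hypothetical code layout: a trapping instruction followed by a planted NOP.
CPUID_ADDR = 0x401000  # CPUID is two bytes (0F A2)
NOP_ADDR = 0x401002    # planted NOP (90); hardware reports the single step here
AFTER_NOP = 0x401003   # where a late single step would be reported


def single_step_on_time(saved_eip: int, nop_addr: int = NOP_ADDR) -> bool:
    """True if the saved EIP in the exception record points at the planted NOP."""
    return saved_eip == nop_addr


print(hex(TRAP_FLAG))                  # 0x100
print(single_step_on_time(NOP_ADDR))   # True
print(single_step_on_time(AFTER_NOP))  # False
```

For a sandbox operator, the same comparison doubles as a regression test: a guest in which the saved pointer lands past the NOP is one this check can identify.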

The artifacts a virtual machine forgets to hide

Timing is the precise tool. The blunt one is just looking around the room. A virtual machine and its guest-additions software scatter recognizable strings across the registry, the filesystem, the device namespace, and the network configuration, and decades of malware have hard-coded lists of them. This is the bulk of what tools like Pafish (Alberto Ortega’s open-source “paranoid fish,” archived in 2025 but still the reference probe) and al-khaser enumerate, and defenders run the same tools to find and scrub those strings.

VirtualBox is the loudest. Its guest additions register devices at \\.\VBoxGuest and \\.\VBoxMiniRdrDN, run services and processes named vboxservice.exe and vboxtray.exe, drop DLLs like vboxdisp.dll and vboxhook.dll, and leave registry trails under SOFTWARE\Oracle\VirtualBox Guest Additions and SYSTEM\ControlSet001\Services\VBoxGuest. VMware is similar: vmtoolsd.exe, vmwaretray.exe, VGAuthService.exe, and manufacturer strings in SYSTEM\ControlSet001\Control\SystemInformation. Sandboxie announces itself through sbiedll.dll loaded into the process. QEMU’s guest agent lives at predictable paths; GuLoader specifically checks for C:\Program Files\Qemu-ga\qemu-ga.exe and C:\Program Files\qga\qga.exe.

What the hypervisor leaves behind

| Platform | Process | MAC prefix (OUI) |
|---|---|---|
| VirtualBox | vboxservice.exe | 08:00:27 |
| VMware | vmtoolsd.exe | 00:05:69, 00:0C:29, 00:1C:14, 00:50:56 |
| Parallels | prl_tools.exe | 00:1C:42 |
| Xen | xenservice.exe | 00:16:3E |
| QEMU | qemu-ga.exe | (e1000 NIC signature) |
| Sandboxie | sbiedll.dll loaded | n/a |

MAC prefixes are vendor-assigned OUIs; a NIC starting 08:00:27 is almost certainly VirtualBox.

*A condensed slice of what Pafish and al-khaser enumerate. None of these strings is hard to find, which is exactly why mature sandboxes rename their devices and randomize their MACs.*

The MAC address is its own giveaway. The first three octets of a MAC are an Organizationally Unique Identifier assigned to a hardware vendor, and virtualization platforms ship with assigned blocks. A NIC whose MAC begins 08:00:27 is VirtualBox; 00:05:69, 00:0C:29, 00:1C:14, and 00:50:56 are VMware; 00:1C:42 is Parallels; 00:16:3E is Xen. Reading the adapter’s MAC and matching the first three bytes against a table takes a few lines of code and no special privilege. Disk artifacts work the same way: volume serial numbers and SCSI identifier strings often carry the emulator’s name, and HARDWARE\DEVICEMAP\Scsi entries spell out “VBOX” or “QEMU” in plain text. Zebrocy, among others, checked volume serials against known emulator signatures.
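The OUI match really is a few lines. The sketch below uses only the prefixes listed above and takes the MAC as a string, so it can be pointed at an inventory of sandbox adapters as easily as at a live interface. The function name is a hypothetical choice, and the table is deliberately not a complete registry of virtualization OUIs.

```python
# Prefixes taken from the paragraph above; not an exhaustive list.
OUI_TABLE = {
    "08:00:27": "VirtualBox",
    "00:05:69": "VMware",
    "00:0C:29": "VMware",
    "00:1C:14": "VMware",
    "00:50:56": "VMware",
    "00:1C:42": "Parallels",
    "00:16:3E": "Xen",
}


def vm_vendor_for_mac(mac: str):
    """Return the virtualization vendor for a MAC's OUI, or None if unlisted."""
    # Normalize separators and case so 08-00-27-... and 08:00:27:... both match.
    digits = mac.replace("-", "").replace(":", "").replace(".", "").upper()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    int(digits, 16)  # raises ValueError if any character is not hexadecimal
    oui = ":".join(digits[i:i + 2] for i in (0, 2, 4))
    return OUI_TABLE.get(oui)


print(vm_vendor_for_mac("08:00:27:ab:cd:ef"))  # VirtualBox
print(vm_vendor_for_mac("00-50-56-C0-00-08"))  # VMware
print(vm_vendor_for_mac("3c:22:fb:11:22:33"))  # None
```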

The firmware layer offers the same kind of plaintext. The SMBIOS tables that describe the machine’s manufacturer, product name, and BIOS version are readable from a guest, and a default VMware or VirtualBox image fills them with the hypervisor’s own brand. SystemManufacturer reads “VMware, Inc.”, the BIOS version string carries “VBOX” or “Xen”, and the embedded ACPI tables name the emulated chipset. A sandbox that scrubs the registry but forgets the firmware strings still answers the question for the malware. This is why mature analysis platforms patch SMBIOS to mimic a Dell or Lenovo consumer machine, serial number included, rather than leaving the factory defaults in place.

There is one more class of artifact that doesn’t live in any string table: the instruction set itself behaving oddly. Some hypervisors trap and emulate instructions that bare-metal CPUs execute silently, and a few legacy tricks execute invalid or reserved instructions and watch how the environment reacts. The VMware “backdoor” interface is the textbook case. VMware exposes a communication channel through the IN instruction against I/O port 0x5658 with the magic value 0x564D5868 (“VMXh”) in EAX; on a real machine that I/O access from user mode faults, but inside VMware it succeeds and hands the magic value back in EBX, which is a one-shot confirmation that the tools-aware hypervisor is listening. The check is old and well-known, which means it’s also well-spoofed, but it still appears in commodity loaders because it’s cheap and it still works against a sloppy setup.
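The two constants are plain ASCII, which is why they are easy to spot in a disassembly. A short check, with `as_ascii` as a hypothetical helper name:

```python
MAGIC = 0x564D5868  # value loaded into EAX
PORT = 0x5658       # I/O port used by the IN instruction


def as_ascii(value: int, width: int) -> str:
    """Render an integer as big-endian ASCII text of the given byte width."""
    return value.to_bytes(width, "big").decode("ascii")


print(as_ascii(MAGIC, 4))  # VMXh
print(as_ascii(PORT, 2))   # VX
```

Searching a binary for either immediate is a cheap static signal that the backdoor probe is present, with no need to run the sample.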

Then there is the hardware-shape check, which doesn’t look for any specific string at all. Real consumer machines in 2026 have a lot of cores, a lot of RAM, a big disk that’s been written to, and a high-resolution display. Throwaway analysis VMs are provisioned thin. So malware reads the machine’s specs and treats anything austere as suspicious. Blitz terminates if the host reports fewer than four CPU cores. TrickBot, back in 2020, checked the screen resolution and quit on 800x600 or 1024x768, because those are the default resolutions a Windows VM falls back to when no guest-additions display driver is installed, and analysts routinely skip installing it to avoid tripping other VM checks. Blitz extended the same idea to 1024x768, 800x600, and 640x480. The reasoning is statistical, not forensic. A machine this small and this clean is probably disposable, and a disposable machine is probably watching.
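A sandbox operator can run the same austerity test against an image before malware does. The sketch below encodes only the two thresholds named above, Blitz's four-core minimum and the three fallback resolutions. `HostSpecs` and `austerity_flags` are hypothetical names, and the specs are passed in rather than read from the machine.

```python
from dataclasses import dataclass

LOW_RESOLUTIONS = {(1024, 768), (800, 600), (640, 480)}
MIN_CORES = 4


@dataclass
class HostSpecs:
    cpu_cores: int
    screen_width: int
    screen_height: int


def austerity_flags(specs: HostSpecs) -> list[str]:
    """List the hardware-shape checks this host would fail."""
    flags = []
    if specs.cpu_cores < MIN_CORES:
        flags.append("few_cores")
    if (specs.screen_width, specs.screen_height) in LOW_RESOLUTIONS:
        flags.append("default_vm_resolution")
    return flags


print(austerity_flags(HostSpecs(2, 1024, 768)))   # ['few_cores', 'default_vm_resolution']
print(austerity_flags(HostSpecs(8, 2560, 1440)))  # []
```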

The part that looks exactly like bot detection

Here the mirror with anti-bot systems gets sharp. A sandbox is automated, which means by default nobody is touching the mouse, nobody is typing, the screen has been at one resolution since boot, and the machine came up sixty seconds ago. Malware reads those same human-presence signals that a fraud-detection vendor reads, only it reads them to confirm a human is absent. If the room is empty, it leaves before doing anything worth recording.

The simplest version is dwell. Windows exposes the time since the last user input through GetLastInputInfo, and comparing it against GetTickCount tells you how long the machine has gone untouched. A long idle stretch on a freshly booted host is a strong sandbox tell, so some samples simply sleep through the typical analysis window, or refuse to act until they observe a click. The DarkHotel APT polled the cursor position repeatedly to check whether it was actually moving; the Tinba banking trojan waited for mouse movement and screen activity before it would proceed at all. A sandbox that doesn’t simulate input never clears the gate.
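The idle-time arithmetic has one wrinkle worth showing. Both the last-input tick reported by GetLastInputInfo and the value returned by GetTickCount are 32-bit millisecond counters that wrap after roughly 49.7 days, so the subtraction has to be done modulo 2^32. The sketch below takes the two tick values as integers instead of calling the Windows API, and the function name is hypothetical.

```python
DWORD_MASK = 0xFFFFFFFF  # both tick values are 32-bit and wrap around


def idle_milliseconds(tick_now: int, last_input_tick: int) -> int:
    """Milliseconds since the last input, tolerant of 32-bit tick wraparound."""
    return (tick_now - last_input_tick) & DWORD_MASK


print(idle_milliseconds(125_000, 5_000))          # 120000
print(idle_milliseconds(0x00000100, 0xFFFFFF00))  # 512
```

The second call shows the wrapped case: the counter rolled over between the last input and the reading, and the masked subtraction still returns the true 512 ms gap.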

Uptime is the same idea read from a different clock. An analysis VM is reverted to a clean snapshot before each run, so when the sample executes, the machine has typically been “alive” for seconds to a couple of minutes. A real victim’s desktop has usually been up for hours or days, with the system clock, event logs, and tick counter all reflecting that history. GetTickCount and GetTickCount64 report milliseconds since boot, and a value under a few minutes is a cheap red flag. The defensive countermeasure is obvious and widely deployed: roll the snapshot’s clock forward, age the uptime counter, and let the event log accumulate plausible history before detonation. Some malware responds by cross-checking. If the reported uptime is long but the Documents folder is empty, the browser cache holds no pages, and the recent-files list is bare, the long uptime is a lie, and the contradiction itself is the signal. This “pocket litter” check, the count of files a real user would have generated by now, is harder to fake convincingly than a single clock value, because it requires the sandbox to manufacture a believable life rather than tweak one number.

The time-based family also includes the patient approach: just outlast the analyst. Sandboxes run on a budget, often two to five minutes per sample, because the queue is long and compute is finite. Malware that sleeps past that window, or buries its real work behind a delay loop long enough to exceed it, gets a clean report by default. Crude Sleep calls get patched out by sandboxes that fast-forward timers, so samples moved to harder-to-skip delays: SetTimer with a message callback, NtDelayExecution, or the brute-force RDTSC/CPUID hammering we already saw, which doubles as both a timing probe and a stall. MITRE catalogs this as the T1497.003 time-based sub-technique, alongside the T1497.001 system checks and T1497.002 user-activity checks that cover everything else in this post.

The most quoted modern example is LummaC2, an information stealer whose version 4.0 added a mouse-geometry check that reads like something out of a graphics textbook. Outpost24’s KrakenLabs team documented it, and the technique was first reported in November 2023. The sample calls GetCursorPos to grab an initial cursor location, then loops with a 300-millisecond sleep until the position changes. Once it sees movement it records five consecutive positions, calling GetCursorPos with a 50-millisecond pause between samples. From those five points (P0 through P4) it builds four vectors, computes the angle between each consecutive pair using the dot product and Euclidean magnitudes, converts radians to degrees, and compares every angle against a hard-coded threshold of 45.0 degrees. Four vectors have three consecutive pairs, so there are three angles. If all three come in under 45 degrees the movement is smooth enough to call human, and the malware proceeds. If any angle exceeds the threshold, it discards the captured positions and starts the whole capture over.

Figure: LummaC2 v4.0: five cursor samples (P0 through P4), four vectors, one threshold. An angle under 45 degrees counts as smooth.

All angles under 45 degrees: proceed. Any angle over 45 degrees: discard and recapture.

*Smooth human travel produces shallow angles between successive motion vectors; a sandbox that teleports the cursor or jitters it randomly produces sharp ones. The geometry is the Turing test.*
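The geometry can be reproduced in a few lines, which is useful for checking whether a sandbox's cursor simulator would pass. The sketch below follows the description above: five points give four vectors and therefore three angles between consecutive pairs. The function names and the two example traces are assumptions for illustration, and treating a repeated point as an error is a choice made here, not a documented detail of the sample.

```python
import math

THRESHOLD_DEGREES = 45.0


def angles_between_moves(points):
    """Angles in degrees between consecutive motion vectors of a cursor trace."""
    vectors = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]
    angles = []
    for (ax, ay), (bx, by) in zip(vectors, vectors[1:]):
        magnitude = math.hypot(ax, ay) * math.hypot(bx, by)
        if magnitude == 0:
            raise ValueError("consecutive samples must differ")
        # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
        cos_theta = max(-1.0, min(1.0, (ax * bx + ay * by) / magnitude))
        angles.append(math.degrees(math.acos(cos_theta)))
    return angles


def looks_human(points, threshold=THRESHOLD_DEGREES):
    """True if every angle between consecutive moves is under the threshold."""
    return all(angle < threshold for angle in angles_between_moves(points))


smooth_trace = [(0, 0), (10, 1), (20, 3), (30, 6), (40, 10)]  # gentle curve
square_trace = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]   # right-angle turns

print(looks_human(smooth_trace))  # True
print(looks_human(square_trace))  # False
```

A simulator that moves the cursor in straight lines passes trivially, since every angle is zero. That is a reminder that this check measures only sharpness, not the curvature and overshoot that anti-bot vendors also examine.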

A scraper builder will recognize the shape of this instantly, because it’s the same logic anti-bot vendors apply to mouse traces. Our write-up on why a real mouse path is hard to fake covers the human side of the same physics: real pointer motion obeys the speed-accuracy tradeoff of Fitts’s law and arrives in smooth, slightly-overshooting curves, while synthetic motion tends to be too linear, too uniform, or too sharp. LummaC2 just inverts the verdict. Where a fraud system flags the smooth curve as the human and the jagged one as the bot, the malware treats the smooth curve as its cue to detonate and the jagged or absent one as its cue to hide. The maps are identical; only the sign on the output flips.

This convergence is the through-line of the post. The literature on synthesizing human-like input events exists because scrapers need to defeat exactly these mouse checks, and the moment a sandbox vendor adds realistic cursor simulation to detonate evasive samples, that vendor is solving the scraper’s problem for it. Tooling crosses the line in both directions. Captcha-gated malware is the same borrowing in the other direction, with threat actors lifting anti-bot challenge pages to keep crawlers and sandboxes away from their payloads.

Known by name: the hostname and username blocklists

Some checks need no cleverness at all, just a list. Automated sandboxes get deployed at scale from images, and those images carry generic, repeated identities. Reverse engineers have harvested the recurring ones, and malware now ships hard-coded blocklists of sandbox hostnames and usernames. Read the machine’s name with GetComputerName, read the account with GetUserName, lowercase, and match. The Unprotect Project’s catalog of these strings includes hostnames and usernames like SANDBOX, MALTEST, TEQUILABOOMBOOM, VIRUS, MALWARE, John Doe, WDAGUtilityAccount (the account Windows Defender Application Guard runs under), 7SILVIA, WIN7-TRAPS, and a couple dozen more. The CAPA rule that detects this behavior catalogs around thirty username patterns, observed across families including Emotet, Trickbot, Gootkit, and Shifu.
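The lowercase-and-match step is simple enough that a sandbox operator can run it against their own images before deployment. The sketch below uses only the names quoted above, not the full Unprotect or CAPA lists. The function name is hypothetical, and the identity strings are passed in rather than read from the system.

```python
# Names quoted in the paragraph above; the published lists are longer.
BLOCKLIST = {name.lower() for name in (
    "SANDBOX", "MALTEST", "TEQUILABOOMBOOM", "VIRUS", "MALWARE",
    "John Doe", "WDAGUtilityAccount", "7SILVIA", "WIN7-TRAPS",
)}


def flagged_identities(hostname: str, username: str) -> list[str]:
    """Return whichever of hostname and username appear on the blocklist."""
    return [value for value in (hostname, username)
            if value.strip().lower() in BLOCKLIST]


print(flagged_identities("TequilaBoomBoom", "alice"))   # ['TequilaBoomBoom']
print(flagged_identities("DESKTOP-4F7K2Q1", "john doe"))  # ['john doe']
print(flagged_identities("DESKTOP-4F7K2Q1", "alice"))   # []
```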

The blocklist approach is brittle by definition, which is its own tell. The names only work until analysts stop using them, so the list is a snapshot of yesterday’s sandbox hygiene rather than a durable signal. But it’s free, and on a poorly maintained sandbox it works on the first instruction. TEQUILABOOMBOOM in particular has a long history as an Anubis/Sandbox default, and the fact that it still appears in malware checks years later tells you how slowly defaults get changed across the long tail of analysis environments.

Driver and product names land in the same bucket. Blitz checks for ANY.RUN’s device driver, the string \\?\A3E64E55_fl, alongside its other registry and timing checks. ANY.RUN is a popular interactive cloud sandbox, and its driver name is a stable enough artifact to hard-code. The lesson cuts both ways: any analysis tool that leaves a unique, stable name in the device namespace becomes a fingerprint, and the more popular the tool, the more worthwhile it is for malware to special-case it.

The verdict: dormant on purpose

What ties the surface together is the scoring step, and here malware in 2025 starts to look like the anti-bot systems it mirrors. Older samples were binary: any single VM artifact meant abort. Newer ones collect many weak signals, such as core count, uptime, recent file activity, and mouse behavior, and combine them into a suspicion score, sometimes feeding the lot into a small model rather than a fixed ladder of if statements. A high score means analyzed, and analyzed means do nothing. The exact weighting is rarely public; it’s recovered by reverse engineers reading the binary, and it differs sample to sample.

Figure: Collect, score, then decide whether to exist.

Signals: cpuid / rdtsc delta, VM artifacts, mouse geometry, uptime / idle, hostname match. All of them feed a suspicion score.

Looks analyzed: go dormant (sleep, exit clean, never unpack). Looks real: detonate (unpack stage 2, contact C2).

The default action under uncertainty is silence; a quiet sample wastes the analyst’s clock.

*The economics favor the malware. A sandbox that observes nothing produces a clean report; the cost of a false “I’m being watched” is a few minutes of lost infections, paid back many times over in samples that slip through analysis.*
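The shift from a single tripwire to a weighted score can be modeled in a few lines, which helps when reasoning about which spoofed signals matter most. Everything numeric below is an assumption made for illustration: the signal names, the weights, and the 0.5 cut-off are invented, since real weightings are rarely public and differ from sample to sample.

```python
# Hypothetical weights; not recovered from any real sample.
WEIGHTS = {
    "timing_delta_high": 0.35,
    "vm_artifact_found": 0.25,
    "no_mouse_movement": 0.15,
    "short_uptime": 0.10,
    "hostname_blocklisted": 0.15,
}
DORMANT_THRESHOLD = 0.5  # assumed cut-off


def suspicion_score(signals: dict[str, bool]) -> float:
    """Sum the weights of every signal that fired; missing signals count as False."""
    return sum(weight for name, weight in WEIGHTS.items() if signals.get(name, False))


def verdict(signals: dict[str, bool]) -> str:
    """Model the decision: stay dormant when the score reaches the threshold."""
    return "dormant" if suspicion_score(signals) >= DORMANT_THRESHOLD else "detonate"


print(verdict({"timing_delta_high": True, "vm_artifact_found": True}))  # dormant
print(verdict({"short_uptime": True}))                                  # detonate
```

Under this model no single weak signal triggers dormancy, so a sandbox does not need to be perfect. It needs to keep the combined score under the threshold, which usually means fixing the heaviest signals first.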

Defenders fight this on two fronts. The first is making the sandbox honest, or at least convincingly dishonest. Bare-metal and hardware-assisted analysis platforms remove the obvious timing tell by not being a conventional VM at all. Vendor sandboxes spoof the CPUID hypervisor bit, rename their devices, randomize MAC addresses, age their disk images, scatter believable “pocket litter” of documents and browser history, and simulate cursor movement and clicks so the LummaC2-style geometry check sees a plausible human. Palo Alto’s Unit 42 built a custom hypervisor specifically so that its instrumentation is invisible from inside the guest, because any visible hook is a signal. The second front is static: pull the sample apart without running it, so the evasion logic never executes, which is why YARA rule libraries and tools like Pafish and al-khaser exist to catalog the artifact strings and flag the checks in the binary directly.

It’s a closed loop, and it runs on the same fuel as the bot-detection arms race. Each environmental signal a sandbox spoofs is a signal malware stops trusting, which pushes malware toward signals that are expensive to fake, which pushes sandbox vendors to fake them anyway, which is the exact dynamic we describe in our anti-bot coverage. The mouse-geometry check is the clearest evidence. When LummaC2 borrowed the Euclidean-vector trick to prove a human is present, it borrowed it from the same well that fraud-detection vendors and stealth-scraper authors have been drinking from for a decade. The technical question, is this environment driven by a person or a script, has exactly one answer surface, and both sides read it. They just disagree about which answer is the good news.

