JavaScript VM obfuscation: hiding logic in a bytecode interpreter
Pretty-print a heavily obfuscated anti-bot script and you usually win. The variable names are garbage, the strings are encrypted, the control flow is flattened into a switch statement, but the operations are still JavaScript operations. A + is a +. A property read is a property read. With enough patience and a deobfuscator you can fold the constants, undo the flattening, and read what the code does. Then you hit a script where that fails completely. You pretty-print it and what comes back is a small loop reading bytes out of a giant array, and a table of forty or sixty tiny functions, none of which does anything you recognise. The logic you wanted is not in the code. It is in the data the loop is chewing through.
That is virtual machine obfuscation, and it is the reason a Kasada script, or a reverse-engineer’s worst afternoon, looks the way it does. The technique borrows directly from software protection on native binaries: take the code you want to hide, compile it down to a custom instruction set that exists only inside this one program, and ship an interpreter for that instruction set alongside the bytecode. The interpreter is readable. The bytecode is not, because it means nothing without first reconstructing the instruction set it was written against. This post is about how that works, why it is the strongest of the common JavaScript obfuscation techniques, and what it takes to undo it.
The route through is roughly the order an analyst meets the problem. First, where VM obfuscation sits relative to the cheaper techniques it builds on. Then the anatomy of an interpreter: the dispatcher, the virtual program counter, the handler table, the fetch-decode-execute loop. Then how a source-to-bytecode compiler turns ordinary code into something that loop can run, and the choice between stack and register machines. Then the things that make a real-world VM nastier than the textbook one, including polymorphism and per-build opcode shuffling. Then devirtualization at a conceptual level: how you reverse the interpreter, recover the opcode semantics, and lift the bytecode back to something legible. A closing section on where the technique is genuinely strong and where the cost falls.
Where VM obfuscation sits in the stack
Most JavaScript obfuscation is a sequence of source-to-source transforms. The popular open-source tool, javascript-obfuscator, is a good map of the cheaper layers because it documents each one. String literals get pulled into a single array, often RC4- or base64-encoded, and replaced with index lookups through a wrapper function, so "submit" becomes something like _0x3a1b(0x4f). Names get mangled to hex soup. Dead code gets injected to pad the reading surface. Numbers get rewritten as arithmetic. None of this changes what the code does; it changes how much work it is to read.
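To make the string-array layer concrete, here is a minimal sketch of what such a transform might leave behind. The names `_0xtable` and `_0xget`, the 0x4f index offset, and the plain base64 encoding are all illustrative assumptions, not the exact output of any tool; real output typically adds array rotation and often RC4.

```javascript
// Hypothetical result of a string-array transform (names and offset are made up).
const _0xtable = ['c3VibWl0', 'Zm9ybQ=='];   // base64 of "submit" and "form"

// Wrapper: obfuscated index in, decoded string out.
function _0xget(index) {
  // The 0x4f offset is an assumption; it only exists to make indices look arbitrary.
  return atob(_0xtable[index - 0x4f]);
}

// What used to be "submit" and "form" in the source now reads like this:
const eventName = _0xget(0x4f);
const tagName = _0xget(0x50);
```

Undoing this layer is a matter of evaluating the wrapper for every call site and inlining the result, which is why it counts as one of the cheap ones.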
Control flow flattening is the strongest of those source-level transforms, and it is the conceptual stepping stone to virtualization, so it is worth being precise about. A normal function runs its statements in order. Flattening destroys that order. It chops the body into basic blocks, drops them all into the cases of one big switch, and wraps the switch in a loop driven by a state variable. Each block ends by setting the state variable to the number of the block that should run next, then loops back to the dispatcher. The obfuscator.io documentation shows the shape clearly: three sequential calls become a while(true) around a switch indexed by a state array.
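A hand-written sketch of that shape, assuming a three-statement function. The function names and the '2|0|1' state string are illustrative; a real obfuscator generates its own names and block numbering.

```javascript
// Original: three sequential steps.
function original(log) {
  log.push('a');
  log.push('b');
  log.push('c');
  return log;
}

// Flattened sketch: same behaviour, but the order lives in a state array.
function flattened(log) {
  const order = '2|0|1'.split('|');   // block ids, listed in execution order
  let step = 0;
  while (true) {
    switch (order[step++]) {
      case '0': log.push('b'); continue;   // second statement
      case '1': log.push('c'); continue;   // third statement
      case '2': log.push('a'); continue;   // first statement
    }
    break;   // no case matched: the state array is exhausted
  }
  return log;
}
```

Reading the cases top to bottom gives b, c, a. Only the state array gives the real order, and recovering that array is exactly what a deobfuscator automates.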
Flattening hurts a human reader and confuses naive decompilers, but it has a ceiling. The blocks are still real JavaScript. Recover the state transitions, sort the cases back into order, and the original program reappears. Modern deobfuscators automate exactly that, which is why flattening alone stopped being a serious obstacle some years ago. Virtualization is what you reach for when source-level transforms are not enough. It does not rearrange JavaScript. It stops being JavaScript.
The anatomy of an interpreter
A virtualized program has two parts that travel together: the bytecode, which is the hidden logic encoded as a flat array of numbers, and the interpreter, the readable JavaScript that walks that array and acts on it. The interpreter is built around a loop. It reads a number out of the bytecode, treats that number as an opcode, looks up the function that implements that opcode, and calls it. Then it does it again. This is the fetch-decode-execute cycle, lifted straight from how a physical CPU runs machine code, which is also why the literature on native code virtualization reads almost unchanged when applied to JavaScript.
Four pieces show up in essentially every implementation, and they have stable names across both the native and the JavaScript work. There is a virtual program counter, a pointer into the bytecode that says which instruction comes next. There is the dispatcher, the loop that does the fetching and routing. There is a handler table, an array of small functions where index N implements opcode N. And there is the VM state, the virtual registers or virtual stack the handlers read from and write to, kept separate from the real JavaScript stack. Tim Blazytko’s write-up on building disassemblers for these systems names the same four under slightly different labels, a virtual instruction pointer and a virtual stack pointer threading through a dispatcher and its handlers, and the academic work going back to Rolf Rolles’ 2009 USENIX WOOT paper “Unpacking Virtualization Obfuscators” describes the identical fetch-decode-execute structure with a virtual program counter and a handler table.
*The interpreter is small and readable. The hidden program is the byte stream it reads; opcode 0x65 here routes to the ADD handler, but that mapping is arbitrary and changes per build.*
The dispatcher itself is almost trivial. In Johannes Willbold’s walkthrough of a JavaScript virtualizer, the whole loop is two lines: fetch a byte, index the ops table with it, call the handler with the VM as argument. His example pins one concrete mapping, where the byte value 101 means ADD, and the ADD handler reads three more bytes for a destination and two source registers, then writes the sum back into the destination register. That last detail is the important one. Opcodes carry operands, and the operands are more bytes pulled from the same stream by the handler, not by the dispatcher. The dispatcher does not know or care how many bytes any given instruction consumes. Each handler advances the program counter by exactly as much as it needs. This is why you cannot disassemble the bytecode by chopping it into fixed-width rows. You have to know every handler’s appetite before the byte stream parses at all, and that is the whole point.
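A minimal sketch of that structure, not Willbold's code. Only the 101-means-ADD mapping and the three-operand ADD come from his example; the LOADI and HALT opcodes, their numbers, and the shape of the `vm` object are assumptions added so the loop has something to run.

```javascript
// Opcode numbers are arbitrary. 101 for ADD follows the example in the text;
// LOADI (7) and HALT (0) are hypothetical additions.
const OP_LOADI = 7, OP_ADD = 101, OP_HALT = 0;

function createVm(bytecode) {
  return { code: bytecode, pc: 0, regs: new Array(8).fill(0), running: true };
}

// Handler table: index N implements opcode N. Each handler pulls its own operands.
const ops = [];
ops[OP_LOADI] = (vm) => {            // LOADI dst, imm  (2 operand bytes)
  const dst = vm.code[vm.pc++];
  vm.regs[dst] = vm.code[vm.pc++];
};
ops[OP_ADD] = (vm) => {              // ADD dst, a, b   (3 operand bytes)
  const dst = vm.code[vm.pc++], a = vm.code[vm.pc++], b = vm.code[vm.pc++];
  vm.regs[dst] = vm.regs[a] + vm.regs[b];
};
ops[OP_HALT] = (vm) => { vm.running = false; };

// The dispatcher: fetch, index the table, call. It never looks at operands.
function run(vm) {
  while (vm.running) {
    const opcode = vm.code[vm.pc++];
    ops[opcode](vm);
  }
  return vm;
}

// r0 = 40; r1 = 2; r2 = r0 + r1; halt
const program = [7, 0, 40, 7, 1, 2, 101, 2, 0, 1, 0];
const result = run(createVm(program)).regs[2];
```

The flat `program` array shows the parsing problem directly: the 2 at index 5 is an immediate, the 2 at index 7 is a register number, and nothing in the bytes themselves says which is which.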
The number-to-handler mapping is the secret. There is nothing universal about 101 meaning ADD. In a different build it might be 12, or 200, and ADD might not be a single opcode at all. The handler table is generated fresh, the opcode numbers assigned arbitrarily, and the bytecode emitted to match. Two scripts protecting identical source can share zero opcode values. An analyst’s first job is always to rebuild that table for this specific script, by hand or with tooling, because nothing carries over from the last one.
From source to bytecode
The other half of a virtualizer is the compiler that runs at protection time, before anything ships. It takes the function you want to hide, parses it to an abstract syntax tree the same way any JavaScript engine would, and then, instead of executing the tree, walks it and emits opcodes. An addition node becomes an ADD opcode plus operands. A variable read becomes a LOAD. An if becomes a comparison followed by a conditional jump that sets the virtual program counter to one of two offsets. A function call becomes a CALL opcode that bridges back out to the real JavaScript runtime, because the VM still has to touch the actual DOM, the actual navigator object, the actual fetch. The output is a flat array of numbers plus the handler table that interprets them, and the original AST is discarded.
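A toy version of that compiler pass, under heavy assumptions: the AST is hand-built rather than parsed, the node types and opcode numbers are invented for the example, and the target is a simple stack-style encoding. The part worth noticing is how the conditional becomes two jumps whose targets are patched in once the branch lengths are known.

```javascript
// Hypothetical opcode numbering for a toy stack-style target.
const OP = { PUSH: 1, LOAD: 2, ADD: 3, JMPF: 4, JMP: 5, LT: 6 };

// Walk the tree and emit numbers instead of executing anything.
function compile(node, out = [], names = []) {
  switch (node.type) {
    case 'num':
      out.push(OP.PUSH, node.value);
      break;
    case 'var': {
      let slot = names.indexOf(node.name);
      if (slot < 0) slot = names.push(node.name) - 1;   // allocate a variable slot
      out.push(OP.LOAD, slot);
      break;
    }
    case 'add':
    case 'lt':
      compile(node.left, out, names);
      compile(node.right, out, names);
      out.push(node.type === 'add' ? OP.ADD : OP.LT);
      break;
    case 'if': {
      compile(node.test, out, names);
      out.push(OP.JMPF, 0);                 // jump-if-false, target patched below
      const patchElse = out.length - 1;
      compile(node.then, out, names);
      out.push(OP.JMP, 0);                  // skip over the else branch
      const patchEnd = out.length - 1;
      out[patchElse] = out.length;          // else branch starts here
      compile(node.else, out, names);
      out[patchEnd] = out.length;           // both branches rejoin here
      break;
    }
    default:
      throw new Error('unknown node type: ' + node.type);
  }
  return { code: out, names };
}

// Hand-built AST for:  x < 10 ? x + 1 : 0
const ast = {
  type: 'if',
  test: { type: 'lt', left: { type: 'var', name: 'x' }, right: { type: 'num', value: 10 } },
  then: { type: 'add', left: { type: 'var', name: 'x' }, right: { type: 'num', value: 1 } },
  else: { type: 'num', value: 0 },
};
const compiled = compile(ast);
// compiled.code is [2,0,1,10,6,4,14,2,0,1,1,3,5,16,1,0]
```

Sixteen numbers, no variable names, no operators, and two jump targets (14 and 16) that only make sense once you know which numbers are opcodes and which are operands.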
A design decision sits underneath this: stack machine or register machine. A stack machine keeps its working values on a virtual stack, and most opcodes are implicit about where their operands live: PUSH a constant, PUSH a variable, then ADD pops two and pushes the sum. The bytecode is compact and the handlers are simple, which is why a lot of bytecode VMs, including the JVM, CPython, and some JavaScript engines, lean this way. A register machine instead names its operands explicitly, like the Willbold example where ADD spells out a destination and two sources. Register bytecode is bulkier per instruction but uses fewer instructions overall, and the explicit operands can make a naive trace harder to follow because the data flow hides in register indices rather than in obvious stack order. The Nike.com protection that umasi devirtualized in early 2023 used a register-style design with roughly sixty atomic handlers and an instruction pointer kept at a fixed slot in the VM state object, which is a representative shape for a serious anti-bot VM.
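For contrast with the register-style ADD, here is a minimal stack-machine sketch. The opcode numbers are hypothetical and the switch-based dispatch is just the shortest way to write it; the point is that ADD carries no operands at all.

```javascript
const PUSH = 1, ADD = 2, HALT = 3;   // hypothetical opcode numbers

function runStack(code) {
  const stack = [];
  let pc = 0;
  for (;;) {
    switch (code[pc++]) {
      case PUSH:                      // the only opcode here with an operand
        stack.push(code[pc++]);
        break;
      case ADD: {                     // operands are implicit: the top two stack slots
        const b = stack.pop(), a = stack.pop();
        stack.push(a + b);
        break;
      }
      case HALT:
        return stack.pop();
      default:
        throw new Error('bad opcode at offset ' + (pc - 1));
    }
  }
}

const stackProgram = [PUSH, 40, PUSH, 2, ADD, HALT];
const stackResult = runStack(stackProgram);
```

In a trace of this machine the data flow is visible in push and pop order. In the register form the same information is spread across register indices, which is the property the paragraph above describes.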
Whatever the architecture, the bytecode has to arrive as data, and that is its own layer of hiding. It rarely ships as a readable array of integers. It shows up as a long string in a custom encoding that the script decodes at load time into the numeric array the dispatcher will read. The Nike VM string ran to around 386,000 characters in a custom base-50 encoding, with a slice of it XOR’d to hold the string literals the program needed. So before the dispatcher runs a single instruction, the script has already turned an innocuous-looking blob into a number array, and the analyst has to find and reverse that decode step just to see the bytes the VM actually executes.
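The decode step might look something like the sketch below. Everything specific here is an assumption made for illustration: the 50-symbol alphabet, the two-symbols-per-number framing, and the single-byte XOR key are not taken from the Nike script, whose actual encoding is documented in umasi's write-up.

```javascript
// Hypothetical 50-symbol alphabet; a real script defines its own.
const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwx';

// Assumed framing: each number is two symbols, most significant first (range 0..2499).
function decodeBlob(blob) {
  const out = [];
  for (let i = 0; i < blob.length; i += 2) {
    const hi = ALPHABET.indexOf(blob[i]);
    const lo = ALPHABET.indexOf(blob[i + 1]);
    if (hi < 0 || lo < 0) throw new Error('bad symbol at offset ' + i);
    out.push(hi * 50 + lo);
  }
  return out;
}

// Assumed string protection: a slice of the decoded numbers XOR'd with one key.
function xorToString(numbers, key) {
  return String.fromCharCode(...numbers.map((n) => n ^ key));
}

const bytecode = decodeBlob('CBAH');                     // two numbers: 101 and 7
const literal = xorToString(decodeBlob('BQBR'), 0x2a);   // "hi"
```

Whatever the real scheme is, the analyst's task has the same shape: find the function that turns the string into the array the dispatcher reads, and run or reimplement it.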
If you want to see the technique in its honest, non-adversarial form, the documentation for the commercial obfuscator.io VM mode states the goal plainly: it transforms functions into custom bytecode running on an embedded virtual machine so that, in their words, the original logic is completely hidden and there is no JavaScript left to reverse-engineer. That is the selling point and also the accurate description. The readable artifact is the interpreter, and the interpreter tells you how to run the bytecode without telling you what the bytecode does.
What makes a real-world VM worse than the textbook
The clean four-part model is where you start. A production anti-bot VM adds friction on top of it, and the friction is where the time goes.
The first multiplier is that virtualization composes with everything below it. The bytecode is opaque, but the interpreter is still JavaScript, so the interpreter itself gets the full cheaper-obfuscation treatment: its strings encrypted, its own control flow flattened, its handlers split and shuffled, dead branches injected between the real ones. The obfuscator.io documentation is explicit that control flow flattening and VM mode stack, with the flattened state machine itself further compiled into bytecode when the VM is enabled. So you can face a flattened dispatcher driving a handler table whose individual handlers are themselves flattened, sitting inside a script where the bytecode is encrypted and the string constants are decrypted microseconds before use. Each layer is individually beatable. Stacked, they multiply the work rather than adding to it.
The second multiplier is polymorphism, and this is the one that breaks the economics for an attacker. A serious virtualizer does not emit the same VM twice. Every build, and in the anti-bot case sometimes every page load or every tenant, reshuffles the opcode-to-handler assignment, renames everything, reorders the handler table, and re-encodes the bytecode. Kasada’s client, the script practitioners know as ips.js, is the textbook case: independent analyses describe a custom JavaScript VM with polymorphic obfuscation that mutates the code structure, telemetry collection compiled into bytecode, strings decrypted in memory just before use, and dead code injected throughout. The practical effect is that an analyst who fully devirtualizes today’s build has reverse-engineered today’s build and not the technique. The opcode table they recovered is worthless against tomorrow’s, because tomorrow 0x65 is no longer ADD. Reports on Kasada specifically note that emulation-based approaches break within days as the bytecode rotates. That gap, between understanding one instance and defeating the system, is the entire design intent, and it is the same reason native VM protections like VMProtect and Themida shuffle their handlers between builds.
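A sketch of per-build opcode shuffling, assuming a tiny three-instruction set. The `build` function, the seeded generator, and the instruction names are all hypothetical; the idea shown is only that the same source program can be emitted against a freshly permuted opcode table each time.

```javascript
// Canonical instruction set, known only to the protection-time tooling.
const HANDLERS = {
  LOADI: (vm) => { const d = vm.code[vm.pc++]; vm.regs[d] = vm.code[vm.pc++]; },
  ADD: (vm) => {
    const d = vm.code[vm.pc++], a = vm.code[vm.pc++], b = vm.code[vm.pc++];
    vm.regs[d] = vm.regs[a] + vm.regs[b];
  },
  HALT: (vm) => { vm.halted = true; },
};

// Small deterministic generator so each build seed gives a repeatable shuffle.
function lcg(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 2 ** 32;
  };
}

function build(seed, program) {
  const rand = lcg(seed);
  const numbers = Array.from({ length: 256 }, (_, i) => i);
  for (let i = numbers.length - 1; i > 0; i--) {           // Fisher-Yates shuffle
    const j = Math.floor(rand() * (i + 1));
    [numbers[i], numbers[j]] = [numbers[j], numbers[i]];
  }
  const opcodeOf = {}, table = [];
  Object.keys(HANDLERS).forEach((name, i) => {
    opcodeOf[name] = numbers[i];             // this build's number for the instruction
    table[numbers[i]] = HANDLERS[name];      // handler table laid out to match
  });
  const code = program.flatMap(([name, ...args]) => [opcodeOf[name], ...args]);
  return { code, table, opcodeOf };
}

function run({ code, table }) {
  const vm = { code, pc: 0, regs: [0, 0, 0, 0], halted: false };
  while (!vm.halted) table[vm.code[vm.pc++]](vm);
  return vm.regs;
}

const source = [['LOADI', 0, 40], ['LOADI', 1, 2], ['ADD', 2, 0, 1], ['HALT']];
const buildA = build(1, source);
const buildB = build(2, source);
```

Both builds compute the same value, but `buildA.opcodeOf` and `buildB.opcodeOf` come from different permutations, so a disassembler keyed to one build will generally misparse the other. A real virtualizer would also rename, reorder, and re-encode, which this sketch leaves out.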
There is a defensive payoff worth naming, because it is why these systems exist on the anti-bot side at all. The whole anti-bot client lives in hostile territory by definition; it runs on the attacker’s machine, in the attacker’s browser, under the attacker’s debugger. Virtualization is the most effective way to keep the fingerprinting and proof-of-work logic from being read off the wire and trivially reimplemented in a fast HTTP client. The same instinct shows up in the broader move to push detection logic out of readable JavaScript entirely, whether into a WebAssembly module or into the kind of runtime-fingerprinting checks that are hard to fake from outside a real browser. VM obfuscation is the JavaScript-native answer to the same pressure. For the full anti-bot context, the Kasada KPSDK breakdown covers how the VM feeds the actual tokens; this post stays on the obfuscation technique itself.
Devirtualization, conceptually
Reversing a VM is mechanical once you accept that it is two problems, not one. The interpreter is a static-analysis problem. The bytecode is a disassembly problem that depends on the answer to the first. You cannot read the program off the byte stream until you know what each opcode does, and you only learn that from the interpreter. So the work runs in a fixed order, and it is the same order whether the target is FinSpy’s native VM, a Tigress challenge binary, or a JavaScript anti-bot script.
The starting move is to find the dispatcher, because everything else hangs off it. Structurally it is the obvious hot spot: a tight loop that reads from one pointer, indexes a big table, and fans out to many small functions. In control-flow terms the dispatcher block has an unusually high number of successors, since it can jump to any handler. A 2026 paper from Chungnam National University on statically detecting Tigress VM structures with an LLVM pass keys on exactly that signal, identifying the dispatch start as the basic block with the highest successor count and the handler blocks as its successors. The same heuristic works whether you are staring at disassembly or at pretty-printed JavaScript: the loop that touches every handler is the loop you want.
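The successor-count signal is simple enough to sketch on a toy graph. The control-flow graph below is invented, and the extra filter that keeps only successors looping back to the dispatcher is an assumption of this example rather than something taken from the paper. For JavaScript you would first have to build such a graph from the AST, which is not shown.

```javascript
// Hypothetical control-flow graph: block id -> ids of the blocks it can jump to.
const cfg = {
  entry: ['decode'],
  decode: ['dispatch'],
  dispatch: ['h_add', 'h_load', 'h_jmp', 'h_call', 'exit'],
  h_add: ['dispatch'],
  h_load: ['dispatch'],
  h_jmp: ['dispatch'],
  h_call: ['dispatch'],
  exit: [],
};

function findDispatcher(graph) {
  // The dispatcher candidate is the block with the most successors.
  let best = null;
  for (const [block, successors] of Object.entries(graph)) {
    if (best === null || successors.length > graph[best].length) best = block;
  }
  // Assumed refinement: handlers are the successors that return to the dispatcher.
  const handlers = graph[best].filter((s) => graph[s].includes(best));
  return { dispatcher: best, handlers };
}

const found = findDispatcher(cfg);
```

On this graph the heuristic picks `dispatch` and the four `h_` blocks. On real code, large ordinary switch statements produce the same signature, so the result is a candidate to inspect rather than a verdict.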
*The pipeline is the same for a native VM or a JavaScript one. Polymorphism is what makes step four worthless on the next build, because steps one through three have to be redone.*
With the dispatcher located, you recover the opcode semantics one handler at a time. The manual way is to read each handler and write down what it does to the VM state, which for forty handlers is tedious but finite. The scalable way is symbolic execution, and this is where the field has matured. Blazytko’s approach builds a symbolic executor that follows the VM from entry to exit, and at each handler it captures the data-flow relationship symbolically rather than by hand, so the analyst is incrementally teaching the tool about the bytecode regions and input constraints instead of transcribing forty functions. A 2024 Thalium write-up pushes the same idea further by tainting the obfuscated function’s inputs, splitting execution traces at input-dependent branches to rebuild the control-flow graph, lifting the result into LLVM IR, and then letting the standard compiler optimization passes delete the obfuscation’s dead arithmetic. They reported partially devirtualizing Tigress challenge functions in under a second per function, with honest limits: a single execution path, pure functions only, and trouble with some loops. That is the current state of the art and also a fair statement of where it still hurts.
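Symbolic execution does not fit in a short example, so the sketch below is a much cruder stand-in: concrete probing. It runs one handler against a VM state filled with known values and records how many operand bytes it consumed and which registers changed. The `probe` helper and the VM shape are assumptions, and the approach only sees one execution path, so it is a way to semi-automate the manual notes rather than a substitute for the tools described above.

```javascript
// Run a single handler on a known state and report what it touched.
function probe(handler) {
  const vm = {
    code: [1, 2, 3, 4, 5, 6],                     // distinct operand bytes
    pc: 0,
    regs: [10, 20, 30, 40, 50, 60, 70, 80],       // distinct register values
  };
  const before = vm.regs.slice();
  handler(vm);
  const written = vm.regs.flatMap((v, i) => (v !== before[i] ? [i] : []));
  return {
    operandBytes: vm.pc,                          // how far the handler advanced pc
    written,                                      // register indices that changed
    values: written.map((i) => vm.regs[i]),       // what they changed to
  };
}

// A handler as it might appear in a target, with meaningless names.
const _0x1f = (vm) => {
  const d = vm.code[vm.pc++], a = vm.code[vm.pc++], b = vm.code[vm.pc++];
  vm.regs[d] = vm.regs[a] + vm.regs[b];
};

const summary = probe(_0x1f);
```

Here the handler consumed three bytes and wrote 70 into register 1, which matches regs[2] + regs[3] and so suggests a three-operand ADD. The operand count is the part the disassembler needs first.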
Once you can name the opcodes, step three is to write the disassembler: a parser that walks the byte stream, and for each opcode consumes the right number of operand bytes and prints a readable mnemonic. This is the step that fails if you got an operand width wrong, because one bad handler desynchronizes everything after it. Step four is the lift and the cleanup, turning the linear disassembly into something with real control flow and folding away the redundancy the virtualizer added. Rolles’ FinSpy devirtualization is the clean illustration of why the cleanup matters. He found a whole class of the VM’s instructions existed only because, in his read, the VM’s developers had been lazy, and he wrote successive pattern-matching passes that deleted over a hundred such instructions while preserving the semantics. The thing you recover at the end is not the original source. It is a semantically equivalent program, often uglier, that does the same work.
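Step three can be sketched in a few lines once the table exists. The `SPEC` table below is a hypothetical recovery for one build (LOADI, ADD, HALT with made-up opcode numbers apart from 101 for ADD); the disassembler itself is just a loop that trusts that table for operand counts.

```javascript
// Recovered for one hypothetical build: opcode -> mnemonic and operand kinds.
const SPEC = {
  7: { name: 'LOADI', operands: ['reg', 'imm'] },
  101: { name: 'ADD', operands: ['reg', 'reg', 'reg'] },
  0: { name: 'HALT', operands: [] },
};

function disassemble(code) {
  const lines = [];
  let pc = 0;
  while (pc < code.length) {
    const at = pc;
    const spec = SPEC[code[pc++]];
    if (!spec) {
      // Unknown opcode: either a missing handler or an earlier width mistake.
      lines.push(`${at}: ??? ${code[at]}`);
      break;
    }
    const args = spec.operands.map((kind) => (kind === 'reg' ? 'r' : '') + code[pc++]);
    lines.push(`${at}: ${spec.name} ${args.join(', ')}`.trim());
  }
  return lines;
}

const listing = disassemble([7, 0, 40, 7, 1, 2, 101, 2, 0, 1, 0]);
// listing: ["0: LOADI r0, 40", "3: LOADI r1, 2", "6: ADD r2, r0, r1", "10: HALT"]
```

Change ADD's operand list to two entries and the listing stays plausible up to offset 6, then reads the leftover 1 as an opcode and stops. That is the desynchronization failure in miniature.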
There is a lower-effort path that skips most of this, and it is worth knowing because it shapes how defenders think. You do not always need to understand the VM. Sometimes you only need its output. If the goal is the value the script computes rather than the logic that computes it, you can run the real interpreter in an instrumented environment and read the result off the VM state, treating the whole thing as a black box. That is faster, and it is exactly why anti-bot vendors do not stop at obfuscation. They add anti-instrumentation: checks for a debugger, for hooked functions, for the timing skew that running under a tracer introduces. The obfuscation protects the logic from being read; the anti-instrumentation protects the execution from being watched. The two are separate problems, and a VM that only solves the first leaks through the second. The workflow for VM-based protections specifically is covered in more depth in the deobfuscation field guide.
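A sketch of the black-box route, using a JavaScript Proxy around the handler table. The small VM is a stand-in written for this example so the code runs on its own; against a real target you would wrap the script's own table instead, and its anti-instrumentation checks may well notice.

```javascript
// Stand-in VM so the sketch is runnable; a real target supplies its own.
const table = [];
table[7] = (vm) => { const d = vm.code[vm.pc++]; vm.regs[d] = vm.code[vm.pc++]; };
table[101] = (vm) => {
  const d = vm.code[vm.pc++], a = vm.code[vm.pc++], b = vm.code[vm.pc++];
  vm.regs[d] = vm.regs[a] + vm.regs[b];
};
table[0] = (vm) => { vm.halted = true; };

function runVm(code, handlers) {
  const vm = { code, pc: 0, regs: [0, 0, 0, 0], halted: false };
  while (!vm.halted) handlers[vm.code[vm.pc++]](vm);
  return vm;
}

// Wrap the handler table so every dispatch is logged before the handler runs.
function traceHandlers(handlers, log) {
  return new Proxy(handlers, {
    get(target, key) {
      const handler = target[key];
      if (typeof handler !== 'function') return handler;
      return (vm) => {
        log.push({ opcode: Number(key), pc: vm.pc - 1 });   // pc - 1 is the opcode's offset
        return handler(vm);
      };
    },
  });
}

const trace = [];
const finalState = runVm([7, 0, 40, 7, 1, 2, 101, 2, 0, 1, 0], traceHandlers(table, trace));
const answer = finalState.regs[2];   // read the result straight off the VM state
```

Nothing here required knowing what opcode 101 means. The trace gives the executed opcodes and their offsets, and the answer sits in the final register file, which is the leak that anti-instrumentation is meant to close.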
Where the technique wins, and what it costs
VM obfuscation is the strongest of the common JavaScript obfuscation techniques for one reason that survives all the engineering detail: it changes the unit of analysis. Every cheaper technique leaves you reading JavaScript that has been made annoying. Virtualization makes you reverse-engineer an instruction set first, and only then read a program written in it. That extra layer is not a constant-factor slowdown. It converts a code-reading task into a tooling-building task, and tooling is expensive to build and cheap to invalidate, which is the asymmetry the technique is selling.
The cost lands in two places, and they are real. The defender pays at runtime, because interpreting bytecode is slower than running native JavaScript, sometimes by an order of magnitude, which is why virtualizers tend to wrap only the few functions worth protecting rather than the whole bundle. And the defender pays in fragility of a different kind, because a VM bug is invisible until it corrupts a value and ships a broken token. The attacker pays in time, repeatedly, on a clock the defender controls through polymorphism. None of this makes the logic unrecoverable. The published devirtualizations are proof that a determined analyst gets there, on FinSpy, on Tigress, on a real anti-bot VM running on a major retailer. What it does is make recovery slow, per-build, and obsolete on a schedule, which against an adversary who needs the bypass to keep working is a different and harder problem than against one who only needs to read the code once.
The honest closing observation is that virtualization does not hide a secret. It rents you time. The interpreter is right there, the bytecode is right there, and a patient analyst with symbolic execution will recover the semantics of any given build. The defense is not that the logic is unknowable. It is that knowing it is perishable, and the defender refreshes the expiry date faster than the attacker can rebuild against it. Whoever automates their side of that loop more cheaply wins, and that is a contest about tooling and rotation cadence, not about cleverness in the obfuscation itself.
Sources & further reading
- Rolles, R. (2009), Unpacking Virtualization Obfuscators — USENIX WOOT paper that established the fetch-decode-execute, VPC, and handler-table model for reversing VM protections.
- Willbold, J. (2019), The Secret Guide To Virtualization Obfuscation In JavaScript — builds a JavaScript virtualizer end to end, including a concrete opcode-to-handler dispatch example.
- Blazytko, T. (2021), Writing Disassemblers for VM-based Obfuscators — symbolic-execution methodology for recovering opcode semantics from a VM, demonstrated against Tigress.
- umasi (2023), Devirtualizing Nike.com’s Bot Protection (Part 1) — a real anti-bot JavaScript VM dissected: ~60 handlers, register-style state, a 386k-character base-50 bytecode string.
- Rolles, R. (2018), FinSpy VM Unpacking Tutorial Part 3: Devirtualization — phased devirtualization of a real malware VM, with pattern passes that strip redundant instructions.
- Royer, J. / Thalium (2024), LLVM-powered devirtualization of virtualized binaries — taint analysis plus LLVM IR lifting and optimization to recover Tigress-protected functions automatically.
- An, S., Lee, S., Cho, E.-S. (2026), Static Detection of Core Structures in Tigress Virtualization-Based Obfuscation Using an LLVM Pass — uses successor-count heuristics to locate dispatcher and handler blocks statically.
- Sharif, M., Lanzi, A., Giffin, J., Lee, W., Automatic Reverse Engineering of Malware Emulators / Deobfuscation of Virtualization-Obfuscated Software — semantics-based dynamic approach to recovering logic from virtualized code.
- javascript-obfuscator (project, 2025), GitHub repository and option docs — open-source reference for the cheaper layers VM obfuscation sits on top of: string array, name mangling, control flow flattening.
- obfuscator.io (2025), Control Flow Flattening and VM Obfuscation — vendor documentation describing the switch-state-machine transform and the bytecode VM mode that composes with it.