Control-flow flattening and string encryption in obfuscated payloads

Open the JavaScript that ships with almost any commercial anti-bot product and the first thing you notice is that the code has no shape. There are no nested if blocks to follow, no readable string literals to grep for, no function whose name tells you what it does. Instead you get a while loop wrapped around a switch, a couple of arrays of base64 garbage near the top, and a numeric variable that gets reassigned at the end of every case. The logic is all there. It runs fine in the browser. But the structure that a human reader relies on to understand it has been deliberately destroyed.

Two transforms do most of that work. Control-flow flattening collapses a function’s branch structure into a flat state machine driven by a dispatcher, so the order in which blocks run is decided by a runtime variable rather than by the layout of the code. String encryption pulls every literal into a separate array, rotates that array by some offset, and encodes each entry so that even the strings give nothing away until the program decodes them at runtime. Neither is novel and neither is unbreakable. Both are everywhere, because they raise the cost of static analysis far more than they raise the cost of running the code.

This post is a technique reference for both. It walks through how flattening builds its dispatcher and how the state variable threads execution through it, then how a string array is rotated, encrypted, and self-defended, and finally how deobfuscators reverse each one: the AST rewriters that handle the JavaScript-obfuscator family, and the symbolic-execution and microcode tools that handle the compiled OLLVM lineage the same ideas came from. The companion posts on deobfuscating anti-bot JavaScript and JavaScript VM obfuscation cover the surrounding pipeline; this one stays on these two transforms.

What flattening actually does to a function

Start with an ordinary function. It has branches, loops, early returns. Drawn as a control-flow graph, each straight-line run of instructions is a basic block, and the edges between blocks encode the branch logic: this block falls through to that one, this conditional jumps to one of two successors. A reverse engineer reads that graph the way you read a flowchart. The shape carries the meaning.

Flattening throws the shape away. Every basic block is lifted out of its position in the graph and dropped into a switch statement as a separate case. A single state variable, usually an integer, decides which case runs. The whole thing sits inside an infinite loop. Execution enters the loop, reads the state variable, jumps to the matching case, does that block’s work, and (this is the part that matters) sets the state variable to the number of whichever block should run next before looping back to the top. Tim Blazytko describes the core as “an endless loop that dispatches a state variable in a switch statement,” and that one sentence is the whole mechanism. The central block that does the dispatching is the dispatcher; the blocks holding the original logic are sometimes called original or relevant blocks.

*The branch structure on the left, where the order is visible in the edges, becomes a flat list of cases on the right where the order lives only in the state variable that loops back through the dispatcher.*

The effect on a reader is brutal. In the original, block A’s two outgoing edges tell you it is a conditional and B and C are its branches. In the flattened version every block points back at the dispatcher and the dispatcher points at every block, so the graph is a star with no information in its edges. To know that A is followed by either B or C you have to figure out what value A writes into the state variable, and that means reading A’s code, computing the constant, and matching it against the case labels. Do that for one block and you have one edge. The graph has dozens of blocks.

OLLVM, the Obfuscator-LLVM project that popularized this transform for compiled code, implements exactly this at the LLVM IR level. Quarkslab’s 2014 write-up of an OLLVM-protected binary describes the prologue assigning “a numeric constant which indicates to the main dispatcher (and to sub-dispatchers) the path to take to reach the target relevant basic block,” and notes that after each relevant block runs, “the state variable is affected with another numeric constant to indicate the next relevant block.” The same paper records a detail that recurs throughout flattening: the original conditional branches are turned into conditional moves, so that “according to the result of the comparison they will set the next relevant block in the state variable.” The branch does not jump. It picks a number. The dispatcher does the jumping.

That indirection through a number is the whole game. A direct jump is an edge in the graph; a number written into a variable is data, and data flow is harder to follow statically than control flow. Flattening is, at bottom, a conversion of control dependencies into data dependencies, betting that your tools reason about the former better than the latter.
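To make the mechanism concrete, here is a minimal sketch in Python, standing in for the JavaScript or LLVM IR a real obfuscator would emit. It shows one small function written normally and then flattened by hand. The state constants are arbitrary hypothetical values, and the if/elif chain plays the role of the switch. Every original branch has become a conditional assignment to state rather than a jump.

```python
def original(n: int) -> int:
    """Structured version: the loop and the branch are visible in the code's shape."""
    total = 0
    while n > 0:
        if n % 2 == 0:
            total += n
        else:
            total -= 1
        n -= 1
    return total


def flattened(n: int) -> int:
    """Same logic as a dispatcher loop; the block order lives only in `state`."""
    total = 0
    state = 0x10                                   # entry: start at the loop header
    while True:                                    # the endless dispatch loop
        if state == 0x10:                          # loop header block
            state = 0x20 if n > 0 else 0x50        # branch becomes "pick a number"
        elif state == 0x20:                        # parity test block
            state = 0x30 if n % 2 == 0 else 0x40
        elif state == 0x30:                        # even case
            total += n
            state = 0x45
        elif state == 0x40:                        # odd case
            total -= 1
            state = 0x45
        elif state == 0x45:                        # loop latch
            n -= 1
            state = 0x10
        elif state == 0x50:                        # exit block
            return total
```

Both functions return the same value for every input; original(4) and flattened(4) are both 4. In the second version, however, no edge connects the parity test to its two successors. That edge survives only as the pair of constants written into state.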

Where the JavaScript version diverges

The JavaScript-obfuscator project (the engine behind obfuscator.io) ships flattening as controlFlowFlattening, off by default, with a controlFlowFlatteningThreshold of 0.75 that sets the probability any given node is transformed. The README is blunt about the cost: enabling it “greatly affects the performance up to 1.5x slower runtime speed.” That threshold matters because anti-bot vendors tune it. Flattening every function would tank the page; flattening a random three-quarters of them, picked fresh on each build, means the protected functions move around between deployments and a signature written against one build misses the next.

The JavaScript flavor is structurally lighter than OLLVM’s. There is no register allocation, no CMOV, no microcode. The dispatcher is a literal switch over an integer or, in some configurations, an index into a control string that is split and walked one token at a time. But the principle is identical: a state value selects the next block, and the order that used to live in the source now lives in a sequence of assignments to that value. The deobfuscation strategy differs sharply between the two worlds, which is the second half of this post, but the thing being undone is the same.
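The control-string variant is easy to model. The sketch below is a hypothetical Python rendering of the pattern: a string such as "2|0|1" is split on "|", and each token selects one case. The token values and case bodies are invented for illustration. The sketch also shows why this variant is weak on its own: once the control string is decoded, the original statement order is simply the token sequence.

```python
from __future__ import annotations


def dispatch(control: str, cases: dict[str, str]) -> list[str]:
    """Simulate the dispatcher: walk the split control string, one case per token."""
    executed = []
    tokens = control.split("|")          # "2|0|1" -> ["2", "0", "1"]
    i = 0
    while True:                          # mirrors the while(true) around the switch
        if i >= len(tokens):
            break                        # tokens exhausted: fall out of the loop
        executed.append(cases[tokens[i]])
        i += 1
    return executed


# Hypothetical case bodies, keyed by the token that selects them.
CASES = {"0": "b = a * 2;", "1": "return b + 1;", "2": "a = input();"}
```

Calling dispatch("2|0|1", CASES) yields the three bodies in their original straight-line order: "a = input();", then "b = a * 2;", then "return b + 1;".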

Detecting that a function has been flattened

Before you can unflatten anything you have to find the flattened functions, and at scale you cannot do that by eye. The standard heuristic comes from graph theory and it is clean enough to state in one line. A flattened function has a single block, the dispatcher, that dominates almost every other block, with a back edge returning to it from the blocks it dominates. Domination here is the compiler-textbook sense: block X dominates block Y if every path from the function entry to Y goes through X. In a flattened function every real block runs only by going through the dispatcher, so the dispatcher dominates all of them.

Blazytko’s detection metric is the ratio of “# basic blocks dominated by x / # basic blocks in the function,” computed for each block x, looking specifically for a block that “controls a loop while dominating large function portions.” When that ratio crosses about 0.9, the function is almost certainly flattened. A normal function does not have one block dominating ninety percent of the others; its dominance is spread across the branch structure. A flattened one funnels everything through the dispatcher, and the dominance ratio spikes.

*A normal function spreads dominance across its branch structure; a flattened one funnels everything through one dispatcher block, and that block's dominance ratio crosses the detection line.*
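The dominance-ratio heuristic can be sketched in Python using the textbook iterative dominator computation. The sketch follows the ratio described above; it is not Blazytko's own implementation. The graph encoding (a dict from block name to its successor list) and both sample graphs are hypothetical.

```python
from __future__ import annotations


def dominators(cfg: dict[str, list[str]], entry: str) -> dict[str, set[str]]:
    """Iterative dataflow: dom(n) = {n} plus the intersection of dom(p) over predecessors p."""
    nodes = set(cfg)
    preds: dict[str, set[str]] = {n: set() for n in nodes}
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].add(src)
    dom = {n: set(nodes) for n in nodes}     # start from "everything dominates everything"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def flattening_score(cfg: dict[str, list[str]], entry: str) -> float:
    """Highest dominance ratio among blocks that head a loop (have a back edge into them)."""
    dom = dominators(cfg, entry)
    best = 0.0
    for x in cfg:
        dominated = {y for y in cfg if x in dom[y]}
        if any(x in cfg[y] for y in dominated):        # some dominated block jumps back to x
            best = max(best, len(dominated) / len(cfg))
    return best


# A flattened shape: entry -> dispatcher -> nine case blocks that all return to it.
CASES = [f"b{i}" for i in range(1, 10)]
FLAT = {"entry": ["disp"], "disp": CASES + ["ret"], "ret": []}
FLAT.update({b: ["disp"] for b in CASES})

# A normal shape: an if/else diamond followed by a small loop.
NORMAL = {
    "entry": ["cond"], "cond": ["then", "else"], "then": ["join"], "else": ["join"],
    "join": ["loop"], "loop": ["body", "exit"], "body": ["loop"], "exit": [],
}
```

For FLAT the score is 11/12, about 0.92, because the dispatcher dominates every block except the entry. For NORMAL the score is 3/8: the only loop header dominates just itself, the loop body, and the exit.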

This is also why flattening is not free for the defender, and not only in CPU time. The very property that makes it work, one block dominating everything, is the property that makes it detectable. A scanner can flag flattened functions cheaply and reliably, which is part of why serious obfuscators layer flattening with opaque predicates, dead code, and the string encryption discussed below, rather than relying on it alone. The same arms-race logic that drives anti-bot integrity checks applies here: a transform that is easy to detect is one its designers must assume you will detect, so they design around that.

Hardening the dispatcher

A plain switch-over-an-integer-constant falls quickly, because the constants are right there in the code. Read each block, find the assignment to the state variable, note the constant, and you have rebuilt the edge. So the second generation of flattening hides the constants.

The first move is to encrypt or compute the case labels rather than store them plainly. Instead of state = 0x1A2B, a block computes the next state from the current one through some reversible arithmetic, so the next-state value is never a literal you can read off. eShard’s account of OLLVM’s evolution describes exactly this drift: where the state variable was once “a 32 bit constant with high entropy,” contemporary versions wrap it in “constant obfuscation or opaque predicate,” forcing a deobfuscator to “compute it dynamically” instead of pattern-matching a constant out of the instruction stream. The next state stops being data you can read and becomes the output of a computation you have to run.
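Here is a minimal sketch of a computed next state. It assumes a hypothetical update rule: XOR with a per-block constant, then a 32-bit left rotation. Real protectors use their own, usually messier, arithmetic. What the sketch shows is that the constant embedded in each block need not equal any case label. Reading constants off the code therefore recovers nothing, but running the arithmetic with the block's own label as the current state does.

```python
from __future__ import annotations

MASK32 = 0xFFFFFFFF


def rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def rotr32(x: int, r: int) -> int:
    return ((x >> r) | (x << (32 - r))) & MASK32


def next_state(state: int, k: int) -> int:
    """What a hardened block executes: the next label is computed from the current one."""
    return rotl32(state ^ k, 7)


def key_for_edge(cur: int, nxt: int) -> int:
    """Obfuscation time: pick the constant k so that next_state(cur, k) == nxt."""
    return rotr32(nxt, 7) ^ cur


def recover_edges(blocks: dict[int, int]) -> dict[int, int]:
    """Analysis time: each block runs only under its own label, so evaluate with state = label."""
    return {label: next_state(label, k) for label, k in blocks.items()}


# Hypothetical intended edges between case labels, and the constants an obfuscator would embed.
EDGES = {0x1A2B3C4D: 0x5E6F7081, 0x5E6F7081: 0x0BADF00D, 0x0BADF00D: 0x1A2B3C4D}
BLOCKS = {cur: key_for_edge(cur, nxt) for cur, nxt in EDGES.items()}
```

recover_edges(BLOCKS) returns EDGES again. No pattern match on literals was needed, only evaluation of the state arithmetic.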

The second move is the opaque predicate: a conditional whose outcome is fixed at obfuscation time but looks data-dependent to a static analyzer. if ((x*x + x) % 2 == 0) is always true for integer x, but a tool that does not reason about parity sees a real branch and dutifully explores both sides, one of which is dead code that exists only to waste the analyst’s time and inflate the graph. Opaque predicates do not change what the program does. They change how much junk a static tool has to wade through to find what the program does.
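The predicate above is easy to verify. The Python sketch below gives the one-line reason it is opaque and shows the shape of the dead branch it protects; the protected function is hypothetical. The brute-force range check is only evidence. The proof is that x*x + x factors as x*(x + 1), a product of two consecutive integers, one of which is always even.

```python
def opaque_true(x: int) -> bool:
    """Always True for integers: x*x + x == x*(x + 1), and one of two consecutive integers is even."""
    return (x * x + x) % 2 == 0


def protected(v: int) -> int:
    if opaque_true(v):
        return v + 1                # the real computation
    return (v * 31337) ^ 0xDEAD     # dead code: never runs, but a naive analyzer explores it
```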

The third move is to split one dispatcher into several, or to nest them. Sub-dispatchers, mentioned in the Quarkslab analysis, mean no single block dominates everything, which directly attacks the detection heuristic above. Push it far enough and you are no longer flattening so much as building a small interpreter, which is the boundary where flattening shades into full VM-based obfuscation: a custom bytecode and a dispatch loop that executes it. The transforms sit on a continuum. Flattening is the cheap end; a bytecode VM is the expensive end; the hardened dispatcher with encrypted states and sub-dispatchers is the middle.

String encryption and the rotated array

The other half of a typical payload hides the data. Even with the control flow scrambled, readable string literals leak intent: a URL, a property name like webdriver, an error message, the name of a global the script probes for. Anti-bot scripts are full of property names they read off navigator and window to fingerprint the JavaScript runtime, and those names are exactly what an analyst greps for first. String encryption exists to make that grep return nothing.

The mechanism in the JavaScript-obfuscator family has three layers, and they stack. First, every string literal is pulled out of the code and into a single array, and each original use site is replaced by a call into a decoder function that takes an index. navigator.webdriver becomes something like navigator[_0x3f2a(0x1d)], where _0x3f2a is the decoder and the strings live in the array it reads from. The README calls this stringArray, on by default. Grepping for webdriver now finds nothing, because the literal webdriver is not in the source anymore; it is entry 0x1d of an array, encoded.
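Below is a toy version of this first layer, written as a Python sketch over raw JavaScript text. It hoists double-quoted literals and dotted property names into one array and rewrites each use as a decoder call. The decoder name _0x3f2a follows the example above. Using regular expressions instead of a parser is a simplification: javascript-obfuscator itself transforms the AST, and real output would also shuffle, rotate, and encode the array.

```python
from __future__ import annotations

import re


def extract_strings(js: str, decoder: str = "_0x3f2a") -> tuple[list[str], str]:
    """Toy stringArray pass: move literals and dotted property names into one array."""
    array: list[str] = []
    index: dict[str, int] = {}

    def slot(value: str) -> str:
        if value not in index:              # each distinct string is stored once
            index[value] = len(array)
            array.append(value)
        return f"{decoder}({hex(index[value])})"

    # "literal"  ->  _0x3f2a(0xN)   (no escape handling: this is a sketch)
    js = re.sub(r'"([^"\\]*)"', lambda m: slot(m.group(1)), js)
    # obj.prop  ->  obj[_0x3f2a(0xN)]   (a dot right after an identifier, ")" or "]")
    js = re.sub(r"(?<=[\w)\]])\.([A-Za-z_$][\w$]*)", lambda m: "[" + slot(m.group(1)) + "]", js)
    return array, js
```

Run on if (navigator.webdriver) { report("bot"); }, the pass produces the array ["bot", "webdriver"] and the source if (navigator[_0x3f2a(0x1)]) { report(_0x3f2a(0x0)); }, in which a grep for webdriver finds nothing.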

Second, the array is shuffled and rotated. stringArrayShuffle permutes the entries at build time so their order carries no information, and stringArrayRotate performs a rotation “at random” whose purpose, per the docs, is to make “it harder to match obfuscated strings with their original values.” The rotation is the clever part: the array stored in the source is the real array rotated by some offset, and a small bootstrap routine rotates it back into place before any decoder call runs. That bootstrap is usually an immediately-invoked function that spins the array, recomputes a checksum on each rotation, and stops only when the checksum matches a hardcoded target. The deobfuscation walkthroughs show the checksum as a chain of parseInt calls over decoded entries, divided and summed, compared against a constant. In one worked example, the loop rotates until the sum equals 0x4bc84. Until that offset is found, every index into the array returns the wrong string, so you cannot decode anything by reading entry N; you have to run the rotation first.

*The array in the source is rotated and encoded; a bootstrap loop spins it until a checksum matches, and only then does the decoder return real strings on access.*
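The rotate-until-checksum bootstrap described above can be sketched in Python as follows. The checksum formula is hypothetical, since real builds generate their own parseInt expression, and parse_int_prefix is only a rough stand-in for JavaScript's parseInt. Where JavaScript would produce NaN, this version raises an error and keeps spinning, mirroring the try/catch commonly seen around the check in generated loops.

```python
from __future__ import annotations

import re


def parse_int_prefix(s: str) -> int:
    """Rough stand-in for JavaScript parseInt: leading digits, else an error (JS would give NaN)."""
    m = re.match(r"\d+", s)
    if m is None:
        raise ValueError(f"not numeric: {s!r}")
    return int(m.group())


def checksum(arr: list[str]) -> int:
    # Hypothetical formula over fixed slots; it only matches when the right entries sit there.
    return sum(parse_int_prefix(arr[i]) * (i + 1) for i in range(3))


def ship(original: list[str], offset: int) -> tuple[list[str], int]:
    """Build time: record the checksum of the true order, store the array rotated right."""
    offset %= len(original)
    stored = original[-offset:] + original[:-offset] if offset else list(original)
    return stored, checksum(original)


def bootstrap(stored: list[str], target: int) -> list[str]:
    """Runtime: spin the array one step at a time until the checksum matches."""
    arr = list(stored)
    for _ in range(len(arr)):
        try:
            if checksum(arr) == target:
                return arr
        except ValueError:
            pass                         # a non-numeric entry in a checked slot: keep spinning
        arr.append(arr.pop(0))           # JavaScript: arr.push(arr.shift())
    raise RuntimeError("no rotation matched the checksum")


ORIGINAL = ["17abc", "4xyz", "250q", "navigator", "webdriver", "userAgent"]
```

With ship(ORIGINAL, 4), the stored array begins with "250q", the target checksum is 17*1 + 4*2 + 250*3 = 775, and bootstrap spins four times before the numeric entries are back in the checked slots.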

Third, each entry is encoded, optionally encrypted. stringArrayEncoding accepts base64 or rc4. Base64 is just obscurement; it stops a naive grep but decodes trivially. RC4 is a real stream cipher keyed at build time, and the docs note it is “30-50% slower than base64, but harder to get initial values” because you cannot eyeball an RC4 ciphertext the way you can squint at base64. In obfuscator.io output the RC4 key usually travels with the index as a second argument at each call site, and the decoder runs RC4 on the indexed entry with that key when it is called. That is why heavily string-encrypted code is slow: every property access that used to be a literal is now a cipher operation.
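For reference, here is textbook RC4 in Python, plus a hypothetical encode/decode pair that stores the RC4 output as standard base64 so each array entry is printable text. obfuscator.io's generated decoder differs in its details, including its own base64 routine and string handling, so treat encode_entry and decode_entry as illustrations rather than a port. In this sketch the key is simply a parameter of each decode call.

```python
from __future__ import annotations

import base64


def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: key scheduling, then a keystream XORed over the data (encrypt == decrypt)."""
    s = list(range(256))
    j = 0
    for i in range(256):                                  # key-scheduling algorithm (KSA)
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:                                     # pseudo-random generation (PRGA)
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def encode_entry(plain: str, key: str) -> str:
    """Build time (hypothetical scheme): RC4-encrypt, then base64 for a printable array entry."""
    return base64.b64encode(rc4(key.encode(), plain.encode())).decode("ascii")


def decode_entry(array: list[str], index: int, key: str) -> str:
    """What the decoder does per call: fetch by index, un-base64, RC4 with the given key."""
    return rc4(key.encode(), base64.b64decode(array[index])).decode()
```

The standard known-answer vector (key "Key", plaintext "Plaintext", ciphertext bbf316e8d940af0ad3 in hex) is a quick way to check an extracted decoder's RC4 core before trusting its output.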

There is a fourth layer that ties the data transform to tamper resistance. When self-defending is applied to the string array, the array is hashed and the rotation offset is derived from that hash, so the amount the bootstrap has to rotate is a function of the array’s exact contents. The pull request that added it states the design directly: use the “output from hash function to get stringArray rotate value.” The consequence is that editing any string in the array, say to neutralize a check during analysis, changes the hash, which changes the required rotation, which leaves the whole array misaligned and the program broken. This is the same family of trick as the broader selfDefending option, whose README description is simply “breaks if beautified”: reformatting the code, or touching the data, perturbs an invariant the code checks against itself, and it stops working. The point is not that you cannot edit it. The point is that editing it naively breaks it, so you have to understand the self-check first, which costs time.
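The hash-to-rotation binding can be sketched with a deliberately simple hypothetical hash: the sum of all code points. That sum does not depend on element order, so the bootstrap can compute it from the rotated copy it ships with. The real implementation's hash is different. Editing one entry shifts the hash, which moves the derived offset, and every entry then lands in the wrong slot.

```python
from __future__ import annotations


def toy_hash(arr: list[str]) -> int:
    # Hypothetical hash: order-independent, so it is the same for the array and any rotation of it.
    return sum(ord(c) for s in arr for c in s)


def rotate_left(arr: list[str], k: int) -> list[str]:
    k %= len(arr)                       # also turns a negative k into the equivalent right rotation
    return arr[k:] + arr[:k]


def ship(original: list[str]) -> list[str]:
    """Build time: the rotation offset is derived from the array's own contents."""
    return rotate_left(original, toy_hash(original) % len(original))


def restore(stored: list[str]) -> list[str]:
    """Runtime: recompute the offset from the shipped array and undo the rotation."""
    return rotate_left(stored, -(toy_hash(stored) % len(stored)))


ORIGINAL = ["navigator", "webdriver", "userAgent", "plugins", "languages"]
```

restore(ship(ORIGINAL)) returns ORIGINAL. Replacing "webdriver" with "webdrivex" in the shipped copy raises the toy hash by 6 (the code point of "x" minus that of "r"). In a five-entry array that moves the offset by one slot, so no restored entry sits where it belongs.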

Reversing the JavaScript transforms

The good news for the analyst is that everything the bootstrap does, the analyst can do too. The rotation routine, the decoder, the RC4 key, all of it ships in the file, because the browser has to run it. There is no server round trip to decode a string. That makes string encryption, for all its layers, fundamentally a local problem: extract the decoder, give it the array, and ask it for entry N. The strings come back in plaintext.

The clean way to do that is partial evaluation over the abstract syntax tree rather than running the whole script. Parse the file with a JavaScript parser into an AST, locate the string-array definition and the decoder and the rotation IIFE, evaluate just those nodes in an isolated context, and then walk the rest of the tree replacing every _0x3f2a(0x1d) call with the constant string the now-initialized decoder returns. The AST-rewriting tutorials in the Babel ecosystem do exactly this: isolate the decoder logic, evaluate it, then traverse and substitute. You never run the fingerprinting code or the network calls. You run only the decoder, on the values you choose, in a sandbox. The companion post on deobfuscating anti-bot JavaScript covers the sandboxing discipline that keeps that safe.
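The substitution step can be sketched in Python. A regular expression stands in for the Babel traversal, and decode stands in for the decoder after it has been initialized in a sandbox. Both are simplifications, and for illustration any callable from index to string will do. json.dumps emits a correctly escaped double-quoted literal, which is also a valid JavaScript string literal.

```python
from __future__ import annotations

import json
import re
from typing import Callable

CALL = re.compile(r"_0x3f2a\((0x[0-9a-fA-F]+)\)")                # decoder calls such as _0x3f2a(0x1d)
BRACKET = re.compile(r'(?<=[\w)\]])\["([A-Za-z_$][\w$]*)"\]')     # obj["name"], not an array literal


def inline_decoder_calls(js: str, decode: Callable[[int], str]) -> str:
    """Replace every decoder call with the literal it returns, then tidy obj["x"] to obj.x."""
    js = CALL.sub(lambda m: json.dumps(decode(int(m.group(1), 16))), js)
    return BRACKET.sub(r".\1", js)
```

With a two-entry table ["bot", "webdriver"], the call sites from the earlier example turn back into if (navigator.webdriver) { report("bot"); }. An array literal such as [_0x3f2a(0x0)] becomes ["bot"] and is not mistaken for a property access.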

Tools have productized this. Webcrack, built on @babel/parser, @babel/traverse, @babel/types, and @babel/generator, targets the obfuscator.io output specifically and handles string array rotation, shuffle, and index shifting as named transforms. It is rule-based: it matches the known shapes the obfuscator emits and rewrites them. That is also its weakness. Rule-based deobfuscation is brittle by construction, and the documentation acknowledges that a small change such as rewriting while (!![]) to while (!false) can defeat a rule that was looking for the first form. The obfuscator and the deobfuscator are matching patterns at each other, and whoever changed their pattern most recently wins until the other catches up.

That brittleness is the reason the most recent work moves the pattern-recognition step to a model. Google’s CASCADE, described in a 2025 paper and reported as already deployed in production, uses Gemini to detect the “prelude functions — the foundational components underlying the most prevalent obfuscation techniques,” then hands the actual rewriting to a deterministic compiler IR (JSIR) that does constant propagation and inlining. The split is deliberate: the LLM finds the decoder and the dispatcher prelude, where flexible pattern recognition beats hand-written rules, but it does not generate the deobfuscated code, because a model that hallucinates a string value is worse than useless. The compiler does the transformation, so the output is correct by construction. The reported numbers are a 99.56% prelude-detection rate on a 12,000-file synthetic set and 98.93% string recovery across 11,610 samples, recovering on average 945 string literals per file in about two seconds. Whatever you think of LLMs, that division of labor, model for recognition and compiler for transformation, is the right shape, and it is the direction string deobfuscation is heading.

Flattening in JavaScript is undone by similar AST work, with one extra step. Once the strings are decoded, you can read each case’s body, find the assignment to the state variable, and recover the next-state value. When the next state is a plain constant, this is direct: build a map from current state to next state and reconstruct the original edges, then re-nest the blocks into proper if/while structures. When the next state is computed, the hardened variant, you fall back to the same technique the compiled-code world uses, which is to actually execute the state arithmetic and read off the result.
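When the next states are plain constants, rebuilding the edges is a worklist walk from the entry state. The sketch below is a hypothetical Python model. Each case label maps to its body and its next-state value, which is one of three things: a constant, a (condition, if_true, if_false) triple for a conditional assignment, or None for a block that returns. A real tool reads those values out of the AST. Re-nesting the recovered graph into if/while structures is a separate structuring step and is not shown.

```python
from __future__ import annotations


def recover_edges(cases: dict[int, tuple[str, object]], entry: int) -> dict[int, list]:
    """Turn next-state values back into CFG edges, visiting only reachable cases."""
    edges: dict[int, list] = {}
    worklist = [entry]
    while worklist:
        label = worklist.pop()
        if label in edges:                   # already processed
            continue
        _body, nxt = cases[label]
        if nxt is None:                      # the block returns: no successors
            edges[label] = []
        elif isinstance(nxt, tuple):         # conditional assignment -> two guarded edges
            cond, if_true, if_false = nxt
            edges[label] = [(cond, if_true), (f"!({cond})", if_false)]
            worklist += [if_true, if_false]
        else:                                # plain constant -> one unconditional edge
            edges[label] = [(None, nxt)]
            worklist.append(nxt)
    return edges


# Hypothetical case map for a small loop with a parity test inside it.
CASES = {
    0x10: ("loop header", ("n > 0", 0x20, 0x50)),
    0x20: ("parity test", ("n % 2 == 0", 0x30, 0x40)),
    0x30: ("total += n", 0x45),
    0x40: ("total -= 1", 0x45),
    0x45: ("n -= 1", 0x10),
    0x50: ("return total", None),
    0x99: ("case never reached from the entry", 0x10),
}
```

The recovered edge 0x45 -> 0x10 points back at an earlier block; that is the loop a structuring pass would turn back into a while. The unreachable 0x99 case never appears in the result.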

Reversing the compiled lineage: symbolic execution and microcode

The compiled side of flattening, where OLLVM lives, predates the JavaScript port by years and its deobfuscation tooling is more mature, because there is no decoder sitting in the file to borrow. You cannot just call a function and get the answer. You have to reason about what the code computes.

The dominant static approach is symbolic execution. Rather than running a block on concrete inputs, you run it on symbols, building up an expression for what the state variable will hold after the block executes. If a block unconditionally sets the next state, the symbolic expression collapses to a constant and you have recovered the edge. If a block sets the next state conditionally, the CMOV pattern where the original branch was turned into a conditional move, the symbolic engine explores both assignments, recovering both successors and the condition that chooses between them. The Quarkslab analysis built precisely this on the Miasm framework, executing blocks symbolically, maintaining a stack of branches to explore, and rebuilding the original graph by tracking which block leads to which; on its test function it recovered “the 3 conditions and the 4 equations” of the original logic. The general-purpose engine angr handles the same class of problem, and questions about full-path symbolic execution over CFF-protected functions are a recurring thread in its issue tracker.

*Symbolic execution computes each block's next-state expression; a constant gives one edge, a conditional gives both successors plus the condition, and the original branch graph falls out of the collected expressions.*
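Here is a toy stand-in for that analysis in Python: a tiny expression language for a block's next-state computation, evaluated with state bound to the block's own case label and everything else left symbolic. It only folds constants and forks on an unresolved if-then-else, far short of what Miasm or angr do, and the expression encoding is hypothetical. Still, it shows the three outcomes: a constant yields one edge, a conditional yields both successors with their conditions, and an input-dependent value cannot be resolved statically.

```python
from __future__ import annotations

MASK32 = 0xFFFFFFFF
OPS = {
    "xor": lambda a, b: a ^ b,
    "add": lambda a, b: (a + b) & MASK32,
    "lt": lambda a, b: int(a < b),
}


def evaluate(expr: tuple, env: dict[str, int]):
    """Partially evaluate: an int if everything is known, otherwise a symbolic tuple."""
    op = expr[0]
    if op == "const":
        return expr[1]
    if op == "var":
        return env.get(expr[1], expr)                 # unknown inputs stay symbolic
    a, b = evaluate(expr[1], env), evaluate(expr[2], env)
    if isinstance(a, int) and isinstance(b, int):
        return OPS[op](a, b)
    return (op, a, b)


def successors(expr: tuple, env: dict[str, int]) -> list[tuple[tuple, int]]:
    """Collect (path conditions, next state) pairs; 'ite' may appear only at the top or in ite arms."""
    if expr[0] == "ite":
        _, cond, then_e, else_e = expr
        c = evaluate(cond, env)
        if isinstance(c, int):                        # condition folded: one side only
            return successors(then_e if c else else_e, env)
        return ([((c,) + pc, s) for pc, s in successors(then_e, env)]
                + [((("not", c),) + pc, s) for pc, s in successors(else_e, env)])
    value = evaluate(expr, env)
    if not isinstance(value, int):
        raise ValueError(f"next state depends on unknown input: {value}")
    return [((), value)]


state, n = ("var", "state"), ("var", "n")
# Block 0x10: "if n < 1 go to 0x50 else 0x20", written as XORs against its own label.
HEADER = ("ite", ("lt", n, ("const", 1)),
          ("xor", state, ("const", 0x40)),           # 0x10 ^ 0x40 == 0x50
          ("xor", state, ("const", 0x30)))           # 0x10 ^ 0x30 == 0x20
# Block 0x30: unconditional, next = state + 0x15.
BODY = ("add", state, ("const", 0x15))               # 0x30 + 0x15 == 0x45
# Block 0x77: the next state mixes in program input, so static evaluation cannot finish it.
INPUT_DEPENDENT = ("xor", state, n)
```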

The other mature approach works on the decompiler’s intermediate representation rather than the raw instructions. eShard’s D810, an IDA Pro plugin, operates on Hex-Rays microcode and unflattens by tracking the state variable backward through the IR. Its MopTracker does recursive backward analysis across predecessor blocks to find where the state variable is defined, a microcode interpreter evaluates those instructions to compute the concrete next-state value, and a control-flow patch rewrites the microcode jumps to point at the real successors, deleting the dispatcher indirection. When a single block can produce more than one state value depending on a condition, D810 duplicates the block so each copy carries a single constant state, which untangles the conditional case into clean edges. eybisi’s analysis of the Approov RASP product using D810 frames the same idea around “dispatcher fathers” (blocks that feed the dispatcher) and the rule that “if we know the state variable in dispatcher father, we can calculate next dispatched block.” Once you know the state going in, you know the block coming out, and you can wire them together directly.
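The dispatcher-father idea, with backward tracking and block duplication, can be modeled in a few lines of Python. This is a hypothetical sketch of the idea, not D810's code or API; D810 itself works on Hex-Rays microcode. Each block records its predecessors and the constant it assigns to the state variable, if any. A father that assigns nothing inherits values from its predecessors, and when those values disagree it is duplicated once per incoming path.

```python
from __future__ import annotations


def track_state(block: str, blocks: dict[str, dict], seen: set[str] | None = None) -> set[int]:
    """Backward walk: every state value that can hold when `block` finishes."""
    seen = set() if seen is None else seen
    if block in seen:                         # cycle without an assignment: contributes nothing
        return set()
    seen.add(block)
    if blocks[block]["sets"] is not None:     # the nearest definition wins
        return {blocks[block]["sets"]}
    values: set[int] = set()
    for pred in blocks[block]["preds"]:
        values |= track_state(pred, blocks, seen)
    return values


def patch_fathers(fathers: list[str], blocks: dict[str, dict], case_of: dict[int, str]) -> dict[str, str]:
    """Map each dispatcher father (or a per-predecessor copy of it) to its real successor."""
    patches: dict[str, str] = {}
    for father in fathers:
        values = track_state(father, blocks)
        if len(values) == 1:                  # one known state: jump straight to its case
            patches[father] = case_of[values.pop()]
            continue
        for pred in blocks[father]["preds"]:  # ambiguous: duplicate the father per incoming path
            pv = track_state(pred, blocks)
            if len(pv) != 1:
                raise ValueError(f"state not statically known along {pred} -> {father}")
            patches[f"{father}@{pred}"] = case_of[pv.pop()]
    return patches


BLOCKS = {
    "A": {"preds": [], "sets": 3},
    "B": {"preds": [], "sets": 5},
    "J": {"preds": ["A", "B"], "sets": None},   # join block: the state depends on the path taken
    "C": {"preds": [], "sets": 7},
}
CASE_OF = {3: "case3", 5: "case5", 7: "case7"}
```

On this graph, the join block J is split into J@A and J@B, which go to case3 and case5 respectively, while C goes straight to case7.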

Both approaches share a vulnerability, and it is the one the hardened dispatcher exploits. They assume you can determine the state value statically. The moment the next state depends on real program input rather than on a fixed constant or a solvable predicate, pure static analysis stalls, because the value genuinely is not known until runtime. This is the documented weak point: static CFF deobfuscators have low success rates against compiler-optimized binaries where the data flow they depend on has been scrambled. That is why the field keeps reaching past plain static recovery: concrete execution traces that supply runtime values, abstract interpretation that soundly over-approximates all paths, and data-flow-aware passes like the recent FlowSight and CaDeCFF work that specifically attack the state variable’s data flow. Each new deobfuscation technique closes the gap on a class of dispatcher, and each new obfuscation variant opens it somewhere else.

What the two transforms have in common

Step back and flattening and string encryption are the same idea pointed at two different things. Flattening takes control flow, which a static tool reads directly off the graph, and reroutes it through a data value, the state variable, so that recovering the graph means recovering the values. String encryption takes data, which a static tool reads directly off the literals, and reroutes it through a computation, the decoder and the rotation, so that recovering the strings means running the computation. Both convert something legible into something you have to evaluate to read. Both bet that evaluation is more expensive for the analyst than for the browser.

That bet has an expiry date, and it is shorter than vendors would like. Everything needed to undo both transforms ships in the payload, because the client has to run it. The decoder is in the file. The rotation offset is computable from the file. The state arithmetic is in the blocks. Nothing about either transform is cryptographically hard in the way a server-side secret is hard; the difficulty is entirely in the labor of extracting and re-running the machinery, and labor is exactly what tooling automates. Webcrack automated the obfuscator.io string array. Miasm and D810 and angr automated the dispatcher. CASCADE put a model on the recognition step and a compiler on the rewrite, and reported two-second turnarounds at 99% recovery. The cost these transforms impose is real, but it is a tax on the first analyst, not a wall against all of them, and once someone writes the un-transform the tax drops to near zero for everyone who follows.

Which is why neither transform appears alone in anything serious. They are floor, not ceiling. A current anti-bot payload layers flattening and string encryption with opaque predicates, anti-debugging that detects when it is being traced through the DevTools protocol, self-defending checks that break under reformatting, and frequently a bytecode VM on top of all of it. Each layer is individually beatable and collectively expensive, and the strategy is not to be unbreakable but to make the break cost more than it is worth on the timescale before the next build ships and resets the clock. Flattening and string encryption are the two transforms you will see in every one of those payloads, the baseline that everything else is stacked on, and understanding them precisely is the price of admission to reading the rest.

Sources & further reading

Tim Blazytko (2021), Automated Detection of Control-Flow Flattening — the dominance-ratio heuristic for flagging flattened functions, with the dispatcher and state-variable model stated cleanly.
Quarkslab / Francis Gabriel (2014), Deobfuscation: recovering an OLLVM-protected program — the canonical walkthrough of OLLVM flattening and symbolic-execution recovery with Miasm.
eShard / Boris Batteux (2021), D810: a journey into control flow unflattening — microcode-level unflattening in IDA, and how modern OLLVM obscures the state variable.
javascript-obfuscator (2024), Options reference (README) — the authoritative option list for stringArray, rotation, shuffle, encoding, self-defending, and control-flow flattening.
javascript-obfuscator PR #332 (2019), Self Defending with String Array — how the rotation offset is derived from a hash of the array to bind it against editing.
William Khem Marquez (2022), Deobfuscating JavaScript via AST: an introduction to Babel — the string-array decoder, the rotation checksum loop, and partial-evaluation recovery.
j4k0xb (2024), webcrack — rule-based AST deobfuscator for obfuscator.io output, handling string array rotation, shuffle, and index shifting.
CASCADE authors, Google (2025), CASCADE: LLM-powered JavaScript Deobfuscator at Google — production hybrid deobfuscator splitting prelude detection (Gemini) from transformation (JSIR), with recovery-rate numbers.
Zerotistic (2024), Breaking Control Flow Flattening: a deep technical analysis — building a Binary Ninja CFF remover with state-variable scoring and Z3 verification.
eybisi (2022), Control Flow Unflattening — applying D810 to a real RASP product, with the dispatcher-father model spelled out.