
WebAssembly in anti-bot systems: moving detection logic out of readable JS

Copyright: MIT

Open the network tab on a site behind a serious anti-bot product and you will usually find the detection script in the clear: a few hundred kilobytes of mangled JavaScript, variable names crushed to single letters, strings encoded, the whole thing wrapped in an interpreter that runs its own bytecode. Ugly, but readable. Set enough breakpoints and you can watch it collect a canvas hash, walk navigator, time a few operations, and POST the result. The logic is hidden behind obfuscation, not behind a wall. Given a week and a debugger, a competent person gets through it.

Now load a site where part of that logic has been compiled to WebAssembly. The script still POSTs a payload, but the function that built it is no longer a JavaScript closure you can step through. It is a binary module the engine hands to a compiler, executed as machine code with a stack you cannot name and a linear memory that looks like a flat byte array. The breakpoint you would have set does not exist anymore. That is the point of moving the code there, and this post is about what that move actually buys a vendor, what it does not, and how hard the resulting binary is to read back into something a human understands.

The sections below start with what WebAssembly is and why its shape matters for hiding logic, then cover the two things vendors actually put inside a module (fingerprint collection and proof-of-work), the timing side channel that wasm opened up, and the reverse-engineering toolchain as it stands in 2026 with a clear-eyed look at where it falls short. A closing section weighs how much protection the move really delivers.

What WebAssembly is, and why its shape matters here

WebAssembly is a binary instruction format for a stack machine. Mozilla first announced it in 2015 as a successor to asm.js, four browser vendors (Mozilla, Microsoft, Google, Apple) converged on a minimum viable product in March 2017, and the 1.0 core specification became a W3C Recommendation on 5 December 2019, the fourth language to run natively in browsers after HTML, CSS, and JavaScript. By then every major engine shipped it: Chrome 57, Firefox 52, Safari 11, Edge 16. A module you compile today runs essentially everywhere a modern site loads.

The format is the thing that matters for this topic. A .wasm file is a sequence of typed sections. It opens with eight bytes, 00 61 73 6d 01 00 00 00 (the four-byte magic \0asm followed by a little-endian version number of 1), then a type section declaring function signatures, an import section, a function section mapping functions to their types, sections for tables, memory, globals, and exports, a code section holding the actual instruction bodies, and a data section that initializes linear memory. Computation runs on an implicit operand stack. There are no registers you name and no general-purpose memory addressing in the C sense; values live on the stack or in numbered locals, and the module’s data lives in a single contiguous linear memory exposed to JavaScript as an ArrayBuffer.
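The layout described above is simple enough to walk by hand. The following is a minimal sketch, not a validator: it checks the magic and version, then reads each section's one-byte id and LEB128-encoded size and skips over the contents. The function names `read_uleb128` and `list_sections` are made up for this illustration.

```python
from __future__ import annotations

WASM_MAGIC = b"\x00asm"

# Section ids as assigned by the WebAssembly binary format.
SECTION_NAMES = {
    0: "custom", 1: "type", 2: "import", 3: "function", 4: "table",
    5: "memory", 6: "global", 7: "export", 8: "start", 9: "element",
    10: "code", 11: "data", 12: "data count",
}


def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def list_sections(module: bytes) -> list[tuple[str, int]]:
    """Return (section name, payload size) pairs in file order."""
    if module[:4] != WASM_MAGIC:
        raise ValueError("not a wasm module: bad magic")
    version = int.from_bytes(module[4:8], "little")
    if version != 1:
        raise ValueError(f"unsupported version {version}")
    sections, pos = [], 8
    while pos < len(module):
        section_id = module[pos]
        size, pos = read_uleb128(module, pos + 1)
        name = SECTION_NAMES.get(section_id, f"unknown({section_id})")
        sections.append((name, size))
        pos += size  # skip the payload; this sketch never looks inside
    return sections
```

Run against a real module, a listing like this shows at a glance whether a custom section (where names would live) is present at all, which is usually the first thing worth checking.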

Figure: A wasm module, section by section. Magic + version (00 61 73 6d); type (signatures); import (JS funcs); memory (linear pages); export (named entry points); code section (function bodies: stack ops, numbered locals, no names, no strings). Names survive only in the optional name section, which production builds strip.

*The binary carries function signatures and an export table, but the bodies are stack instructions over numbered locals. Identifiers and string literals do not survive compilation unless a build deliberately keeps them.*

That last detail is the whole reason this format is attractive to an anti-bot vendor. When you minify JavaScript, you lose readable names but keep the language: it is still JavaScript, with its objects, its property accesses, its string literals sitting right there in the source. A debugger understands every line. When you compile C or Rust or AssemblyScript to wasm, the source language is gone. Local variables become indices. Strings that are not needed at runtime evaporate. Control flow that read as if/for in the source becomes structured block/loop/br_if instructions that a decompiler has to reconstruct into something resembling the original. The engine never needs the names to run the code, so a release build does not ship them.

Why vendors moved logic into wasm

The honest version of the motivation is friction, not secrecy in the cryptographic sense. Nothing about wasm is encrypted. The bytes are right there in the response, and anyone can disassemble them to the textual .wat format with a single command. What wasm changes is the cost and the shape of the analysis work. A reverse engineer who is fluent in reading obfuscated JavaScript in a browser debugger has to switch to a different and less ergonomic toolchain, one where the browser’s own debugging support is thinner and the decompiled output is harder to follow. The skill transfers, but slowly, and the per-target effort goes up.

DataDome put numbers around its own thinking when it shipped a layered obfuscation update covering its Device Check and Slider products, announced in February 2026. The company describes the protection as three layers working together: dynamic obfuscation (variable names, code structure, encryption keys, and the script’s internal layout all rotating on a schedule), WebAssembly compilation of detection-relevant logic, and a bytecode virtual machine whose opcodes and interpreter architecture also rotate. The stated aim is to raise the complexity and cost of reverse engineering for both humans and AI-assisted tooling, and to keep raising it by changing the target faster than anyone can finish studying a snapshot. DataDome also publishes that client-side signals make up roughly 30 to 40 percent of the signals feeding its detection, which sets a ceiling on how much any client obfuscation can matter: the server still holds the majority of the decision.

That number is worth sitting with, because it explains why wasm is a layer and not a strategy. A vendor that put its entire verdict in the browser would be betting everything on code the attacker fully controls and can run a million times offline. The browser code’s job is narrower: collect signals honestly and prove the collection happened in a real environment, then hand the verdict to a server the attacker cannot see. Wasm makes the collection harder to fake and harder to study, which buys time. It does not move the decision.

Figure: Where wasm sits in the stack. Browser (attacker-controlled): obfuscated JS loader + VM; .wasm: collect + prove; signed payload + token, sent by POST. Edge / server (hidden): scoring pipeline → verdict; ~60-70% of signals decided here; attacker never sees this code. Wasm hardens the client collection. The verdict stays server-side, out of reach.

*Per DataDome's own figures, client-side signals are roughly 30-40 percent of the inputs. Compiling collection into wasm raises the cost of faking or studying that slice; it does not move the verdict, which stays on a server the attacker cannot read.*

This is the same architecture you see across the major vendors regardless of whether they use wasm, and it is worth reading alongside the server-side vs client-side bot detection split. The client gathers, the server decides. Wasm is a way to harden the gathering. For more on how one vendor structures the server half, the DataDome scoring pipeline post covers what happens after the payload lands.

What goes inside the module: fingerprint collection

The first thing a vendor compiles into wasm is the part of fingerprint collection that benefits from being a black box. Much of fingerprinting has to stay in JavaScript because the signals live in JavaScript APIs: you cannot read navigator.plugins or run a canvas toDataURL from inside wasm without calling back out to JS. So the collection of raw values happens in script. What moves into the module is the processing: hashing the collected values, mixing them with secrets baked into the binary, deriving the integrity checks that prove the values were not tampered with, and assembling the final payload in a layout the server expects.

The exact field layout any given vendor uses is not public, and anyone who tells you they have the canonical schema is selling something. What follows is inferred from observed traffic, vendor documentation, and published research rather than from a leaked spec. The pattern that recurs is a split where JavaScript does the API reads and wasm does the cryptographic-style finishing. That split is deliberate. The API reads are easy to observe no matter what, because you can hook the APIs themselves. The finishing step is where the secret sauce lives, the part that says “this collection is genuine and came from our code,” and that is the part worth hiding in a binary.
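To make the "finishing" step concrete, here is a minimal sketch of one way such a step could look, assuming an HMAC-SHA256 tag over a canonical serialization of the collected values. Everything here is hypothetical: the key, the field names, and the choice of HMAC are stand-ins for illustration, not any vendor's actual scheme.

```python
import hashlib
import hmac
import json

# Hypothetical secret. In the pattern described above it would be baked
# into the compiled module rather than sitting in readable script.
EMBEDDED_KEY = b"example-key-not-a-real-vendor-secret"


def finish_payload(raw_signals: dict, key: bytes = EMBEDDED_KEY) -> dict:
    """Serialize collected values canonically and attach an integrity tag."""
    body = json.dumps(raw_signals, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def server_accepts(payload: dict, key: bytes = EMBEDDED_KEY) -> bool:
    """Recompute the tag server-side and compare in constant time."""
    expected = hmac.new(key, payload["body"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, payload["tag"])
```

A payload whose body is edited after the tag is computed fails `server_accepts`. Because the key ships inside the client binary, this raises the cost of forging a payload rather than making forgery impossible, which matches the friction argument above.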

There is solid academic ground for the claim that wasm makes a fingerprint not just harder to read but better. A 2025 study from Ben-Gurion University, Browser Fingerprinting Using WebAssembly, built a fingerprinting method entirely out of wasm execution timing and reported a 99.29 percent success rate distinguishing Chromium-based browsers from non-Chromium ones across 158 browser instances, with a single misclassified Firefox among 55 tested. The reason it works is the same reason vendors like wasm: JavaScript carries the overhead and unpredictability of JIT compilation, while wasm delivers near-native performance and consistent execution times. Consistency is exactly what you want when the measurement itself is the signal.

A second study from the University of Texas at San Antonio and Google, The WASM Cloak, published in 2025, looked at the defensive flip side and found something telling. The researchers built a pipeline that automatically converted JavaScript fingerprinting scripts to wasm, processing 7.5 million scripts and identifying 10,742 fingerprinting scripts across canvas, WebRTC, AudioContext, and canvas-font categories. When they ran the converted scripts past academic fingerprint detectors that work by analyzing source code, detection collapsed: one tool’s recall fell from 77.78 percent to 44.44 percent on its own dataset, and another stopped working entirely. Commercial defenses that intercept at the API level were unaffected, because they spoof the output and do not care whether the caller is JS or wasm. The lesson cuts both ways. Source-analysis tooling is blind to logic hidden in a binary, while runtime interception does not care what language asked the question.

What goes inside the module: proof of work

The other thing vendors put in a wasm module is a computational challenge the client has to solve before it gets a token. The idea predates the web by decades. Make every request cost a little CPU, and a human browsing pays nothing they notice while a bot fleet making millions of requests pays in aggregate. Hashcash formalized this in 1997, Bitcoin made the mechanism famous, and anti-bot vendors adopted a browser version: the server hands the client a challenge, the client burns some cycles, and the proof rides along with the next request.

The clearest public example of the pattern is Anubis, an open-source proof-of-work proxy that Xe Iaso released in January 2025 after AI scrapers overwhelmed a self-hosted Git server. Anubis is worth studying precisely because it is open, so you can read exactly what a commercial vendor keeps closed. The server builds a challenge by taking a SHA-256 sum over request metadata: the Accept-Encoding and Accept-Language headers, the client IP, the User-Agent, the current time rounded to the nearest week, and a checksum of the server’s ED25519 private key. The client then hashes that challenge with an incrementing nonce until the resulting digest has a configured number of leading zero bits, with a default difficulty of five. Find the nonce, send it back, get a cookie. The default implementation runs the search in a Web Worker in JavaScript, and the project has discussed a WebAssembly path for the hashing hot loop, since a tight SHA-256 inner loop is exactly the kind of code that runs faster and more predictably as wasm than as JIT-compiled script.

Figure: Browser proof-of-work, in outline. Server: challenge = H(metadata); send challenge + difficulty; verify nonce → issue token. Client (wasm hot loop): for nonce in 0..: if H(challenge, nonce) has N leading zero bits → done. Verification is one hash; finding the nonce is 2^N work on average. The asymmetry is the point.

*The server checks the answer in a single hash; the client averages 2^N hashes to find it at difficulty N. Anubis defaults to N=5. Putting the inner loop in wasm makes the client's cost predictable and a little cheaper, which matters when a real visitor is paying it on every navigation.*
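The search-and-verify loop is short enough to sketch in full. This is a generic hashcash-style illustration of the pattern described above, not Anubis's code: the way the challenge and nonce are concatenated, the metadata separator, and all function names are assumptions made for the example.

```python
from __future__ import annotations

import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count zero bits at the front of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def make_challenge(metadata: list[str]) -> str:
    """Server side: derive a challenge string from request metadata."""
    return hashlib.sha256("|".join(metadata).encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: try nonces until the digest has enough leading zero bits."""
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash checks what took the client many."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty
```

The asymmetry shows up directly in the code: `solve` loops, `verify` hashes once. The body of `solve` is the hot loop that would be a candidate for compiling to wasm.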

Commercial vendors run a more elaborate version of this. Kasada’s client payload, served as a heavily obfuscated script, performs browser fingerprinting and a proof-of-work computation inside a custom interpreter, then emits short-lived signed tokens that ride in request headers under the x-kpsdk-* namespace. The public reporting describes a custom JavaScript virtual machine doing the heavy lifting rather than a standard wasm module, which is a related but distinct technique covered in the Kasada KPSDK writeup. The common thread between a JS VM and a wasm module is the same: force the work to happen inside an opaque execution context that resists being read, copied, or run in isolation, so that a token cannot be minted without paying the cost in something close to a real browser. The proof-of-work renaissance post covers the economics of why this resurfaced when AI crawlers made cheap mass requests a problem again.

Two things make proof-of-work in wasm more than a speed trick. First, the hashing loop can be entangled with the fingerprint, so the nonce you find is only valid alongside the specific environment values the module collected, which means a precomputed answer does not transfer to a different session. Second, the difficulty can scale with suspicion. A session that already looks bot-like gets handed a harder challenge, so the cost rises on exactly the traffic the vendor wants to discourage while a normal visitor never feels it. The wasm part is not the security boundary. The server’s verification is. Wasm just makes the client’s side of the bargain harder to shortcut.
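Both ideas can be sketched in a few lines. The example below assumes the fingerprint is folded into the hashed message and that difficulty is a linear function of a suspicion score between 0 and 1. The policy, the bounds, and the names `difficulty_for`, `bound_digest`, and `solve_bound` are all invented for illustration.

```python
import hashlib


def difficulty_for(suspicion: float, base: int = 5, ceiling: int = 20) -> int:
    """Map a suspicion score in [0, 1] to a bit difficulty (hypothetical policy)."""
    suspicion = min(max(suspicion, 0.0), 1.0)
    return base + round(suspicion * (ceiling - base))


def bound_digest(challenge: str, fingerprint: str, nonce: int) -> bytes:
    """Hash the nonce together with the session's fingerprint digest."""
    return hashlib.sha256(f"{challenge}|{fingerprint}|{nonce}".encode()).digest()


def meets(digest: bytes, difficulty: int) -> bool:
    """True when the top `difficulty` bits of the 256-bit digest are zero."""
    return int.from_bytes(digest, "big") >> (256 - difficulty) == 0


def solve_bound(challenge: str, fingerprint: str, difficulty: int) -> int:
    """Search for a nonce that is valid only alongside this fingerprint."""
    nonce = 0
    while not meets(bound_digest(challenge, fingerprint, nonce), difficulty):
        nonce += 1
    return nonce
```

Because the fingerprint is part of the hashed message, a nonce found for one fingerprint will almost certainly fail `meets` when paired with another, so the work has to be redone per session.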

The timing side channel wasm opened

Wasm did not only give vendors a place to hide logic. It gave them a new signal. Because wasm runs with consistent, near-native timing, you can use the wasm engine itself as a measuring instrument and read the device underneath the browser through it.

The Ben-Gurion work is the cleanest demonstration. Its fingerprint is built from three timing dimensions, all of them measured by running wasm and watching the clock: the latency of calling a JavaScript function from inside wasm, which exposes how a given engine bridges the two worlds; the latency of reading and writing wasm linear memory, which reflects the host’s memory management and cache behavior; and the time to run built-in math like Math.sin and Math.cos called out from wasm, which varies with each browser’s native implementation. None of these reads a stable identifier. They read the performance signature of the engine and the silicon, and that signature is consistent enough across runs to classify the browser family with better than 99 percent accuracy.

This is the same family of idea as JavaScript runtime fingerprinting and the broader hardware concurrency and device memory tells, except wasm makes the measurement sharper. The defensive headache is real. A timing discrepancy that looks suspicious in one context is perfectly legitimate in another, so building a detector that flags fake timing without drowning in false positives is genuinely hard. And the obvious countermeasure of reducing timer resolution only goes so far, because an attacker who wants high-resolution timing can amplify a small signal by repeating the measurement. The signal lives in the architecture, and you cannot make wasm slow and inconsistent without breaking the reason it exists.
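The amplification argument can be shown with a small simulation. This is not a browser timer: `CoarseClock` is a made-up stand-in for a clock whose readings are rounded down to a fixed resolution, and the operation cost is a fixed, hypothetical number of nanoseconds.

```python
class CoarseClock:
    """Simulated clock whose readings are rounded down to `resolution_ns`."""

    def __init__(self, resolution_ns: int) -> None:
        self.resolution_ns = resolution_ns
        self._true_ns = 0

    def advance(self, ns: int) -> None:
        """Let `ns` nanoseconds of real time pass."""
        self._true_ns += ns

    def now(self) -> int:
        """Return the current time, truncated to the clock's resolution."""
        return self._true_ns - self._true_ns % self.resolution_ns


def measure(clock: CoarseClock, op_cost_ns: int, repeats: int) -> float:
    """Estimate per-operation cost by timing `repeats` runs back to back."""
    start = clock.now()
    for _ in range(repeats):
        clock.advance(op_cost_ns)  # stand-in for running the operation once
    return (clock.now() - start) / repeats
```

With a 100 microsecond resolution, a single 730 ns operation measures as zero, while 100,000 repetitions recover the per-operation cost to within `resolution_ns / repeats`. Coarsening the timer raises the number of repetitions needed; it does not remove the signal.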

Reading the binary back: the 2026 toolchain

Now the part a reverse engineer actually cares about. You have the module. How bad is it to understand?

The first step is always disassembly to text. The WebAssembly Binary Toolkit (WABT) is the standard kit here, and wasm2wat turns a .wasm file into the .wat textual format in one command. What you get is faithful and completely unambiguous, and also close to unreadable for anything non-trivial, because it is a literal transcription of stack operations. A few lines of source can expand into well over a hundred lines of .wat. You see every local.get, every i32.add, every br_if, with control flow expressed as block and loop and branch instructions rather than the loops and conditionals a human wrote. It is assembly, and reading it is reading assembly.
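To get a feel for why a literal transcription of stack operations is tiring to read, consider a toy evaluator. The sketch below handles only four real wasm instructions (`local.get`, `i32.const`, `i32.add`, `i32.mul`) and ignores validation, control flow, and memory; the `run` function and the `BODY` listing are invented for this example.

```python
def run(body: list, local_values: list) -> int:
    """Evaluate a straight-line instruction list on an operand stack."""
    stack = []
    for op, *args in body:
        if op == "local.get":
            stack.append(local_values[args[0]])
        elif op == "i32.const":
            stack.append(args[0])
        elif op == "i32.add":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append((lhs + rhs) & 0xFFFFFFFF)  # i32 arithmetic wraps
        elif op == "i32.mul":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append((lhs * rhs) & 0xFFFFFFFF)
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stack.pop()


# What the source expression (a + b) * 2 looks like once names are gone:
# two numbered locals, a constant, and two operators acting on the stack.
BODY = [
    ("local.get", 0),
    ("local.get", 1),
    ("i32.add",),
    ("i32.const", 2),
    ("i32.mul",),
]
```

One short expression becomes five instructions, and nothing in the listing says what local 0 or local 1 meant to the author. Scale that up to a hashing routine with loops and branches and the expansion described above follows.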

The next rung up is a decompiler that tries to recover structure. WABT ships wasm-decompile, which produces a C-like syntax that is easier on the eyes than raw .wat but still translates largely instruction by instruction, so the improvement in readability is modest. The other common path runs wasm2c to emit C, then feeds that C through a real optimizing compiler and a native reverse-engineering tool. Compile the wasm2c output with gcc -O3, load the result in Ghidra or IDA, and the optimizer collapses a lot of the stack-machine bloat into something closer to normal pseudocode. This works, and it is a common workflow, but it launders a wasm problem through a native-binary toolchain that was never built for wasm, and you inherit that toolchain’s own roughness.

Figure: From .wasm to something you can read. Stages: .wasm binary; .wat (wasm2wat, WABT); C (wasm2c) + gcc -O3; Ghidra / IDA pseudocode. What survives the trip: control flow and arithmetic, mostly. Lost: names, strings, types, intent. Recompilability stays high; semantic correctness falls as complexity rises.

*The standard route is faithful but lossy. Disassembly to .wat is exact and unreadable; running the wasm2c output through an optimizing compiler and a native decompiler recovers structure but inherits a native toolchain's rough edges.*

How good is the result, measured rather than asserted? A 2024 study from USC and the University at Buffalo, presented at SecureComm, evaluated C-based decompilation of wasm against native-binary decompilation and found the recurring failure mode: the output recompiles but does not always mean the same thing. A separate Datalog-based decompiler reported that more than 97 percent of its decompiled programs were recompilable, yet only around 70 percent of the lowest-complexity programs were also semantically correct, and that correctness figure fell below 20 percent as complexity rose. That gap between “it compiles” and “it does what the original did” is the trap. A decompiler that produces plausible, recompilable C on a complex obfuscated module can be quietly wrong about the logic, which for someone trying to understand a detection routine is worse than no output at all.
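The gap between recompilable and correct is easy to reproduce in miniature, and differential testing is one hedge against it. In the sketch below, both functions are invented: `original` stands in for a routine with 32-bit wraparound, and `decompiled` stands in for a plausible reconstruction that silently dropped the mask.

```python
import random


def original(x: int) -> int:
    """Hypothetical routine: multiply with 32-bit wraparound, then shift."""
    return ((x * 31) & 0xFFFFFFFF) >> 3


def decompiled(x: int) -> int:
    """Plausible-looking reconstruction that lost the 32-bit wraparound."""
    return (x * 31) >> 3


def first_divergence(f, g, samples: int, seed: int = 0):
    """Return the first input where f and g disagree, or None if none found."""
    rng = random.Random(seed)
    edge_cases = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
    candidates = edge_cases + [rng.getrandbits(32) for _ in range(samples)]
    for x in candidates:
        if f(x) != g(x):
            return x
    return None
```

The two functions agree on every small input, which is exactly what makes the wrong one look right on a quick check. Feeding both the same edge cases and random 32-bit values exposes the difference. Agreement on a sample is evidence, not proof, so a `None` result only means no divergence was found.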

The newest direction is learned decompilation. A 2024 paper, WaDec, fine-tuned a large language model to decompile wasm directly and reported genuinely large gains over the classic tools: a code inflation rate of 3.34 percent against the prior best of around 117 percent, recompilability above 50 percent where Ghidra sat under one percent, and an AST-edit-distance improvement of 185 percent. The output reads far more like source. It also inherits every LLM failure mode, most importantly that a confident, well-structured reconstruction can be subtly invented, which is the same correctness problem the static tools have, wearing nicer clothes. For anti-bot work, where you need to be sure what a branch actually checks, “looks right” is not the bar.

The browser’s own tooling deserves a mention because it cuts the other way. Chrome’s DevTools can debug wasm, set breakpoints in .wat, and step through instructions while inspecting the operand stack and linear memory. With DWARF debug info present you can even see source-level names, though no vendor ships that. Dynamic analysis in the live engine, watching what the module reads and what it returns, is often more productive against an obfuscated anti-bot module than any static decompiler, because you sidestep the question of recovering intent and just observe behavior. That is also why this connects to the deobfuscating anti-bot JavaScript workflow: against a black box, you instrument the boundary rather than read the contents.

How much protection the move actually delivers

Strip away the vendor language and wasm is a speed bump, a real one, but a speed bump. It does not encrypt anything. Every byte of the module ships to the client and can be disassembled to exact, unambiguous text with a single command that has existed for years. What it changes is the slope of the work. A JavaScript obfuscation that a skilled person reads in a browser debugger becomes a binary that pushes them onto a clumsier toolchain, where the decompiled output is harder to trust and the one tool that reads almost like source is a language model that might be confidently making things up. The skill to do it exists. The time it takes goes up. For a vendor, time is the product, because the client code rotates and a study of last month’s module is worth less every week it ages.

The deeper reason wasm holds up is not the format at all. It is that the vendors using it never bet the decision on the client. DataDome’s own published split puts client-side signals at 30 to 40 percent of the inputs, which means even a fully reverse-engineered module hands an attacker a minority stake in a verdict computed somewhere they cannot see. The module’s real job is to collect honestly and prove the collection was genuine, and proof-of-work and timing fingerprints both push that proof into territory where faking it costs about what doing it for real costs. You can read the wasm. Reading it does not tell you the server’s threshold, and it does not let you mint a token without paying.

What is shifting under all of this in 2026 is the analysis side, not the protection side. The static decompilers have been stuck at the same wall for years, faithful but unreadable at one end, readable but unreliable at the other, with semantic correctness falling off a cliff as obfuscation climbs. The thing that moved is learned tooling, and an LLM that turns a wasm module into clean pseudocode in seconds genuinely lowers the cost of a first read. It does not solve the correctness problem, which for a binary whose whole purpose is to resist understanding is the only problem that counts. So the arms race lands where it usually does: the vendors rotate their modules faster, the analysts get faster tools that are right most of the time, and the gap between “I can read this” and “I am sure what it does” is the space both sides are fighting over.

