
The CDP addScriptToEvaluateOnNewDocument trap and how detectors find it

Copyright: MIT

Every stealth stack that drives Chromium has to solve the same bootstrapping problem. The evasion code needs to run before the page’s own JavaScript, so that by the time the site reads navigator.webdriver or counts navigator.plugins, the answer has already been rewritten. There is exactly one well-trodden way to do that over the DevTools Protocol, and it is Page.addScriptToEvaluateOnNewDocument. Puppeteer’s evaluateOnNewDocument, Playwright’s addInitScript, and every “stealth” wrapper built on either of them ultimately funnel down to this one CDP command. It is the hinge the whole approach turns on.

Which makes it a good place for a detector to look. The command is convenient precisely because it gives the injected script an early, privileged seat. That privilege has a shape, and the shape is observable. Where the script runs (the main world or a named isolated world), when it runs relative to the document and the site’s first inline script, and what it has to touch to do its job all leave residue that a few lines of defensive JavaScript can probe. This post is a single-signal deep dive into that residue. It is the companion to the broader CDP detection vector piece; where that one surveys the whole protocol surface, this one stays on one command and follows it down to the V8 binding layer.

The road map: first, what the command actually does and the two worlds it can target. Then the Blink-level reason those worlds are distinct, because the isolation that makes an isolated world safe is the same isolation that makes it detectable in a different way. Then the timing question, which is where a lot of the real signal lives. Then the concrete probes a detector runs from inside the page, and the standardized successor in WebDriver BiDi that inherits the same structural problem. The through-line is that injection is never free; the only question is which tell you pay with.

What the command does, and the worldName fork

Page.addScriptToEvaluateOnNewDocument takes a string of source and arranges for it to run “in every frame upon creation (before loading frame’s scripts),” to quote the protocol reference directly. You call it once, you get back a ScriptIdentifier, and from then on every new document, including every iframe, gets your script poured in at the top before the page’s own code executes. The pairing command is removeScriptToEvaluateOnNewDocument, which takes that identifier and removes the script from the list. There is also an optional runImmediately flag which, when set, also runs the script right away in execution contexts that already exist instead of leaving them untouched until the next navigation.

The parameter that matters most for detection is worldName. Leave it empty and the script runs in the main world, the same JavaScript context the page’s own scripts use, sharing the same window, the same prototypes, the same global scope. Set it to a string and the protocol does something else entirely. To quote the reference: if worldName is specified, the command “creates an isolated world with the given name and evaluates given script in it. This world name will be used as the ExecutionContextDescription::name when the corresponding event is emitted.” The same isolated world can be created directly with Page.createIsolatedWorld, which takes a frameId and an optional name “which is reported in the Execution Context” and hands back an executionContextId.
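To make the fork concrete, here is a minimal sketch of the two command shapes as they would travel over the wire. The parameter names (source, worldName) come from the protocol reference; the helper names, the id counter, and the example world name are hypothetical, and the transport is left out entirely.

```typescript
// Parameter names follow the CDP reference for Page.addScriptToEvaluateOnNewDocument.
interface AddScriptParams {
  source: string;
  worldName?: string; // omitted => main world; set => named isolated world
}

interface CdpCommand {
  id: number;
  method: string;
  params: AddScriptParams;
}

let nextCommandId = 0; // hypothetical message counter

// Build (but do not send) the command. How it reaches the browser is out of scope.
function buildAddScript(source: string, worldName?: string): CdpCommand {
  const params: AddScriptParams = { source };
  if (worldName !== undefined && worldName !== "") {
    params.worldName = worldName;
  }
  nextCommandId += 1;
  return { id: nextCommandId, method: "Page.addScriptToEvaluateOnNewDocument", params };
}

// Which world a given command will land in, per the worldName rule above.
function targetWorld(cmd: CdpCommand): "main" | "isolated" {
  return cmd.params.worldName ? "isolated" : "main";
}

const mainCmd = buildAddScript("/* init script */");
const isolatedCmd = buildAddScript("/* init script */", "example_world"); // made-up name
```

The two messages differ by one optional field, which is why the modes look interchangeable from the automation side.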

So the single command has two modes that look similar from the automation author’s chair and are worlds apart from the detector’s. One mode drops your code into the room with the page’s code. The other builds a separate room that shares the furniture but not the air.

[Figure: one command, two destinations. addScriptTo... {source, worldName?} forks on worldName. worldName empty, main world: shares window with page; shares prototypes; page can read your edits; tell: residue on shared objects. worldName set, isolated world: own context + prototypes; shares C++ DOM only; page can't read your edits; tell: the named context itself.]

*The worldName fork. An empty name puts your script in the page's own context; a name spins up a separate context that shares the DOM but not the JavaScript heap. Each choice trades one class of tell for another.*

Why the two worlds are genuinely different at the binding layer

The main-world versus isolated-world distinction is not a CDP invention. It predates automation by years and exists to make Chrome extension content scripts safe. Reading the Blink binding design clears up why the two modes behave so differently, and why neither is a clean hiding place.

In Blink, the runtime is organized around three things the design doc calls Isolate, Context, and World. An isolate is one V8 instance bound to one thread. A context is roughly one window, the unit that owns a global scope and its prototype chains; the doc puts it as “one window object corresponds to one context.” A world is the sandboxing concept layered on top. There are three kinds: the main world that runs ordinary page JavaScript, the isolated world that runs extension content scripts, and the worker world. The key sentence, from the V8 binding design document, is that “all worlds in one isolate share underlying C++ DOM objects, but each world has its own DOM wrappers.” One context is created for each pair of (frame, world), so a page with N frames and M worlds carries N times M contexts.

That one architectural fact drives everything downstream. The same underlying C++ HTMLDivElement is real and shared, but the main world and an isolated world each get their own JavaScript wrapper object for it, living in their own context with their own HTMLDivElement.prototype. The design doc spells out the consequence: “each world has its own context and thus has its own global variable scope and prototype chains.” The point of all this was a security guarantee, that “Chrome extensions doesn’t share any JavaScript objects while sharing the underlying C++ DOM objects.”
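A toy model can make the wrapper split easier to hold in mind. This is only an illustration under heavy assumptions: real Blink wrappers are V8 objects backed by C++, not TypeScript classes, and every name below (BackingNode, Wrapper, World) is invented for the sketch.

```typescript
// Stands in for the shared C++ DOM object.
class BackingNode {
  readonly tag: string;
  constructor(tag: string) {
    this.tag = tag;
  }
}

// Stands in for a per-world JavaScript wrapper with its own JS-level state.
class Wrapper {
  readonly backing: BackingNode;
  readonly expandos = new Map<string, unknown>();
  constructor(backing: BackingNode) {
    this.backing = backing;
  }
}

class World {
  readonly label: string;
  private readonly wrappers = new WeakMap<BackingNode, Wrapper>();
  constructor(label: string) {
    this.label = label;
  }

  // Each world lazily creates, then reuses, its own wrapper for a backing object.
  wrap(node: BackingNode): Wrapper {
    let wrapper = this.wrappers.get(node);
    if (wrapper === undefined) {
      wrapper = new Wrapper(node);
      this.wrappers.set(node, wrapper);
    }
    return wrapper;
  }
}

// One context per (frame, world) pair.
function contextCount(frames: number, worlds: number): number {
  return frames * worlds;
}

const sharedDiv = new BackingNode("div");
const mainWorld = new World("main");
const isolatedWorld = new World("isolated");

// A JS-level edit made through the isolated world's wrapper...
isolatedWorld.wrap(sharedDiv).expandos.set("patched", true);

// ...is not visible through the main world's wrapper, though both wrap the same node.
const seenFromMain = mainWorld.wrap(sharedDiv).expandos.has("patched");
const sameBacking = mainWorld.wrap(sharedDiv).backing === isolatedWorld.wrap(sharedDiv).backing;
```

In this model seenFromMain is false while sameBacking is true, which is the "shared furniture, separate air" property in miniature.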

Map that back onto the two injection modes and the detection picture falls out cleanly. When the injected init script runs in the main world, anything it does to a shared JavaScript object is visible to the page, because the page reads the very same object. If the script redefines navigator.webdriver with Object.defineProperty, or wraps navigator.permissions.query in a function, or installs a Proxy, that modification sits on an object the site’s own code can inspect. The site does not need CDP access or any special permission. It just reads its own globals and looks for the fingerprints of having been edited.

When the script runs in an isolated world instead, those edits are invisible to the page, because the page’s navigator lives in a different context with a different prototype chain. That sounds like the win, and for hiding property patches it is. The catch is twofold. First, an isolated world cannot reach the page’s JavaScript objects either, so the very evasions that need to touch window-level state in the main world stop working from there. The Rebrowser write-ups make this concrete: code in an isolated context cannot see window.grecaptcha, which is exactly the kind of main-world object a real automation run frequently needs. Second, the isolated world is a new, named context, and creating one is itself an event on the wire and a describable object in the runtime. You have not removed the signal; you have moved it from “edited shared object” to “extra execution context that has a name.”

The timing problem nobody can fully escape

Set the world question aside and there is a second axis the detector cares about: when does the injected script run? The selling point of the command is “before loading frame’s scripts.” That early seat is the whole reason to use it. It is also a tell.

On an ordinary page, no script runs strictly before everything else with no DOM, no other globals, and a guaranteed clean slate. The closest thing is the page’s own first inline script, which the site author controls and can reason about. An automation init script inserts a phantom first actor. Most of the time that actor is invisible because it finishes its work and gets out of the way. But the work it does has to leave the world in a state the page then reads, and a few of those state changes are observable as having happened “impossibly early” or as having happened at all.

The cleanest example is the contradiction class. Patch navigator.webdriver to read false in the main world, and you have put a script-defined value where the platform defines a native accessor, a state an unmodified browser never produces on its own. A defensive script that re-derives the property descriptor, walks the prototype chain looking for an own-property override where there should be an inherited native getter, or simply checks whether the getter’s toString() reads as [native code], can catch the substitution. The init script ran early enough to win the race against the page, but the artifact it left behind outlives the race. Timing got the value in place; it did not make the value look native.
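The prototype-chain walk described above can be sketched in a few lines. The helper names are hypothetical, and because the sketch has to run outside a browser it demonstrates on Map's native size accessor as a stand-in; in a page the same call would be pointed at navigator and "webdriver". It also assumes Function.prototype.toString has not itself been tampered with, and note that bound functions print as native too, so this check alone is not conclusive.

```typescript
interface AccessorFinding {
  depth: number; // 0 = own property, 1 = direct prototype, ...; -1 = not found
  isAccessor: boolean;
  getterLooksNative: boolean;
}

// Captured once, up front. A careful patch may also target this function.
const originalFnToString = Function.prototype.toString;

function looksNative(fn: unknown): boolean {
  if (typeof fn !== "function") return false;
  return /\{\s*\[native code\]\s*\}\s*$/.test(originalFnToString.call(fn));
}

// Walk up the prototype chain and report where the property actually lives.
function locateAccessor(obj: object, prop: string): AccessorFinding {
  let depth = 0;
  for (let cur: object | null = obj; cur !== null; cur = Object.getPrototypeOf(cur)) {
    const desc = Object.getOwnPropertyDescriptor(cur, prop);
    if (desc !== undefined) {
      return {
        depth,
        isAccessor: typeof desc.get === "function",
        getterLooksNative: looksNative(desc.get),
      };
    }
    depth += 1;
  }
  return { depth: -1, isAccessor: false, getterLooksNative: false };
}

// Stand-in demo: an untouched Map versus one with a shadowing own-property getter.
const cleanMap = new Map<string, number>();
const patchedMap = new Map<string, number>();
Object.defineProperty(patchedMap, "size", { get: () => 0, configurable: true });

const cleanFinding = locateAccessor(cleanMap, "size");
const patchedFinding = locateAccessor(patchedMap, "size");
```

The clean object reports an inherited native accessor one level up; the patched one reports an own property at depth 0 whose getter source does not read as native.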

[Figure: document lifetime, left to right. The init script patches navigator; the first inline page script runs; the detector reads descriptors. The residue survives the timing race: winning the race does not make the artifact look native.]

*Running first wins the race against the page's code. It does not erase the descriptor-level evidence that a substitution happened, which the page reads at its leisure long after.*

There is a subtler timing tell when the injection happens through the wrong CDP sequence. Reaching the point where you can call addScriptToEvaluateOnNewDocument in some stacks involves enabling the Runtime domain, and Runtime.enable is its own well-documented leak. Enabling it pushes the browser to emit Runtime.executionContextCreated events for existing contexts and primes the Runtime.consoleAPICalled path, which a page can detect by handing the console a crafted object and watching whether a getter on it fires during serialization. That is the heart of the Runtime.enable detection that the dedicated post on it covers. It matters here because the naive way to inject an init script can drag the Runtime.enable tell along with it, and the modern patched stacks work hard to avoid touching Runtime.enable at all, or to flip it on and immediately off inside a window narrow enough that the page is unlikely to be looking. The injection and the domain-enable leak are separate signals that historically traveled together.
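The console-serialization probe reduces to a bait object and a flag. The sketch below is a hedged illustration: the function name is made up, and the logger is passed in as a parameter so the logic can be exercised without a browser. In a page the argument would be something like (v) => console.debug(v). Whether a given browser build reads the bait depends on its version and on what is attached, and an open DevTools panel can have the same effect, so a hit is suggestive, not conclusive.

```typescript
type LogFn = (value: unknown) => void;

// Returns true if something read the bait property while the object was being logged.
function serializationTouched(log: LogFn): boolean {
  let touched = false;
  const bait = new Error("bait");
  // Shadow `stack` with a getter that records any read.
  Object.defineProperty(bait, "stack", {
    configurable: true,
    enumerable: false,
    get(): string {
      touched = true;
      return "";
    },
  });
  log(bait);
  return touched;
}

// Two stand-in loggers for the two cases the probe distinguishes.
const inertLogger: LogFn = () => {
  // models a console with nothing serializing its arguments
};
const serializingLogger: LogFn = (value) => {
  // models a consumer that inspects the logged object, as a serializer would
  void (value as { stack?: unknown }).stack;
};

const quietResult = serializationTouched(inertLogger);
const noisyResult = serializationTouched(serializingLogger);
```

The design point is that the page never sees the protocol traffic. It only sees that one of its own getters ran when nothing in the page asked for it.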

What the detector actually reads from inside the page

None of the probes a defensive script uses require privileged access. They run as ordinary page JavaScript, and they are looking for the footprints the two injection modes leave. Walk through them by what they target.

The first family targets patched property descriptors on shared objects, which only bites the main-world case. A native accessor like the webdriver getter on Navigator.prototype has a recognizable shape: it is an inherited accessor, its function’s toString() reports [native code], and Object.getOwnPropertyDescriptor on the instance returns nothing because the property lives up the prototype chain. When an init script replaces or shadows it, at least one of those facts changes. The override becomes an own property, or the getter is a JavaScript function whose source does not read as native, or the descriptor’s configurable and enumerable flags differ from the original. A detector that snapshots the descriptor shapes of a handful of sensitive properties and compares them against the known-native template will flag the ones that have been touched, without ever knowing how they were touched.
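A snapshot-and-compare pass along those lines might look like the following. It is a sketch under assumptions: the Shape fields and helper names are invented here, and the "known-native template" is taken from an untouched Map in the same runtime as a stand-in, where a real detector would presumably ship templates gathered from clean browsers.

```typescript
interface Shape {
  location: "own" | "inherited" | "missing";
  kind: "accessor" | "data" | "none";
  configurable: boolean;
  enumerable: boolean;
}

// Record where a property lives and what its descriptor flags are.
function snapshot(obj: object, prop: string): Shape {
  const own = Object.getOwnPropertyDescriptor(obj, prop);
  let desc = own;
  let cur: object | null = Object.getPrototypeOf(obj);
  while (desc === undefined && cur !== null) {
    desc = Object.getOwnPropertyDescriptor(cur, prop);
    cur = Object.getPrototypeOf(cur);
  }
  if (desc === undefined) {
    return { location: "missing", kind: "none", configurable: false, enumerable: false };
  }
  return {
    location: own !== undefined ? "own" : "inherited",
    kind: desc.get !== undefined || desc.set !== undefined ? "accessor" : "data",
    configurable: desc.configurable === true,
    enumerable: desc.enumerable === true,
  };
}

// List the fields on which the observed shape departs from the template.
function diffShapes(expected: Shape, actual: Shape): string[] {
  const keys: (keyof Shape)[] = ["location", "kind", "configurable", "enumerable"];
  return keys.filter((k) => expected[k] !== actual[k]);
}

// Stand-in template: the shape of a native inherited accessor.
const template = snapshot(new Map<string, number>(), "size");

// Stand-in tampering: an own-property getter with different flags.
const tampered = new Map<string, number>();
Object.defineProperty(tampered, "size", { get: () => 0, enumerable: true, configurable: true });

const cleanDiff = diffShapes(template, snapshot(new Map<string, number>(), "size"));
const tamperedDiff = diffShapes(template, snapshot(tampered, "size"));
```

The diff names which facts changed (here the location and the enumerable flag) without needing to know what code made the change.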

The second family targets the Proxy and defineProperty machinery that careful stealth code uses to make patches look native. Wrapping a getter in a Proxy to forward most operations while intercepting the value is more convincing than a flat reassignment, but it is not invisible. A Proxy leaks through edge cases in error stack traces and exception types, where the trap layer shows up in ways a plain object never would. The Castle.io write-up on the evolution of anti-detect frameworks names exactly this: subtle differences in error stack traces or exception types can give away a proxy even though there is no official API to ask “is this a Proxy?” So the detector provokes an error against the suspected object and reads the stack and the exception class for the fingerprint of an interposed trap.
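One narrow instance of the exception-type idea can be sketched as follows; it is an illustration of the class, not a technique taken from the Castle.io write-up. Native accessors check their receiver and throw a TypeError when called on a foreign object, so a replacement or trap that answers on its own behaves differently. The helper names are hypothetical, and Map's size getter stands in for a browser accessor so the sketch runs anywhere.

```typescript
interface ProbeResult {
  threw: boolean;
  errorName: string | null;
  stack: string | null; // kept for inspection; its format is engine-specific
}

// Call a getter with a receiver it should reject, and record what happens.
function callWithForeignReceiver(getter: Function): ProbeResult {
  try {
    Reflect.apply(getter, {}, []);
    return { threw: false, errorName: null, stack: null };
  } catch (err) {
    if (err instanceof Error) {
      return { threw: true, errorName: err.name, stack: err.stack ?? null };
    }
    return { threw: true, errorName: null, stack: null };
  }
}

function nativeSizeGetter(): Function {
  const desc = Object.getOwnPropertyDescriptor(Map.prototype, "size");
  if (desc === undefined || typeof desc.get !== "function") {
    throw new Error("expected a native size accessor");
  }
  return desc.get;
}

const untouchedGetter = nativeSizeGetter();

// A trap that answers on its own never reaches the native receiver check.
const interceptingProxy = new Proxy(untouchedGetter, { apply: () => 0 });

// A faithful forwarder still throws, but the trap now sits on the call path.
const forwardingProxy = new Proxy(untouchedGetter, {
  apply: (target, thisArg, args) => Reflect.apply(target, thisArg, args),
});

const untouchedResult = callWithForeignReceiver(untouchedGetter);
const interceptedResult = callWithForeignReceiver(interceptingProxy);
const forwardedResult = callWithForeignReceiver(forwardingProxy);
```

The intercepting proxy gives itself away by not throwing at all. The forwarding proxy throws the right exception class, which leaves the stack text as the place to look for an extra frame, and that part is engine-specific enough that the sketch only captures it without asserting on it.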

The third family is the one that bites the isolated-world case, and it is about the named context itself. When worldName creates an isolated world, that world has a name, and the name surfaces in the runtime’s execution-context descriptors. A stack that picks an obvious or constant world name leaves a constant string for anyone able to enumerate contexts to match against. Even where the page cannot directly read the descriptor, the existence of a second execution context per frame is a structural fact: extra contexts get created on a schedule that ordinary page activity does not produce, and behavior that bridges from an isolated world back into the main world (to reach something like window.grecaptcha) tends to do it through window.postMessage, which the page can listen for. The Rebrowser bridge approach acknowledges this directly: a page can watch window.addEventListener('message', ...) for the message shapes the bridge uses. Message-passing is common enough that it is not damning on its own, but a specific payload pattern carrying an id and an eval request is a narrower target.
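The bridge-traffic check comes down to classifying message payloads by shape. In the sketch below the field names (id, eval, code, expression) are hypothetical placeholders, since the real payload format of any given bridge is not something this post documents. The classifier is a pure function so it can be tested on plain objects; in a page it would be called from a window.addEventListener('message', ...) handler with event.data.

```typescript
// Candidate keys that might carry source text to evaluate. All hypothetical.
const CODE_KEYS = ["eval", "code", "expression"];

// True when a payload carries both a correlation id and a string of code.
function looksLikeBridgeRequest(data: unknown): boolean {
  if (typeof data !== "object" || data === null) return false;
  const record = data as Record<string, unknown>;
  const idValue = record["id"];
  const hasId = typeof idValue === "string" || typeof idValue === "number";
  const hasCode = CODE_KEYS.some((key) => typeof record[key] === "string");
  return hasId && hasCode;
}

// Count suspicious payloads in a batch of observed messages.
function countBridgeRequests(payloads: unknown[]): number {
  return payloads.filter(looksLikeBridgeRequest).length;
}

const observedPayloads: unknown[] = [
  { type: "resize", height: 200 },    // ordinary widget chatter
  "ping",                             // plain string message
  { id: 7, eval: "document.title" },  // id plus code: the narrow target
  { id: "a1" },                       // id alone is not enough
];

const suspiciousCount = countBridgeRequests(observedPayloads);
```

Requiring both fields keeps ordinary message-passing out of the count, which matches the point that postMessage use alone is not damning.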

[Figure: probes run as ordinary page JavaScript. Descriptor shape (getOwnPropertyDescriptor, fn.toString(); native getter vs own-property override) catches: main world. Proxy / trap leak (error.stack, exception type; interposed trap shows in trace) catches: main world. Named context (world name string, postMessage shape; extra context per frame, bridge traffic) catches: isolated world. Moving from main to isolated swaps the left two probes for the right one; it does not zero the signal.]

*Three probe families and the injection mode each one catches. Choosing an isolated world to dodge the descriptor and proxy probes hands the detector the named-context and bridge-traffic probes instead.*

A fourth detail ties the families together. The init script’s job, in the main-world case, is usually to hide that the runtime has been patched at all. But hiding requires touching, and touching is the thing being probed. The Castle.io and Scrapfly write-ups both land on the same point from different directions: the most reliable anti-stealth probes do not ask “are you a bot” directly, they ask “has this object been edited,” because an edited object is the unavoidable byproduct of running evasion code in the page’s own world. The only way to fully sidestep that is to not edit shared objects from the page’s world, which pushes you toward the isolated world and its different set of tells, or toward patching the browser at the source so there is nothing to edit at runtime. That second path is the subject of the source-patch versus runtime-injection piece, and it exists precisely because runtime injection cannot win the descriptor argument.

How the stealth side responds, and why it is a treadmill

Knowing the probes, the patched-stack response is predictable in outline even if the details churn. Run the injection in an isolated world so the descriptor and proxy probes have nothing in the page’s world to find. Avoid Runtime.enable so the console-serialization tell never arms, or toggle it within a window too narrow to catch. Choose world names that are not constant strings. Bridge back to the main world only when an evasion genuinely needs a main-world object, and make the bridge traffic look like ordinary message-passing. The Patchright project goes further and sidesteps the CDP command altogether for some injections, intercepting the HTML response at the network layer and modifying the Content-Security-Policy header to slip the script in as page markup that deletes itself after running, so there is no addScriptToEvaluateOnNewDocument call on the wire to detect in the first place. Each of these is a real improvement against a specific probe.

None of them ends the game, because the probes are not bugs to be fixed. They are consequences of the architecture. The descriptor tell exists because main-world editing is visible by design, the same design that lets content scripts and page scripts share a DOM. The named-context tell exists because isolated worlds are real, named contexts by design, the same design that keeps extension code from leaking into the page. A defender who adds one more probe is not exploiting a defect; they are reading a structural fact the platform publishes. That is why this surface does not close the way a CVE closes. The same dynamic shows up in why stealth plugins lose and in the timing-based detection work: the detector gets to pick the question, and the automation side has to have an answer for every question at once, while the detector only needs one question without a clean answer.

The honest caveat: the exact probe set any given commercial anti-bot vendor ships is not public, and what runs in a live challenge is obfuscated and rotated. What is documented is the mechanism, the CDP command and its parameters, the Blink world model, the Runtime.enable side effects, and the published behavior of the open-source stealth and detection projects. The specific descriptor templates a vendor compares against, the precise list of properties it snapshots, and the thresholds it scores on are inferred from observed behavior and the open-source tooling, not read out of a vendor’s source. Where this post describes a probe, it describes a class of probe that the architecture makes possible and that open tooling demonstrates, not a confirmed line in any one vendor’s bundle.

The standardized successor inherits the same trap

The CDP command is Chromium-specific and explicitly unstable at tip-of-tree. The cross-browser successor is WebDriver BiDi’s script.addPreloadScript, which adds a script that runs before any other script when a new context loads, with an optional sandbox parameter that is the BiDi name for the same isolated-world idea. The proposal for sandboxed execution, opened on the BiDi tracker in November 2021, says outright that the motivation is to match what existing frameworks already do: Puppeteer creates a named sandbox whenever a frame is created, Playwright uses utility contexts, and both lean on the CDP createIsolatedWorld and addScriptToEvaluateOnNewDocument pair to get there. BiDi standardizes that pattern into a sandbox map keyed by Window objects, with a SandboxGlobalScope exposing self, window, and postMessage.
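For comparison with the CDP shape, here is a hedged sketch of the BiDi command and of how the sandbox choice maps onto the tells discussed in this post. The method name and the functionDeclaration and sandbox parameters follow the WebDriver BiDi specification; the helper names, the Tell labels, and the example sandbox name are made up for illustration.

```typescript
// Subset of the BiDi script.addPreloadScript parameters used in this sketch.
interface AddPreloadScriptParams {
  functionDeclaration: string;
  sandbox?: string; // omitted => the page's own realm
}

interface BidiCommand {
  id: number;
  method: "script.addPreloadScript";
  params: AddPreloadScriptParams;
}

type Tell = "timing" | "descriptor" | "proxy" | "named-context" | "bridge";

// Build (but do not send) the command.
function buildAddPreloadScript(id: number, functionDeclaration: string, sandbox?: string): BidiCommand {
  const params: AddPreloadScriptParams = { functionDeclaration };
  if (sandbox !== undefined) {
    params.sandbox = sandbox;
  }
  return { id, method: "script.addPreloadScript", params };
}

// Which classes of tell a given configuration is exposed to, per this post's argument.
function tellsFor(params: AddPreloadScriptParams): Tell[] {
  const tells: Tell[] = ["timing"]; // a preload script always runs early
  if (params.sandbox !== undefined) {
    tells.push("named-context", "bridge");
  } else {
    tells.push("descriptor", "proxy");
  }
  return tells;
}

const plainPreload = buildAddPreloadScript(1, "() => { /* init */ }");
const sandboxedPreload = buildAddPreloadScript(2, "() => { /* init */ }", "example_sandbox"); // made-up name

const plainTells = tellsFor(plainPreload.params);
const sandboxedTells = tellsFor(sandboxedPreload.params);
```

Neither configuration returns an empty list, which is the point: the standard changes the spelling of the command, not the set of trade-offs.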

Standardizing the mechanism does not standardize away its tells. A preload script still runs early, so the timing tell stands. A sandboxed preload script still runs in a separate world that shares the DOM but not the JavaScript heap, so the named-context and bridge tells stand. A non-sandboxed preload script still edits shared objects in the page’s world, so the descriptor and proxy tells stand. The BiDi authors even flagged in the proposal that a typo in a sandbox name could silently spin up a fresh sandbox, which is the same “extra named context appeared” residue wearing standards-track clothes. Selenium and WebdriverIO already expose addPreloadScript, so the surface is moving from a Chromium-only protocol into a W3C one that Firefox implements too. The trap is portable now.

Closing read

The reason addScriptToEvaluateOnNewDocument keeps showing up in detection write-ups is not that it is poorly designed. It does exactly what it promises, early and reliably, in whichever world you ask for. The problem is that “early and reliably, in a chosen world” is a description with no neutral option. Run in the main world and your edits are legible to the page because that is what sharing a world means. Run in an isolated world and you have created a named, separate context because that is what isolation means. There is no third world that is both invisible to the page and free of structural residue, because the world model was built to enforce exactly the visibility properties the detector reads.

That is the shape worth holding onto. The injected init script is the one component every Chromium stealth stack cannot do without, and it is the component whose every configuration leaves a different, specific tell. The arms race on this surface is not a contest of cleverness that one side eventually wins. It is a fixed menu of trade-offs where the automation side keeps reordering which tell it is willing to pay with, and the detector keeps adding the probe for whichever one is currently cheapest. The menu has not gotten shorter in four years of patches, and standardization just printed it in a second language.

