The history of Puppeteer and the headless-Chrome era it launched
For most of a decade, if you wanted to drive a browser from a script without a screen attached, you reached for PhantomJS. It was a headless WebKit build with its own quirks, its own stale rendering engine, and a single overworked maintainer. Then in April 2017 Google announced headless mode inside Chrome itself, the PhantomJS maintainer said that same month that he was stepping down, and a few months later a Node library called Puppeteer arrived to drive the new headless Chrome over a protocol that had been sitting inside DevTools the whole time. The question worth asking is not why Puppeteer succeeded. The question is why a wire protocol designed to power a debugging panel ended up becoming the default way a generation of engineers automated, tested, and scraped the web, and why that same protocol became the single loudest tell that gives those scripts away.
This is a history of one library and the foundation it stood on. It runs from the headless announcement and the death of PhantomJS, through the Chrome DevTools Protocol that Puppeteer wrapped, the puppeteer-extra and stealth plugins that grew up around it, the departure of the original team to build Playwright at Microsoft, and the shift to a new headless implementation. It ends where the story is still live: CDP as a detection vector, and what that means for anyone running headless Chrome at scale in 2026.
2017: headless ships, and PhantomJS dies the same month
On 27 April 2017, Eric Bidelman published a walkthrough on the Chrome developer blog titled “Getting Started with Headless Chrome.” Headless mode was shipping in Chrome 59 on Mac and Linux, with Windows following in Chrome 60. The pitch was simple. You could now run the full Blink rendering engine, with every modern web-platform feature Chrome supported, from the command line, with no visible UI shell and no X server required.
The first examples were command-line flags, not a library. Launch Chrome with --headless --disable-gpu and you could add --dump-dom to print the rendered DOM, --screenshot to capture a PNG, --print-to-pdf to render a PDF, or --remote-debugging-port=9222 to open a debugging endpoint you could talk to over a socket. That last flag is the one that mattered. The screenshot and PDF flags were conveniences. The remote debugging port exposed the Chrome DevTools Protocol, and that is what every serious automation tool would attach to.
The timing for PhantomJS was brutal. PhantomJS was a scriptable headless WebKit browser that had carried headless testing and scraping since 2011. It rendered with an aging fork of WebKit, it lagged behind real browsers on web-platform features, and by 2017 it was effectively one maintainer’s burden. Days before the headless-Chrome posts went out, that maintainer announced he was stepping down. His reasoning was blunt: Chrome headless was faster and more stable, it did not leak memory the way PhantomJS did, and as a solo developer he could not keep supporting all three platforms against a browser engine that real teams were paid to maintain. The original PhantomJS author archived the project in early 2018. A tool that had defined an entire category was obsolete the moment the category’s biggest player decided to compete.
*The headless announcement and the PhantomJS maintainer's resignation landed in the same month. Puppeteer's public release followed that summer; v1.0 shipped the next January.*
2017: Puppeteer arrives as the official wrapper
Headless Chrome gave you a debugging port. It did not give you an ergonomic way to use it. Talking to the DevTools Protocol directly means opening a WebSocket, sending JSON commands keyed by domain and method, correlating responses by id, and tracking events that fire asynchronously. Workable, but nobody wanted to write that by hand for every screenshot.
Several third-party wrappers appeared almost immediately. Chromeless, Chrominator, and Chromy all raced to be the PhantomJS replacement. Then in August 2017 the Chrome DevTools team published Puppeteer, a Node library giving a high-level API over headless (or full) Chrome via the protocol. The motivation, as the team put it, was that dealing with the raw DevTools Protocol is not ideal for a developer writing an automation script. Puppeteer hid the WebSocket and the JSON and handed back a promise-based API where you navigated, clicked, typed, waited for selectors, and grabbed screenshots in a few lines.
Being the official wrapper mattered. Puppeteer was maintained by the same team that owned the protocol it spoke, which meant it stayed in lockstep with Chrome. It downloaded a known-good Chromium build on install, so the library version and the browser version were pinned together and tested together. When the protocol gained a capability, Puppeteer could expose it the same release. That tight coupling is exactly what PhantomJS, riding a frozen WebKit fork, could never offer.
Version 1.0.0 landed on 12 January 2018, bundling Chromium 65. It added code-coverage collection for JavaScript and CSS, PDF header and footer templates, XPath selectors, and target.createCDPSession(), an escape hatch giving direct access to the raw protocol connection for anything the high-level API did not yet cover. That escape hatch is a small detail with a large meaning: even the official wrapper acknowledged that the protocol underneath was the real surface, and gave you a door straight to it.
How the Chrome DevTools Protocol actually works
To understand both Puppeteer’s reach and its weakness, you have to look at the layer it sits on. The Chrome DevTools Protocol, universally abbreviated CDP, is the same interface the Chrome DevTools front-end uses to instrument, inspect, and control the browser. Its roots go back to the WebKit remote debugging protocol around 2012, with its domains-and-events shape. When Blink forked from WebKit in 2013, the protocol consolidated on the Chromium side and became CDP. The 1.3 protocol was tagged stable around Chrome 64.
The model is a set of domains, each grouping related commands and events. Page controls navigation and lifecycle. Runtime evaluates JavaScript and reports console activity. Network exposes requests and responses and lets you intercept them. DOM, Input, Emulation, Target, and a few dozen others cover the rest. A client connects over a WebSocket to the remote debugging endpoint, sends a command as a JSON object with an id, a method like Page.navigate, and a params object, and receives a matching response. Events for an enabled domain arrive as unsolicited messages. Puppeteer is, underneath all its ergonomics, a state machine that issues these commands and reacts to these events.
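The command, response, and event flow described above can be sketched without a live browser. This is a minimal illustration, not Puppeteer's implementation: `CdpCorrelator` and its methods are hypothetical names, and the `send` callback is assumed to be whatever writes a string to the debugging WebSocket.

```typescript
type CdpParams = Record<string, unknown>;
type CdpError = { code: number; message: string };
// An incoming message is either a response (carries an id) or an event (carries a method, no id).
type CdpIncoming = { id?: number; result?: unknown; error?: CdpError; method?: string; params?: CdpParams };
type Waiter = { resolve: (value: unknown) => void; reject: (reason: Error) => void };

// Hypothetical helper: correlates responses to commands by id and fans out events.
class CdpCorrelator {
  private nextId = 1;
  private pending = new Map<number, Waiter>();
  private listeners = new Map<string, Array<(params: CdpParams) => void>>();
  private send: (raw: string) => void;

  constructor(send: (raw: string) => void) {
    this.send = send;
  }

  // Send a command such as Page.navigate; the promise settles when a response with the same id arrives.
  call(method: string, params: CdpParams = {}): Promise<unknown> {
    const id = this.nextId++;
    return new Promise<unknown>((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.send(JSON.stringify({ id, method, params }));
    });
  }

  // Register a handler for an event such as Page.loadEventFired.
  on(method: string, handler: (params: CdpParams) => void): void {
    const existing = this.listeners.get(method) ?? [];
    existing.push(handler);
    this.listeners.set(method, existing);
  }

  // Feed every raw message read from the socket into this method.
  receive(raw: string): void {
    const msg = JSON.parse(raw) as CdpIncoming;
    if (typeof msg.id === "number") {
      const waiter = this.pending.get(msg.id);
      if (waiter === undefined) return; // response to a command we never sent
      this.pending.delete(msg.id);
      if (msg.error) waiter.reject(new Error(msg.error.message));
      else waiter.resolve(msg.result);
    } else if (typeof msg.method === "string") {
      for (const handler of this.listeners.get(msg.method) ?? []) handler(msg.params ?? {});
    }
  }

  // Number of commands still waiting for a response.
  pendingCount(): number {
    return this.pending.size;
  }
}
```

Wired to a real socket, the constructor callback would forward to the socket's send method and the socket's message handler would call `receive`. Everything a high-level library adds sits on top of a loop of roughly this shape.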
This design is why CDP could power so much beyond DevTools. It is also why it leaks. The protocol was built to drive a debugger that the user explicitly opens, so it was never designed to be invisible. Enabling a domain changes how the browser behaves in small, observable ways. Hold that thought, because it is the thread that runs through the rest of this story. For the protocol’s full reach, our piece on the Chrome DevTools Protocol as a detection vector goes deeper on the surface it exposes.
2018 onward: the detection problem nobody planned for
Puppeteer made headless Chrome trivial to drive. That convenience cut both ways. The same year it matured, sites that did not want automated traffic started looking for the seams, and headless Chrome had plenty.
The most famous tell was navigator.webdriver, a boolean the W3C WebDriver spec defines to advertise that a browser is under automation control. Headless Chrome under Puppeteer set it. Reading one property told a site that a script was driving the page. Beyond that flag, early headless Chrome differed from a real browser in a long list of measurable ways. The user-agent string contained the token HeadlessChrome. The navigator.plugins array was empty where a real desktop Chrome reported entries. navigator.languages could be missing or malformed. The window.chrome object that a normal Chrome populates was absent. WebGL reported a software renderer or a vendor string that did not match real hardware. Permissions queries returned inconsistent states. None of these were bugs exactly. They were honest reflections of a browser running without a display, without a user profile, and without the plugins and hardware a person’s machine carries. Our catalog of headless Chrome detection tells walks the full list.
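A detector working from this list needs very little code. The sketch below is an assumption-laden illustration rather than any vendor's logic: `BrowserSnapshot` and `headlessTells` are hypothetical names, the snapshot is assumed to have been collected in the page from the properties named above, and the software-renderer pattern (SwiftShader, llvmpipe) is only an example.

```typescript
// Hypothetical snapshot of the properties listed in the paragraph above.
interface BrowserSnapshot {
  webdriver: boolean | undefined;          // navigator.webdriver
  userAgent: string;                       // navigator.userAgent
  pluginCount: number;                     // navigator.plugins.length
  languages: readonly string[] | undefined; // navigator.languages
  hasWindowChrome: boolean;                // typeof window.chrome !== "undefined"
  webglRenderer: string;                   // unmasked WebGL renderer string
}

// Return a human-readable list of the early headless tells found in the snapshot.
function headlessTells(s: BrowserSnapshot): string[] {
  const tells: string[] = [];
  if (s.webdriver === true) tells.push("navigator.webdriver is true");
  if (s.userAgent.indexOf("HeadlessChrome") !== -1) tells.push("user-agent contains HeadlessChrome");
  if (s.pluginCount === 0) tells.push("navigator.plugins is empty");
  if (s.languages === undefined || s.languages.length === 0) tells.push("navigator.languages is missing or empty");
  if (!s.hasWindowChrome) tells.push("window.chrome is absent");
  // Assumed example pattern for software rendering; real detectors use their own lists.
  if (/swiftshader|llvmpipe/i.test(s.webglRenderer)) tells.push("WebGL reports a software renderer");
  return tells;
}
```

No single check is decisive. What made early headless Chrome easy to catch was that a default launch failed most of them at once.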
For testing, none of this mattered. You were driving your own site on your own CI. For scraping a site that fought back, every one of these became a reason to get blocked. So a cottage industry grew to paper over the tells.
2018: puppeteer-extra and the stealth plugin
The answer the community converged on was a plugin framework. puppeteer-extra, by the developer who goes by berstend, wrapped Puppeteer with a plugin system, and its best-known plugin was puppeteer-extra-plugin-stealth. The stealth plugin bundled a set of independent evasion modules, each patching one specific tell before site scripts could run.
The module list reads like a field guide to headless detection. One evasion redefines navigator.webdriver so it returns undefined the way a normal browser does. Another mocks the chrome.runtime object and the broader window.chrome surface. Another populates navigator.plugins and the matching mimetypes so the array is not empty. There are evasions for the WebGL vendor and renderer strings, for navigator.vendor, for navigator.languages and the Accept-Language header, for window.outerWidth and window.outerHeight, for media-codec support that headless Chromium does not advertise, and for proxying iframe content windows so they do not give the game away. Each one targets a single difference between headless and real.
The plugin worked well enough to pass many public detection tests where vanilla Puppeteer failed outright. It also seeded a structural problem that defined the next several years. Patching a property in JavaScript is itself observable. The classic example: in the ordinary desktop Chrome of that era, navigator.webdriver was simply not defined, so no property descriptor for it existed on the prototype at all. A naive patch that sets the getter to return undefined leaves the descriptor present even though the value is undefined, and the presence of that descriptor is a tell that the page has been patched. The fix for one signal becomes a new signal one layer down. We pulled apart the patch-by-patch mechanics in how the stealth plugin works, and the structural reason it loses ground in why stealth plugins lose.
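The descriptor residue is easy to reproduce outside a browser. A minimal sketch, using plain objects as stand-ins for the navigator object; `hasUndefinedResidue` is a hypothetical helper name, not part of the stealth plugin or of any detector.

```typescript
// True when a property reads as undefined yet is still present on the object,
// which is what a naive "return undefined" getter patch leaves behind.
function hasUndefinedResidue(target: object, prop: string): boolean {
  const value = (target as Record<string, unknown>)[prop];
  return value === undefined && prop in target;
}

// Stand-in for a browser where the property was never defined.
const untouchedNavigator: object = {};

// Stand-in for a browser where a naive patch redefined the property.
const patchedNavigator: object = {};
Object.defineProperty(patchedNavigator, "webdriver", {
  configurable: true,
  get: () => undefined,
});

const untouchedResidue = hasUndefinedResidue(untouchedNavigator, "webdriver");
const patchedResidue = hasUndefinedResidue(patchedNavigator, "webdriver");
```

Both objects report undefined when the property is read, so a check on the value alone cannot tell them apart. Only the second still carries a descriptor, and that is the difference a detector keys on.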
This is the shape of the whole arms race. A detector reads a property. A stealth patch overrides it. The detector checks whether the property looks patched. A deeper patch hides the patch. Every round adds a layer, and the defender only has to find one layer the attacker missed.
2020: the team leaves, and Playwright forks the idea
By 2020 the people who had built Puppeteer inside Google had moved to Microsoft, and in January they shipped Playwright. The announcement was unusually candid. The Playwright team said plainly that they were the same team that originally built Puppeteer at Google but had since moved on, and that Puppeteer had proven the appetite for a new generation of capable, reliable automation drivers.
Playwright was not a literal source fork of the Puppeteer repository so much as a fork of the design, rebuilt to fix what Puppeteer structurally could not. The headline fix was cross-browser support. Puppeteer drove Chrome and Chromium, full stop, because it was a thin layer over CDP and only Chrome spoke CDP. Playwright shipped with Chromium, Firefox, and WebKit, by patching those engines to expose a CDP-like protocol Playwright could drive. It also reworked the auto-waiting and async model that made Puppeteer scripts flaky under race conditions, and added browser contexts as cheap isolated sessions, which matters enormously when you are running many parallel jobs.
The split left two libraries with near-identical core concepts and a deliberately easy migration path between them. For testing teams, Playwright steadily pulled ahead on features and ergonomics. Puppeteer stayed the Chrome-native option, still maintained by the Chrome team, still pinned to Chrome releases. For anyone trying to look human, the choice mattered less than it seemed, because both sat on a CDP foundation and both inherited the detection surface that comes with it. The detection differences between the two, and Selenium, are spelled out in Playwright vs Puppeteer vs Selenium. The older WebDriver lineage that Selenium carries is its own story, told in the history of Selenium and WebDriver.
2023: new headless, and the old one becomes a separate binary
For its first six years, headless Chrome was not actually the same browser as the Chrome you ran with a window. The original headless was a lightweight wrapper around Chromium’s //content module, with far fewer dependencies, no X11 or Wayland, no D-Bus. That made it fast and easy to deploy on a bare server. It also made it subtly different from real Chrome in ways detectors could fingerprint, which is part of why headless was so detectable: although it shipped inside the Chrome binary, it really was a different browser implementation with different behavior.
In early 2023 Chrome 112 introduced a new headless mode behind --headless=new. This new mode is the real Chrome browser running without a visible window, not a separate lightweight build. It closed a swath of behavioral gaps between headless and headed Chrome by the simple expedient of being the same code. The cost is weight: new headless pulls in the full browser, so it needs more of the system around it.
The old implementation did not vanish immediately. It was carved out into a standalone binary called chrome-headless-shell, generated for every user-facing Chrome release and distributed through the Chrome for Testing infrastructure starting with Chrome 120, for workloads that wanted the old speed and did not need full-browser fidelity. Then Chrome 132 finished the transition: it dropped the old mode from the main Chrome binary entirely, so --headless and --headless=new both mean the new mode, and --headless=old prints an error. Puppeteer tracked the same path, defaulting to new headless from v22, with headless: 'shell' selecting the lightweight binary.
New headless mattered for detection precisely because it removed an entire class of differences. A detector that had been keying off old-headless quirks lost those signals when headless became real Chrome. That did not end the game. It moved it to the layer underneath, the one Puppeteer was built on.
CDP itself becomes the signal
The deepest tell is not a missing plugin or a stale user-agent. It is the act of automation itself, visible through the protocol.
The clearest version of this was the Runtime.enable signal. CDP’s Runtime domain has to be enabled before a client receives execution-context and console events, and Puppeteer and Playwright both enable it as a matter of course. Once enabled, Chrome serializes objects passed to console methods so it can ship them across the WebSocket to the client. That serialization had a side effect. Building a preview of an error object would touch the object’s properties, including custom getters on things like .stack. A page could define an error with a booby-trapped getter, log it, and learn from whether the getter fired that the Runtime domain was live, meaning a CDP client was attached. The check was synchronous, needed no special permissions, and fired even when navigator.webdriver had been scrubbed and the user-agent spoofed. It did not care what the page looked like. It detected the controller.
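The shape of that probe fits in a few lines. What follows is a sketch of the historical page-side check under stated assumptions, not code from any vendor: `runtimeProbe` is a hypothetical name, and the logging call is injected as `sink` so the logic can be exercised outside a browser. In a page the sink would have been a console method such as console.debug. Loggers outside the browser may read .stack themselves, so the result only means something inside a page.

```typescript
// Log a booby-trapped Error through `sink` and report whether anything read its stack.
// `sink` is an assumption standing in for a console method in the page.
function runtimeProbe(sink: (value: unknown) => void): boolean {
  let touched = false;
  const bait = new Error("probe");
  Object.defineProperty(bait, "stack", {
    configurable: true,
    get(): string {
      touched = true; // fires only if something serializes the error
      return "";
    },
  });
  sink(bait);
  return touched;
}

// Simulated sinks for illustration only:
// one behaves like a serializer that previews the error, one ignores its argument.
const previewingSink = (value: unknown): void => {
  void (value as { stack?: unknown }).stack;
};
const inertSink = (_value: unknown): void => {};

const detectedWithPreview = runtimeProbe(previewingSink);
const detectedWithoutPreview = runtimeProbe(inertSink);
```

The probe never inspects the browser's properties. It only observes whether logging had a side effect, which is why property-level stealth patches did not touch it.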
This signal worked for years and was widely deployed by anti-bot vendors. Two V8 commits in May 2025 quietly ended it. One was titled “Avoid error side effects in DevTools” on 7 May, the other “Apply getter guard throughout error preview” on 9 May. Together they added a guard so DevTools previews no longer execute user-defined getters during error serialization. The getter never fires, so the detection flag never flips. A reliable signal that had survived every stealth patch was retired by a change inside the JavaScript engine, not by anything the automation tools did. The full mechanism, and why it stopped working, is in detecting CDP via the Runtime.enable leak.
That fix is not the end of CDP detection. It is one example of the pattern. The protocol was designed to drive a debugger that a developer deliberately opens, not to hide. Enabling a domain, attaching a session, intercepting a request, each can change behavior in ways a determined detector can observe. New headless made the browser binary indistinguishable from real Chrome, which pushed detectors toward exactly this layer: not what the browser is, but whether something is driving it through the debugging interface. Anti-detect tooling responded by moving off the high-level libraries entirely and driving Chrome through narrower CDP calls, or by patching Chromium from source so the protocol’s fingerprints never appear, an approach we compare in patching Chromium from source vs runtime injection.
What Puppeteer did to scraping
Step back from the cat and mouse and the broader effect is plain. Before headless Chrome, scraping a JavaScript-heavy site meant either reverse-engineering its XHR calls by hand or wrestling PhantomJS and its rendering gaps. Puppeteer made full-fidelity rendering a default. If a person could see it in Chrome, your script could render the same DOM, because your script was Chrome. That lowered the barrier to scraping single-page apps so far that “just run a headless browser” became the reflexive answer to any hard scraping problem.
It also set a cost that has only grown. A headless Chrome instance is heavy. It carries a real browser’s memory and CPU footprint per page, which is fine for a test suite and expensive for a crawler hitting millions of URLs. The economics of paying that browser tax, against the alternative of replaying the underlying API, are their own subject, covered in the headless-browser tax and parsing at scale: browser vs HTTP client. The short version: a browser is the most convenient tool and rarely the cheapest, and at scale the difference compounds.
And it created the detection surface that the entire anti-bot industry now mines. The same protocol that made automation easy made automation legible. Vendors learned to read the headless tells, then the stealth patches, then the protocol itself. The result is a market where running headless Chrome against a defended target is less a coding task than a continuous maintenance burden, because the signals shift on a cycle measured in browser releases. This sits inside the longer arc told in the history of web scraping and the history of the bot-mitigation industry.
Where it stands in 2026
Puppeteer is still maintained, still Chrome-native, still pinned to the browser it drives. Playwright won most of the testing market on cross-browser support and ergonomics, while both libraries share the same CDP lineage and the same fundamental tell: they drive Chrome through an interface that was never meant to be invisible. New headless erased the easy fingerprints that came from headless being a different implementation, which was a real improvement for anyone running automation honestly and a real problem for detectors who had relied on them. The detection conversation moved down a layer to the protocol, where the Runtime.enable signal lived and died on a V8 commit nobody outside the niche noticed.
The detail worth holding onto is the one from the beginning. A debugging protocol built to power a panel that a developer opens by choice became the substrate for a decade of automation. Every property of that origin still shapes the field. The convenience that made Puppeteer the default came from the protocol’s reach. The detectability that makes headless Chrome a maintenance burden comes from the same place: an interface designed to instrument a browser is, by construction, an interface that announces a browser is being instrumented. You cannot have the first without the second, and eight years of stealth patches have not changed that arithmetic.
Sources & further reading
- Bidelman, E. (2017), Getting Started with Headless Chrome — the original Chrome-team walkthrough announcing headless in Chrome 59, with the command-line flags and the DevTools Protocol.
- Chromium team (2017), Chrome 59 Beta: Headless Chromium — the Chromium blog confirming headless shipping in the Chrome 59 beta.
- InfoQ (2017), Google’s Puppeteer Joins Crowd of Headless Chrome Tools — Puppeteer’s August 2017 public arrival and its position against PhantomJS replacements.
- InfoQ (2017), Phantom.js Maintainer Steps Down, Leaving Project’s Future in Doubt — the maintainer’s resignation, timed to headless Chrome.
- puppeteer (2018), Release v1.0.0 — the 12 January 2018 release notes, Chromium 65, and target.createCDPSession.
- Chrome DevTools (n.d.), Chrome DevTools Protocol — the canonical CDP reference, with the domains-and-events model and version history.
- InfoQ (2020), Microsoft Announces Playwright Alternative to Puppeteer — the January 2020 Playwright launch and the ex-Puppeteer team’s own account.
- Chrome for Developers (2024), Download old Headless Chrome as chrome-headless-shell — the split of the old headless implementation into a standalone binary.
- Chrome for Developers (2025), Removing --headless=old from Chrome — the Chrome 132 removal of the old headless mode and the migration guidance.
- Puppeteer (n.d.), Headless modes — the official guide to new headless versus chrome-headless-shell and the v22 default change.
- berstend (2018–), puppeteer-extra-plugin-stealth — the stealth plugin readme listing the per-tell evasion modules.
- Castle (2025), Why a classic CDP bot-detection signal suddenly stopped working — the Runtime.enable signal, the console-serialization mechanism, and the May 2025 V8 commits that ended it.