
The history of Selenium and WebDriver: from 2004 to the W3C standard

Copyright: MIT

Almost every browser-automation tool in use today speaks a protocol that began as an internal testing hack written in JavaScript at a consultancy in Chicago in 2004. The HTTP commands your test framework sends to a browser, the /session endpoint, the element-handle abstraction, the verb names like “click” and “navigate to”: none of it was designed by a standards body and then implemented. It was reverse-engineered into a standard from a working open-source tool, itself the product of two rival projects that agreed to merge. The standard came last.

That is the unusual thing about Selenium. Most web standards start as a specification and grow implementations. WebDriver grew the implementation first, shipped it to millions of users, and only then went to the W3C to be written down. By the time the Browser Testing and Tools Working Group published the WebDriver Recommendation in 2018, the protocol it described had already been the default way to drive a browser for the better part of a decade. This post follows that path from the JavaScript sandbox it was born inside to the cross-browser bidirectional protocol that is replacing it now.

The road runs through five waves: the JavaScript-injection era of Selenium Core, the proxy trick that produced Selenium RC, Simon Stewart’s WebDriver and the 2009 merger that fused the two, the long march to a W3C Recommendation, and the modern Selenium 4 line where the JSON Wire Protocol finally died and a WebSocket-based successor began. Each wave fixed the previous one’s central limitation, and each limitation came from the same place: the browser’s security model never wanted to be automated from the outside.

2004: a testing tool named after a poison antidote

The tool that became Selenium started life as something called JavaScriptTestRunner. Jason Huggins built it at ThoughtWorks in Chicago while working on an internal Time and Expenses application written in Python on Plone. Functional testing of that app was tedious and manual, and Huggins wanted a way to script a browser into clicking through the workflow and asserting on the result. He wrote it in the one language that already ran inside every browser: JavaScript.

That choice defined everything that followed for the next several years. Because the test runner was JavaScript loaded into the page, it could see and manipulate the DOM directly, which gave it immediate and intuitive visual feedback. Colleagues who saw the demo were taken with it. Huggins released it to open source near the end of 2004, with early help from Paul Gross and Jie Tina Wang. The name came from a joke. Huggins was needling a competing commercial test tool sold by Mercury Interactive, and he wrote in an email that mercury poisoning is treated with supplements of the element selenium. The name stuck.

[Figure: a browser tab at origin app.example.com holding two boxes: the application under test (DOM, forms, links) and Selenium Core (JavaScript in the page, issuing click(), type(), assert()).]

*Selenium Core ran as JavaScript loaded into the same page as the application, so it could touch the DOM directly. The price was that it lived entirely inside one origin.*

There were two problems baked into this design, and they would shape the project for a decade. The first was the same-origin policy. Selenium Core was JavaScript, and the browser refused to let JavaScript from one origin read or drive content from another. A test that needed to navigate from login.example.com to app.example.com hit a wall, because the script that started on one origin could not follow the page to the other. The second was that JavaScript inside a page cannot do everything a person can. It cannot reach outside the sandbox to handle a native file-upload dialog, a basic-auth prompt, or an SSL warning. Both problems came from the browser doing exactly what it was designed to do: isolate untrusted page scripts. Selenium was untrusted page script, by construction.
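The same-origin rule that blocked Selenium Core can be illustrated in a few lines. This is a minimal sketch, assuming the usual definition of an origin as the (scheme, host, port) triple; the function names `origin` and `same_origin` are made up for illustration, and real browsers apply further rules that are not modeled here.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Default ports assumed for the two schemes this sketch cares about.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Reduce a URL to its (scheme, host, port) triple."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Fill in the default port so "https://a" and "https://a:443" compare equal.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(a: str, b: str) -> bool:
    """True when both URLs share scheme, host, and port."""
    return origin(a) == origin(b)


# The cross-subdomain hop from the text: a script loaded from the login host
# does not share an origin with the app host.
blocked = not same_origin("https://login.example.com/", "https://app.example.com/")
```

Under this model a script that started on login.example.com has no standing on app.example.com, which is the wall the test runner kept hitting.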

Selenium Core grew a recording layer that made it far more approachable. Shinya Kasatani in Japan wrapped the Core engine in a Firefox extension that let you record a session by clicking through the app and then play it back, exporting the steps as a test. That extension became Selenium IDE, and for years it was how most people first met the project. You did not have to write code to get started, which mattered enormously for adoption among manual QA teams. The IDE recorded the same Selenese command vocabulary that the rest of the project spoke, so a recorded test was not a black box: you could read it, edit the steps, and export them to a real programming language when the recorded version hit its limits. That export step was where most teams crossed from clicking to coding.

Selenese itself is worth a second look, because it set the conceptual vocabulary that survives in WebDriver today. A Selenese command was a verb plus up to two arguments, written as a table of command | target | value rows. open loaded a URL. click took a locator. type took a locator and a string. assertText checked that an element contained the expected text. The locator syntax let you address elements by id, by name, by XPath, or by a CSS selector, and that same set of location strategies carried straight through the merger into WebDriver’s using field. When you call find_element(By.CSS_SELECTOR, ...) in a modern Selenium script, you are using a locator model that traces back to the JavaScript table-runner of 2004.
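The table-of-rows shape of Selenese is easy to model. The sketch below is illustrative only: `Step`, `parse_row`, and `parse_locator` are hypothetical names, the pipe-separated text form stands in for the original table layout, and treating an unprefixed target as a generic "identifier" is an assumption about the default rather than a statement of what Selenium Core did.

```python
from typing import NamedTuple, Tuple


class Step(NamedTuple):
    command: str  # the verb, e.g. "open", "click", "type"
    target: str   # a URL or a locator string
    value: str    # optional second argument, empty when unused


# The four strategies named in the text above.
KNOWN_STRATEGIES = {"id", "name", "xpath", "css"}


def parse_row(row: str) -> Step:
    """Split 'command | target | value' into a Step, padding missing cells."""
    cells = [cell.strip() for cell in row.split("|", 2)]
    cells += [""] * (3 - len(cells))
    return Step(*cells)


def parse_locator(target: str) -> Tuple[str, str]:
    """Split 'strategy=value' into its parts."""
    if target.startswith("//"):
        # A bare XPath expression carries no prefix.
        return ("xpath", target)
    prefix, sep, rest = target.partition("=")
    if sep and prefix in KNOWN_STRATEGIES:
        return (prefix, rest)
    # Assumption: anything else is treated as a plain identifier.
    return ("identifier", target)


script = [
    parse_row("open | /login"),
    parse_row("type | id=username | alice"),
    parse_row("click | css=form button.submit"),
]
```

Each row stays readable as data, which is what made a recorded test something you could edit by hand before exporting it.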

2005: the proxy trick that became Selenium RC

The same-origin wall was the thing everyone wanted to climb. Paul Hammant, who had seen the 2004 demo, pushed for a “driven” mode where you could control Selenium from a language of your choice over the wire rather than writing the test in JavaScript. To make that work across origins, the team needed to convince the browser that the test harness and the application lived on the same domain. So they put a proxy in the middle.

The idea is almost a magic trick. You point the browser’s HTTP proxy setting at a Selenium server. That server fetches the real application, then serves it back to the browser under a fictional URL that also hosts Selenium Core. As far as the browser is concerned, the app and the automation script now share an origin, so the same-origin policy stops objecting. The proxy has effectively lied to the browser about where the page came from. Hammant wrote the original server in Java; Aslak Hellesoy and Obie Fernandez ported the client driver to Ruby, which set the pattern of one server plus many language clients that Selenium still follows.

[Figure: test code (Java / Ruby / Py) sends Selenese over HTTP to the Selenium Server (HTTP proxy + Core injection), which serves the proxied page plus JavaScript to the browser, where the app appears under one fake origin. The server masks the real app behind a single fictional URL so Selenium Core and the app appear same-origin. Commands ride in as "Selenese", a small verb vocabulary.]

*Selenium RC's proxy injection mode. The server impersonated the application's origin, which let injected JavaScript drive a page that otherwise lived on a different domain.*

The standalone server form arrived from an unexpected direction. At BEA Systems, Dan Fabulich and Nelson Sproul concluded that the original driver-to-browser architecture was awkward, so they forked the driver code into a self-contained server that bundled MortBay’s Jetty as the web proxy. When that work merged back, it became Selenium Remote Control, and the old “driven” codeline was retired. Pat Lightbody later hardened RC for enterprise use. Selenium RC, in retrospect, is Selenium 1, and for a few years it was the project’s flagship. You wrote tests in your language of choice, they spoke a small command vocabulary called Selenese to the server over HTTP, and the server translated that into JavaScript that Core executed in the page.

It worked, and it worked across more browsers than anything else at the time, which is why it won adoption. But the seams showed under load. Every command was JavaScript simulating a user action, so a click was a synthesized DOM event rather than a real click, and pages that distinguished the two could tell. Native dialogs were still out of reach. The proxy added latency and a configuration burden. And because the automation was script inside the page, it could not perfectly reproduce what a human driving the browser would produce. That last gap is exactly the gap that bot-detection systems would later learn to measure, and the lineage runs straight from here to today’s detection of automated browsers. Synthesized events have a different shape from real ones, and that difference never fully went away.

RC also gave the project the shape it still has: one server, many language clients. Because the test code talked to the server over HTTP, the client could be written in any language that could make an HTTP request. Java came first, then Ruby, then Python, C#, Perl, and PHP. Each client was a thin wrapper that turned method calls into Selenese commands and sent them to the server. This is why Selenium never belonged to one language community the way some test tools do. The server was the product; the clients were ports. That decision, made to get around a security policy, is the reason a Java shop and a Python shop could both standardize on the same browser-automation engine, and it is the reason the eventual W3C protocol had to be language-neutral by design rather than as an afterthought.

The third piece of the RC-era architecture was parallelism. A single Selenium server drove one browser at a time, which made a large test suite slow. In 2008 Philippe Hanrigou at ThoughtWorks built Selenium Grid, a hub-and-node design where a central hub received test requests and farmed them out to a pool of node machines, each running its own browser. A suite that took an hour on one machine could finish in minutes across twenty. Grid also let you fan out across browser and operating-system combinations: a node running Internet Explorer on Windows next to a node running Firefox on Linux, with the hub routing each test to a node that matched the requested capabilities. The capabilities-matching idea that Grid introduced, where a client asks for a browser by describing it and the infrastructure finds a match, became part of how WebDriver sessions are negotiated to this day.
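The capabilities-matching idea can be sketched as a simple subset check. This is a deliberately naive model, not Grid's actual matching logic: the node names, the capability keys, and the functions `matches` and `pick_node` are all hypothetical.

```python
from typing import Dict, Optional

Capabilities = Dict[str, str]

# Hypothetical node pool: each node advertises what it can run.
NODES: Dict[str, Capabilities] = {
    "win-ie": {"browserName": "internet explorer", "platformName": "windows"},
    "linux-ff": {"browserName": "firefox", "platformName": "linux"},
}


def matches(requested: Capabilities, offered: Capabilities) -> bool:
    """A node matches when it offers every capability the client asked for."""
    return all(offered.get(key) == value for key, value in requested.items())


def pick_node(requested: Capabilities, nodes: Dict[str, Capabilities]) -> Optional[str]:
    """Return the name of the first matching node, or None if nothing fits."""
    for name, offered in nodes.items():
        if matches(requested, offered):
            return name
    return None


chosen = pick_node({"browserName": "firefox"}, NODES)
```

The client describes the browser it wants and never names a machine; the hub owns the mapping from description to node.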

2007: Simon Stewart writes WebDriver from the other direction

While RC was maturing, Simon Stewart at ThoughtWorks in Australia was building something with the opposite architecture. Instead of injecting JavaScript and accepting the sandbox, WebDriver controlled the browser from outside it, using whatever native automation hooks each browser exposed. His first public commit of WebDriver landed in early 2007. The bet was that talking to the browser through its own automation interfaces, rather than through page script, would produce more faithful and more capable control. A WebDriver click could be a real OS-level or browser-level click. A WebDriver navigation could cross origins freely, because it was not page script bound by the same-origin policy. Native dialogs came within reach.

The tradeoff ran the other way from RC. WebDriver needed a per-browser implementation, a binary or extension that knew how to drive that specific browser, which meant it initially supported fewer browsers than RC’s universal JavaScript approach. So by 2007 the open-source world had two browser-automation projects with complementary strengths and overlapping goals, both with active communities, each fixing the other’s weakness.

Huggins had left ThoughtWorks in 2007 and joined a then-quiet Selenium support effort inside Google, which had become a heavy user of the tool. Around the Google Test Automation Conference, the principals (Stewart, Huggins, and Pat Lightbody among them) started discussing whether the two projects should become one. The accounts differ slightly on timing; ThoughtWorks’s own retrospective places the first merger conversations at GTAC in 2007, while the public announcement to both communities came later. What is not in dispute is the shape of the deal. WebDriver’s architecture would be the future, and RC’s broad browser and language support would be folded into it.

2009: the merger and the JSON Wire Protocol

Simon Stewart sent the merger email to both communities in August 2009. His explanation of why was characteristically blunt about each side’s flaws. The merge was happening “partly because WebDriver addresses some shortcomings in Selenium (by being able to bypass the JS sandbox, for example. And we’ve got a gorgeous API), partly because Selenium addresses some shortcomings in WebDriver (such as supporting a broader range of browsers).” The combined project would be Selenium WebDriver, and the version number would jump to Selenium 2.0 to mark the architectural break.

That sentence is the hinge of the whole story. “Bypass the JS sandbox” is the entire reason WebDriver existed. Everything Selenium had been since 2004 was a clever way to live inside the sandbox; WebDriver’s pitch was to stop living inside it. The merger did not blend the two architectures so much as adopt WebDriver’s and keep RC’s reach. Selenium 2.0 shipped a WebDriverBackedSelenium shim so that existing RC tests kept working on top of the new engine, which gave the enormous installed base a migration path rather than a cliff.

Selenium 2.0 was released on 8 July 2011. The release notes and contemporaneous coverage describe a tool that drove Firefox, Internet Explorer, Chrome, Opera, the early mobile browsers on Android and iPhone, and the headless HtmlUnit engine, all behind one API. The thing that made this possible across so many backends was a wire protocol. To let a client written in any language talk to a driver for any browser, WebDriver defined an HTTP-based, JSON-bodied command protocol: the JSON Wire Protocol.

[Figure: sequence between a language binding (client) and a browser driver (server): POST /session { capabilities } returns 200 { sessionId }, followed by POST /session/:id/url { url }, POST /session/:id/element { using, value }, and POST /session/:id/element/:el/click. Request / response, one command at a time. No channel for the browser to speak first.]

*The JSON Wire Protocol modeled automation as REST-ish HTTP calls against a session. Every interaction is the client asking and the driver answering. The browser cannot push.*

The protocol modeled a browser session as a resource. You created one with POST /session and a capabilities object describing the browser you wanted. You got back a session id, and every later command carried that id in the path. Navigating was POST /session/:id/url. Finding an element returned an opaque element handle, and you clicked it by posting to a path containing that handle. The whole thing was request and response, one command at a time, which made it trivial to implement over plain HTTP in any language but left a structural hole: the browser had no way to talk back on its own. If you wanted to know when a page finished loading or when a network request fired, you polled. That hole is the reason WebDriver BiDi exists today, fifteen years later.
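The session-as-resource shape described above can be sketched as a set of request builders. Nothing here talks to a real driver: `Request` and the four helper functions are hypothetical names, and the `desiredCapabilities` body key reflects the legacy JSON Wire Protocol as best understood, so treat it as an assumption rather than a reference.

```python
import json
from typing import NamedTuple


class Request(NamedTuple):
    method: str  # HTTP verb
    path: str    # path relative to the driver's base URL
    body: str    # JSON-encoded request body


def new_session(capabilities: dict) -> Request:
    """Ask the driver for a browser described by a capabilities object."""
    return Request("POST", "/session", json.dumps({"desiredCapabilities": capabilities}))


def navigate(session_id: str, url: str) -> Request:
    """Every later command carries the session id in the path."""
    return Request("POST", f"/session/{session_id}/url", json.dumps({"url": url}))


def find_element(session_id: str, using: str, value: str) -> Request:
    """Locate an element; the driver answers with an opaque element handle."""
    body = json.dumps({"using": using, "value": value})
    return Request("POST", f"/session/{session_id}/element", body)


def click(session_id: str, element_id: str) -> Request:
    """Click by posting to a path that contains the element handle."""
    return Request("POST", f"/session/{session_id}/element/{element_id}/click", "{}")


# One command at a time, each a separate request the client must initiate.
# "abc" and "e1" are placeholder ids a driver would normally hand back.
plan = [
    new_session({"browserName": "firefox"}),
    navigate("abc", "https://app.example.com/"),
    find_element("abc", "css selector", "#login"),
    click("abc", "e1"),
]
```

Every entry in `plan` is something the client sends. There is no entry type for something the browser sends unprompted, which is the structural hole the paragraph above describes.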

2012 to 2018: writing the standard down

By early 2012, Stewart, by then at Google, and David Burns at Mozilla were talking to the W3C about turning WebDriver into a real standard. This is the part of the story that runs backward compared to most web technology. The protocol already existed, was already deployed at scale, and already had implementations from multiple browser vendors who each shipped their own driver. The job was not to invent a protocol. It was to specify, precisely and unambiguously, the one that had grown in the wild, so that every browser’s driver would behave identically instead of merely similarly.

That mattered because the JSON Wire Protocol was a community document, not a rigorous spec, and the drivers had drifted. Edge cases in element visibility, in how clicks computed their target coordinates, in timeouts and error codes, all varied between ChromeDriver and the Firefox driver and IE’s driver. Writing a W3C Recommendation forced those behaviors to be pinned down to the level a conformance test suite could check. The W3C draft was published in 2012 and then spent six years being argued into precision by the Browser Testing and Tools Working Group, with Simon Stewart and David Burns as editors.

[Timeline: 2004 Core, 2005 RC, 2007 WebDriver, 2009 merger, 2011 Se 2.0, 2016 Se 3.0, 2018 W3C Rec, 2021 Se 4.0.]

*From an internal JavaScript test runner to a published web standard in fourteen years. The two orange marks are the structural breaks: the merger and the Recommendation.*

While that work ground on, the Selenium project shipped Selenium 3.0, announced on 4 October 2016. Selenium 3 did the unglamorous job of clearing out the old world. The original Selenium Core JavaScript implementation, the thing the whole project was named for, was deleted. The RC APIs were moved to a legacy package and re-backed by WebDriver underneath, so they kept compiling but no longer ran the old engine. Selenium 3 also marked the point where Firefox automation moved to Mozilla’s own driver: from Firefox 48 onward the community-built Firefox driver stopped working, and you needed geckodriver, Mozilla’s W3C-aligned implementation. The browser vendors were taking ownership of their own drivers, which is exactly what a real standard requires.

The WebDriver specification became a W3C Recommendation on 5 June 2018. The Recommendation’s own abstract describes it as a “platform- and language-neutral wire protocol” for the “remote control” of user agents, and it is candid about its lineage: the spec is “derived from the popular Selenium WebDriver browser automation framework,” using that framework’s behavior to inform the design. A working tool became a standard, rather than the other way around. From that point a browser could claim WebDriver conformance the way it claims HTML or CSS conformance, and a test written against the standard would run identically on any conforming browser. The same pattern of an implementation hardening into a cross-vendor standard shows up elsewhere in web infrastructure, from the long road of TLS to the evolution of HTTP itself.

2021: Selenium 4 and the death of the JSON Wire Protocol

Selenium 4.0 was announced by Simon Stewart on 13 October 2021, and it closed the loop that the W3C Recommendation opened. The headline change was that Selenium spoke only the W3C WebDriver protocol now. The old JSON Wire Protocol was gone from the client. For most users this was a “drop-in” upgrade because the two protocols are close cousins, but under the hood the dialect that the merged project had carried since 2011 was finally retired in favor of the standardized one. A decade after the protocol was created, the version it standardized into replaced it outright.

Selenium 4 added more than protocol cleanup. Relative locators let you find elements by spatial relationship, describing a target as above, below, or to the left of another element, which reads more like how a person describes a page. The release wired in the Chrome DevTools Protocol for Chromium and Firefox so that tests could intercept network requests, handle authentication prompts, and watch for DOM changes or JavaScript errors, capabilities the request-response WebDriver protocol could not offer on its own. And the Grid was rebuilt from scratch, taking lessons from community projects like Zalenium and Selenoid, able to run as a single process on one machine or fully distributed across a Kubernetes cluster, with a GraphQL-backed UI, live VNC previews of sessions, and OpenTelemetry tracing.

The CDP integration was a tell. Reaching into the Chrome DevTools Protocol to do the things WebDriver could not is the same move that Puppeteer made its whole foundation when Google shipped it in 2017. CDP is bidirectional and event-driven: the browser can push network events, console messages, and lifecycle notifications to the client without being polled. That is precisely the hole the JSON Wire Protocol left open in 2011. But CDP is Chromium-specific and unversioned, so leaning on it reintroduced the cross-browser fragmentation that WebDriver had spent a decade eliminating. A test that used Selenium 4’s CDP features worked on Chrome and broke everywhere else.

2024 onward: WebDriver BiDi closes the loop

The fix for that fragmentation is WebDriver BiDi, and it is the most interesting thing happening in browser automation right now. BiDi keeps the standardized, cross-browser philosophy of classic WebDriver but adds the one thing it always lacked: a bidirectional channel. Instead of a series of HTTP request-response calls, BiDi runs over a WebSocket, so the browser can push events to the client as they happen (network activity, log entries, JavaScript exceptions, DOM mutations) without the client polling for them. It is, deliberately, an attempt to get CDP’s event model inside a multi-vendor W3C specification. People describe it as the child of WebDriver Classic and the Chrome DevTools Protocol, and that is accurate.
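The difference a bidirectional channel makes can be sketched without opening a real WebSocket. In the sketch below, `BiDiSketch` is a hypothetical class and the transport is any callable that accepts a string. The message shapes (commands carrying an `id`, events tagged `"type": "event"`) and the names `session.subscribe` and `log.entryAdded` follow the draft specification as best understood, so check them against the current spec before relying on them.

```python
import json
from collections import defaultdict
from typing import Callable, Dict, List


class BiDiSketch:
    """Routes messages for a BiDi-style connection; the transport is injected."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send
        self._next_id = 0
        self.results: Dict[int, dict] = {}
        self._listeners: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def command(self, method: str, params: dict) -> int:
        """Send a command; the id lets a later response be paired with it."""
        self._next_id += 1
        self._send(json.dumps({"id": self._next_id, "method": method, "params": params}))
        return self._next_id

    def on(self, event: str, callback: Callable[[dict], None]) -> None:
        """Register a callback for an event the browser may push at any time."""
        self._listeners[event].append(callback)

    def handle(self, raw: str) -> None:
        """Dispatch one incoming message: either an event or a command response."""
        message = json.loads(raw)
        if message.get("type") == "event":
            for callback in self._listeners[message["method"]]:
                callback(message["params"])
        elif "id" in message:
            self.results[message["id"]] = message
```

The `handle` method is the part classic WebDriver has no equivalent for: a message can arrive that answers no request, and the client reacts to it instead of polling for it.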

The browser vendors are building it for real. Mozilla shipped Firefox support and used BiDi as the basis for official Puppeteer-on-Firefox starting with Firefox 129 in August 2024, and announced it would deprecate its CDP support in favor of BiDi. Chrome has been implementing BiDi alongside its CDP, with the stated direction of making cross-browser automation work through one standard channel. The protocol itself is still moving through the W3C process rather than finished, but the implementations are arriving ahead of the final spec, which is exactly the pattern that produced WebDriver the first time. Selenium 4 exposes BiDi APIs already, and the migration path off Selenium’s CDP-based features and onto BiDi is the live work in Selenium’s bidirectional protocol effort.

There is a quieter consequence here that matters to anyone on the other side of automation. Classic WebDriver advertises itself. A browser launched under WebDriver sets navigator.webdriver to true, a flag the standard requires, which is a free signal for any bot-detection system that wants to refuse automated traffic. The protocol was designed for testing your own site, where being detectable is a non-issue, so it never tried to hide. That honesty is why driving a browser through WebDriver and driving one to look human are different problems, and why the detection surface of each automation tool differs. WebDriver was built to be visible. The detection arms race grew up around tools that were not.

What the path actually shows

The throughline from 2004 to now is a browser security model that never wanted external control, and twenty years of attempts to get it anyway. Selenium Core lived inside the sandbox and accepted its limits. Selenium RC lied to the browser with a proxy to widen those limits. WebDriver stepped outside the sandbox entirely and paid for it with a per-browser implementation burden, which the W3C process eventually amortized across every vendor. WebDriver BiDi is now retrofitting the one capability the wire protocol left out in 2011: a way for the browser to speak first. Each step solved the previous step’s central constraint and inherited a new one. That is the normal way infrastructure matures, and it is worth remembering that none of it was planned. There was no roadmap in 2004 that ended at a W3C Recommendation. There was a tedious expense report and a developer who did not want to test it by hand.

The detail that has aged most strangely is the merger sentence. Stewart wrote in 2009 that WebDriver mattered because it could “bypass the JS sandbox,” and at the time that was an unalloyed good. It meant more reliable tests. Fifteen years on, “bypass the browser sandbox to drive it like a human” is also the exact one-line description of what a serious scraping or fraud operation does, and an entire industry exists to detect it. The tool that won the standards war did so precisely because it was better at not being page script, and being better at not being page script is now a thing systems test for. Selenium did not set out to be the most-detected automation framework on the web. It set out to test an expense app. The standard it left behind is the foundation everyone builds on, and the navigator.webdriver flag it is required to set is the first thing every defender checks.


Sources & further reading

  • Selenium project (n.d.), History — the project’s own account of Core, RC, IDE, Grid, and the people behind each, from 2004 onward.
  • Selenium project (n.d.), Selenium RC (Selenium 1) — legacy docs describing the proxy-injection architecture and the same-origin workaround.
  • SD Times (2009), Selenium 2.0 merges with WebDriver — contemporaneous coverage of Simon Stewart’s August 2009 merger announcement and the “bypass the JS sandbox” quote.
  • InfoQ (2011), Selenium 2 (a.k.a Selenium WebDriver) Is Released — reporting on the 8 July 2011 release and what WebDriver brought over RC.
  • Selenium project (2016), Selenium 3 is Coming — the 4 October 2016 announcement, deletion of Selenium Core, and the move to geckodriver.
  • W3C (2018), WebDriver — the W3C Recommendation, with editors Simon Stewart and David Burns and the statement that it derives from Selenium WebDriver.
  • W3C (2018), WebDriver motors on to W3C Recommendation — the 5 June 2018 announcement of Recommendation status.
  • Simon Stewart / Selenium project (2021), Announcing Selenium 4 — the 13 October 2021 release: W3C-only protocol, relative locators, CDP integration, and the rebuilt Grid.
  • W3C (draft), WebDriver BiDi — the in-progress bidirectional protocol specification.
  • Chrome for Developers (2023), WebDriver BiDi: The future of cross-browser automation — Google’s account of why a bidirectional, cross-browser standard was needed.
  • Thoughtworks (2014), Happy 10th Birthday, Selenium — a retrospective placing the first merger conversations at GTAC and naming the early contributors.
  • Wikipedia (n.d.), Selenium (software) — consolidated timeline of versions, dates, and contributors.
