
The history of the user-agent string and why it's a frozen lie

Copyright: MIT

Open the network panel on any browser shipping in 2026 and read the first token of the User-Agent header. Chrome says Mozilla/5.0. Firefox says Mozilla/5.0. Safari says Mozilla/5.0. Edge, Brave, Opera, Vivaldi, the in-app browser inside a dozen mobile apps: all of them open by claiming to be a piece of software that Netscape shipped in 1994 and discontinued in 2008. None of them is that software. Every one of them says it is.

That single token is the oldest unbroken lie on the web. It survived two browser wars, a complete turnover of rendering engines, the death of Netscape, the rise and near-monopoly of Internet Explorer, the WebKit fork, and the Chromium era. It survived because removing it breaks websites that were written to check for it, and the set of those websites only ever grew. The story of the user-agent string is the story of how a field meant to identify software became a field that lies about identifying software, and how every attempt to clean it up ran into the same wall: too much code in the world already parses it, and parsers are unforgiving. This post walks that history from the original HTTP spec, through the Mosaic-Netscape-IE spoofing spiral, into the fingerprinting problem the string created, and out the other side to Chrome’s decision to freeze the string and replace it with something deliberately less informative.

What the field was supposed to be

The User-Agent header was specified in RFC 1945, the HTTP/1.0 document from May 1996, though browsers had been sending it for years before that was written down. The intent was modest. A client identifies itself with a list of product tokens, each a name optionally followed by a slash and a version, plus free-text comments in parentheses. The spec offers User-Agent: CERN-LineMode/2.15 libwww/2.17b3 as its example, which is exactly the shape it wanted: a program, a version, the library underneath it. The current definition lives in RFC 9110 section 10.1.5 and keeps the same grammar. RFC 1945 states the purpose plainly: the header is “for statistical purposes, the tracing of protocol violations, and automated recognition of user agents for the sake of tailoring responses to avoid particular user agent limitations.” That last clause is the whole problem.

Tailoring responses to avoid user-agent limitations meant: look at who is asking, and if they cannot handle a feature, send them something simpler. In 1994 that was a reasonable thing to want. Browsers genuinely differed. One supported tables, another did not. One understood frames, another rendered them as a broken mess. A server that knew the client could degrade gracefully. The mechanism for knowing the client was the string, and the act of reading it to make decisions got a name that has outlived its usefulness by twenty-five years: user-agent sniffing.

The grammar itself is forgiving to the point of uselessness. Product tokens separated by spaces, comments in parentheses, no required order beyond “most significant first,” no registry of valid values, no schema. A server receiving the string has to guess what each token means. That worked when there were three browsers. It stopped working almost immediately.
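The grammar is loose enough that a whole parser fits in a screenful. What follows is a minimal sketch, not a conformant implementation: `parse_user_agent` is a hypothetical helper, and it assumes comments are never nested and contain no backslash escapes, both of which RFC 9110 technically allows. It also demonstrates the problem the paragraph describes: the parser can tell you which parts are tokens and which are comments, but nothing in the grammar tells it what any of them mean.

```python
from __future__ import annotations

import re

# RFC 9110 "tchar": characters allowed in a token (a product name or version).
_TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"

# One element of a User-Agent value: either a parenthesized comment, or a
# product token "name" with an optional "/version".
# Simplification: nested parentheses and backslash escapes inside comments
# are not handled, although the RFC grammar permits both.
_PART = re.compile(
    rf"\s*(?:\((?P<comment>[^()]*)\)|(?P<name>{_TOKEN})(?:/(?P<version>{_TOKEN}))?)"
)


def parse_user_agent(header: str) -> list[tuple[str, str, str | None]]:
    """Split a User-Agent value into ("product", name, version) and ("comment", text, None)."""
    parts: list[tuple[str, str, str | None]] = []
    header = header.strip()
    pos = 0
    while pos < len(header):
        m = _PART.match(header, pos)
        if m is None:
            raise ValueError(f"cannot parse User-Agent at offset {pos}: {header[pos:]!r}")
        if m.group("comment") is not None:
            parts.append(("comment", m.group("comment"), None))
        else:
            parts.append(("product", m.group("name"), m.group("version")))
        pos = m.end()
    return parts


for kind, text, version in parse_user_agent("CERN-LineMode/2.15 libwww/2.17b3"):
    print(kind, text, version)
# product CERN-LineMode 2.15
# product libwww 2.17b3
```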

Mosaic, and the browser that wanted to kill it

NCSA Mosaic came first in the popular sense, and it identified itself plainly. NCSA_Mosaic/2.0 (Windows 3.1). Product, version, platform in a comment. The header doing its job.

Then a chunk of the Mosaic team left to build something better and faster, and the codename they gave it was Mozilla, a contraction of “Mosaic killer” with a nod to Godzilla. The product shipped as Netscape Navigator, but the user-agent token kept the codename. Navigator 1.0 announced itself as Mozilla/1.0 (Win3.1). The first lie was small and almost innocent: the public product was Netscape, the string said Mozilla. But it set the precedent that the token in the user-agent field is a brand decision, not a fact about the software, and that precedent never reversed.

Navigator did things Mosaic could not. The one that mattered for the user-agent string was frames. Webmasters who wanted to use frames needed to know whether the visiting browser could render them, and the cheap test was to look at the user-agent string and check whether it said Mozilla. Mozilla got the framed, enhanced page. Everyone else, Mosaic and Cello and Samba and the rest, got a stripped-down fallback. This is the moment the string stopped being descriptive and started being a gate. If you wanted the good version of the web, your string had to contain the magic word.

Figure: the content gate that made the Mozilla token mandatory. Does the User-Agent contain "Mozilla"? Yes: serve frames and the enhanced HTML the author actually built. No: serve bare-bones fallback HTML to Mosaic, Cello, Samba, and anything else. Once enough servers ran this check, no new browser could afford to omit the word.
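The gate is a one-line test, which is exactly why it spread. Here is a minimal sketch of the server-side logic as described above; the function and page names are hypothetical, and the language is incidental.

```python
# Hypothetical reconstruction of the mid-1990s content gate. The page names
# are illustrative; the only logic is a substring test for "Mozilla".
def choose_page(user_agent: str) -> str:
    if "Mozilla" in user_agent:
        return "enhanced.html"  # frames: the page the author actually built
    return "fallback.html"      # bare-bones HTML for Mosaic, Cello, Samba, ...


print(choose_page("NCSA_Mosaic/2.0 (Windows 3.1)"))  # fallback.html
print(choose_page("Mozilla/1.0 (Win3.1)"))           # enhanced.html
```

Any browser that wanted the enhanced page only had to get the substring into its string somewhere, and that is all the check ever verified.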

Around this period the string also carried a now-forgotten token. Netscape builds included a letter indicating encryption strength: U for the strong 128-bit build sold inside the United States, I for the weak 40-bit export build, N for no encryption. You can still find ; U; sitting in old user-agent strings copied around the web, including the Firefox, Safari, and early Chrome strings that inherited the format. It was an artifact of US cryptographic export law, and it stopped mattering once those export restrictions on browser crypto were relaxed around 2000. The letter lingered in the format long after it meant anything, which turns out to be the recurring theme of this entire field.

Internet Explorer pretends to be Netscape

Microsoft shipped Internet Explorer into a web that had already been gated on the Mozilla token. IE could render frames. IE wanted the enhanced pages. But the servers checking user-agent strings did not know what IE was, so they handed it the fallback meant for Mosaic. An honest user-agent string would have locked the new browser out of the best version of every site that sniffed.

So Microsoft did not send an honest string. Internet Explorer claimed to be Mozilla and tucked its real identity inside a comment: Mozilla/1.22 (compatible; MSIE 2.0; Windows 95). The naive sniffers, which only looked for the word Mozilla at the front, were satisfied and served the good page. The sophisticated sniffers that wanted to detect IE specifically could find MSIE in the comment. Everybody got what they wanted, and the user-agent field permanently lost the property that the leading token tells you what the browser is. From this point forward, Mozilla/ at the start of a string means nothing except “I am a web browser and I would like the page you give to web browsers.”
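Here is a sketch of how one string satisfied both audiences, assuming a server that first looks inside the comment for MSIE and only then falls back to the leading token. The names `identify` and `_MSIE` are hypothetical. The ordering is the important part: because IE's string also begins with Mozilla/, a sniffer that checked the leading token first would have filed every IE user under Netscape.

```python
from __future__ import annotations

import re

# Hypothetical late-1990s sniffer. IE's string also starts with "Mozilla/",
# so the MSIE check inside the comment must run before the Netscape fallback.
_MSIE = re.compile(r"\(compatible; MSIE (\d+(?:\.\d+)?)")


def identify(user_agent: str) -> tuple[str, str | None]:
    m = _MSIE.search(user_agent)
    if m:
        return "Internet Explorer", m.group(1)
    if user_agent.startswith("Mozilla/"):
        # Anything else claiming Mozilla is treated as Netscape; the version is
        # the text between the slash and the first space.
        return "Netscape", user_agent[len("Mozilla/"):].split(" ", 1)[0]
    return "other", None


print(identify("Mozilla/1.22 (compatible; MSIE 2.0; Windows 95)"))  # ('Internet Explorer', '2.0')
print(identify("Mozilla/1.0 (Win3.1)"))                             # ('Netscape', '1.0')
```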

This is the pattern that repeats with mechanical regularity. A browser cannot get the good content unless its string resembles the dominant browser’s string, so it copies the dominant browser’s tokens and appends its own. The next browser copies that, and so on, each generation accreting the tokens of every browser before it. The string grows like a coral reef, every layer a fossil of a compatibility decision nobody can now safely reverse.

By the time IE had overtaken Netscape and reached version 6, its string had hardened into a shape later browsers would themselves have to imitate: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1). Now the dominant browser was IE, and the same logic that pushed IE to mimic Netscape pushed the post-Netscape world to keep producing strings that satisfied IE-era sniffers.

Gecko, KHTML, and the “like Gecko” tax

Netscape lost the first browser war and released its source code in 1998. The Mozilla project that grew out of that release rebuilt the rendering engine from scratch as Gecko, and Firefox shipped a clean-looking string for the era: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041108 Firefox/1.0. The Mozilla/5.0 lead was now genuinely Mozilla, briefly, for the only browser family with any claim to the name. The rv: token gave the Gecko revision, the Gecko/ token gave a build date, and Firefox/ gave the product. Servers started sniffing for Gecko to decide whether a browser was a capable modern one.

Which set up the next round of copying. KDE’s Konqueror used a different engine, KHTML, and it was a perfectly capable modern browser, but servers sniffing for Gecko did not know that and served it the degraded path. Konqueror’s answer was the token that would propagate further than any other: it described itself as (KHTML, like Gecko). Not Gecko, but like Gecko, please treat me as such.

Then Apple forked KHTML into WebKit for Safari. WebKit wanted the pages that targeted KHTML and the pages that targeted Gecko, so Safari kept the whole inheritance and added its own: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; de-de) AppleWebKit/85.7 (KHTML, like Gecko) Safari/85.5. Read that string left to right and it is a complete archaeological record. It claims to be Mozilla (Netscape, 1994), notes it is like Gecko (Firefox, 2004) while actually being AppleWebKit (a fork of KHTML, which was Konqueror’s engine), and finally admits, at the end, that it is Safari.

Figure: one Safari string, five fossils. Mozilla/5.0: Netscape, 1994, and not Mozilla. (Macintosh; ...): the platform comment. AppleWebKit/85.7: the real engine, a KHTML fork. (KHTML, like Gecko): Konqueror, then “please treat me as Firefox.” Safari/85.5: the product name, admitted last. Each token was added so a server’s sniffer would serve the good page, and none can be removed safely. Read as a stratigraphy, every layer is a compatibility decision that some live server still depends on.

Chrome arrived in 2008 on top of WebKit and inherited the entire stack rather than risk being served fallback content by the millions of sites that sniffed for Safari or WebKit. Its early string read Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.27 Safari/525.13. A brand-new browser from Google, claiming to be Netscape, claiming to be like Firefox’s engine, claiming to be Apple’s browser, and only then admitting to being Chrome. When Chrome later switched its engine from WebKit to its own Blink fork, it kept the AppleWebKit/537.36 and Safari/537.36 tokens anyway, frozen at the version number from the moment of the fork, because by then the entire web expected to see them. Those two numbers, 537.36, are now a permanent fixture of the Chrome string with no relationship to any software that ships. They are a checksum of the moment Blink split from WebKit, repeated on billions of requests a day. This is the same dynamic the history of web scraping ran into from the other side: clients copy whatever string gets served the real content, and the string stops describing anything.
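The accumulation has a concrete consequence for anyone writing a sniffer: rules have to run from the newest inheritor to the oldest ancestor, because every Chrome string also passes a Safari check. Below is a minimal sketch of such an ordered table, with hypothetical rule names and markers; the reversed run shows the inherited Safari/ token winning.

```python
# Hypothetical ordered sniffing table: the first marker found wins. Order is
# the whole game, because every Chrome string also contains "Safari/".
RULES = [
    ("Chrome", "Chrome/"),
    ("Safari", "Safari/"),
    ("Firefox", "Firefox/"),
    ("Internet Explorer", "MSIE "),
]


def sniff(user_agent: str, rules=RULES) -> str:
    for name, marker in rules:
        if marker in user_agent:
            return name
    return "unknown"


CHROME_2008 = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 "
               "(KHTML, like Gecko) Chrome/0.2.149.27 Safari/525.13")
print(sniff(CHROME_2008))                         # Chrome
print(sniff(CHROME_2008, list(reversed(RULES))))  # Safari: the inherited token wins
```

Any new browser that appends its own product token to a Chromium string needs its own rule inserted above Chrome. That is the same accretion, seen from the parser’s side.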

The string became a fingerprint

Up to here the user-agent string is a compatibility hack with a funny history. The turn that made it a problem worth fixing came when people measured how much it gave away.

In 2010 the EFF ran Panopticlick, an experiment that collected the passive configuration a browser exposes and measured the uniqueness of the combination. The user-agent string alone carried, on average, about 10.5 bits of identifying information. Ten and a half bits is enough to single you out of a crowd of roughly 1,400 people. That sounds modest until you remember the string is sent on every request, requires no JavaScript, no cookie, no permission prompt, and survives clearing all your storage. It is passive fingerprinting in its purest form: the server learns it just by you connecting.
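The “bits” here are the information-theoretic measure. A value shared by a fraction p of browsers carries -log2(p) bits, and the average over a population is the Shannon entropy. Below is a short sketch of both, using hypothetical function names. This is the textbook definition, not Panopticlick’s code. Because 10.5 bits is an average, the one-in-1,400 reading describes a typical string, not every user.

```python
import math
from collections import Counter


def surprisal_bits(p: float) -> float:
    """Identifying information, in bits, of a value shared by a fraction p of users."""
    return -math.log2(p)


def mean_entropy_bits(observations) -> float:
    """Shannon entropy of the observed distribution: the average surprisal per user."""
    counts = Counter(observations)
    total = sum(counts.values())
    return sum((c / total) * surprisal_bits(c / total) for c in counts.values())


# 10.5 bits means "as rare as 1 in 2**10.5", roughly 1 in 1,448.
print(round(2 ** 10.5))                           # 1448
# A token everyone sends, like Mozilla/5.0, carries no information at all.
print(mean_entropy_bits(["Mozilla/5.0"] * 1000))  # 0.0
```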

The reason a string can carry that much entropy is that it encodes a long tail of specifics. Not just Chrome, but Chrome 87.0.4280.88. Not just Windows, but Windows NT 10.0; Win64; x64. On Android it historically carried the full device model, which on a less common phone could be nearly unique by itself. Combine the exact browser build, the exact OS build, the CPU architecture, the device model, and the locale, and a surprising number of users were carrying a user-agent string that no other browser on Earth was sending that day. Fold that into the rest of a fingerprint (canvas, WebGL, fonts, audio) and the user-agent’s bits sit at the base of a stack that pushes total entropy past the point of uniqueness. The browser-side measurement of all this is the navigator object fingerprint; the user-agent header is its server-readable shadow, available before a single line of script runs.

Figure: where the ~10.5 bits hide in a desktop Chrome string. The identifying fields are 87.0.4280.88 (the exact build), Windows NT 10.0 (OS and version), and Win64; x64 (architecture); Mozilla/5.0 carries zero bits and is pure ritual. On Android the device model was historically a fourth high-entropy field, near-unique on uncommon hardware, and the reduction collapsed it to a single constant. The fingerprintable fields are the build numbers and device strings, not the Mozilla ritual: the reduction kept the ritual and froze or zeroed the rest.
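Here is a sketch that pulls those fields out of a Chrome-style string with regular expressions. The patterns are assumptions that cover only the Windows desktop and Android forms discussed in this post. `identifying_fields` is a hypothetical name. The Pixel 5 string is illustrative rather than captured from a real device.

```python
from __future__ import annotations

import re

# Hypothetical extractor for the identifying fields of a Chrome-style string.
# The patterns cover only the Windows desktop and Android forms.
_FIELDS = {
    "build": re.compile(r"Chrome/(\d+(?:\.\d+){3})"),
    "os": re.compile(r"\((Windows NT [\d.]+|Linux; Android [\d.]+)"),
    "arch": re.compile(r"(Win64; x64)"),
    "device_model": re.compile(r"Android [\d.]+; ([^;)]+)\)"),
}


def identifying_fields(user_agent: str) -> dict[str, str]:
    found = {}
    for field, pattern in _FIELDS.items():
        m = pattern.search(user_agent)
        if m:
            found[field] = m.group(1)
    return found


DESKTOP_2020 = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                "(KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36")
ANDROID_REDUCED = ("Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36")
print(identifying_fields(DESKTOP_2020))
# {'build': '87.0.4280.88', 'os': 'Windows NT 10.0', 'arch': 'Win64; x64'}
print(identifying_fields(ANDROID_REDUCED))
# {'build': '120.0.0.0', 'os': 'Linux; Android 10', 'device_model': 'K'}
```

Run against a reduced string, the same extractor still finds every field. The fields now hold constants that everyone on that major version shares, so they identify no one.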

There is an irony worth sitting with. The most useless token in the string, Mozilla/5.0, carries essentially zero entropy because everyone sends it. The fields that actually identify you are the honest ones: the real version, the real OS, the real device. The string’s accumulated lies are harmless to privacy; its remaining truths are the leak.

Freeze and reduce

Google’s Chrome team decided to stop sending the truths. In January 2020 a Chromium engineer posted an “Intent to Deprecate and Freeze: The User-Agent string” to the blink-dev list. The argument was the one above. The string is a passive fingerprinting vector, it is full of legacy values added for compatibility, and it encourages the sniffing that breaks minority browsers. The plan was to freeze the parts that leak, the version and OS and device details, and stop them from updating, then move any site that genuinely needed those details onto an opt-in replacement.

The original schedule was aggressive, with the freeze pencilled in for mid-2020 builds. It did not survive contact with the year. COVID-19 disrupted Chrome’s release train in March 2020 and the freeze was postponed. The delay also gave the wider industry (advertisers, device-detection vendors, and analytics firms that depended on parsing the string) time to push back on the timeline. The work resumed later and ran on a slow, multi-phase track instead.

The replacement is User-Agent Client Hints. Rather than blast every detail to every server on every request, the browser sends a small, low-entropy summary by default and reveals more only when a server explicitly asks. The default headers are Sec-CH-UA with the brand list and major versions, Sec-CH-UA-Mobile with a boolean, and Sec-CH-UA-Platform with the OS name. A server that needs the full version, the architecture, the bitness, the exact platform version, or the device model has to opt in by sending an Accept-CH response header naming the high-entropy hints it wants. The browser will then include them on subsequent requests. The Sec- prefix marks these as browser-controlled headers that page JavaScript cannot forge through fetch and that do not trigger a CORS preflight. The full set (Sec-CH-UA-Arch, Sec-CH-UA-Bitness, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Model, Sec-CH-UA-Platform-Version, and the rest) is defined in the WICG User-Agent Client Hints draft. The draft is edited at Google and, as of early 2026, is still a Community Group draft rather than a ratified W3C standard.
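From the server's side, the negotiation reduces to reading whatever Sec-CH-UA-* headers arrived and advertising the ones it wants via Accept-CH. Here is a minimal sketch that represents headers as plain dictionaries. The hint names come from the WICG draft. `negotiate` and the sample header values are illustrative. Header names are compared case-insensitively, as HTTP requires.

```python
from __future__ import annotations


# Hypothetical helper; the hint names are from the WICG UA-CH draft.
def negotiate(request_headers: dict[str, str], wanted: list[str]):
    """Return (UA hints received, wanted hints still missing, response headers)."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    received = {n: v for n, v in lowered.items() if n.startswith("sec-ch-ua")}
    missing = [h for h in wanted if h.lower() not in received]
    # Accept-CH names every hint the server wants. The browser attaches them to
    # later requests to this origin, not to the request already in flight.
    response_headers = {"Accept-CH": ", ".join(wanted)}
    return received, missing, response_headers


first_request = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Sec-CH-UA": '"Chromium";v="120", "Google Chrome";v="120", "Not_A Brand";v="8"',
    "Sec-CH-UA-Mobile": "?0",
    "Sec-CH-UA-Platform": '"Windows"',
}
received, missing, headers = negotiate(first_request, ["Sec-CH-UA-Arch", "Sec-CH-UA-Model"])
print(sorted(received))  # ['sec-ch-ua', 'sec-ch-ua-mobile', 'sec-ch-ua-platform']
print(missing)           # ['Sec-CH-UA-Arch', 'Sec-CH-UA-Model']
print(headers)           # {'Accept-CH': 'Sec-CH-UA-Arch, Sec-CH-UA-Model'}
```

On the first request only the three low-entropy hints arrive. If the browser honours the Accept-CH response, the two requested hints should appear on the next request to the same origin. A production server would also have to think about caching, for example a Vary header naming the hints it used. This sketch leaves that out.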

Figure: sent by default (low entropy): Sec-CH-UA, Sec-CH-UA-Mobile, Sec-CH-UA-Platform. Sent only after the server responds with Accept-CH naming the hints it wants (high entropy): Sec-CH-UA-Arch, Sec-CH-UA-Bitness, Sec-CH-UA-Model, Sec-CH-UA-Platform-Version, Sec-CH-UA-Full-Version-List. The default request leaks little; detail is a negotiation, logged and opt-in, not a broadcast. Client Hints invert the model: the user-agent string broadcast everything to everyone, while hints make the server ask, on the record, for each extra field.

The Sec-CH-UA value also carries a deliberate piece of nonsense, and it is the most interesting design choice in the whole replacement. A Chrome value looks like "Chromium";v="110", "Google Chrome";v="110", "Not A;Brand";v="99". That third entry is fake on purpose. It is a GREASE brand, a randomly varying garbage token whose name, punctuation, and position rotate between builds. There is no browser called “Not A;Brand”. The point is to keep servers from writing brittle parsers that hardcode an exact set of brand strings, because a parser that only accepts the brands it has seen would lock out the next browser the instant it appeared, which is precisely the failure that gave us Mozilla/5.0 in the first place. GREASE is the user-agent string’s history encoded as a defense: the spec authors watched the web ossify around exact-match sniffing once, and built deliberate variability into the replacement so it cannot happen the same way twice. The mechanism is borrowed from TLS, where the same trick keeps middleboxes from hardcoding the set of valid handshake values, a story the TLS fingerprinting post tells from the other end.
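In practice, GREASE means a Sec-CH-UA consumer has to parse the list properly and then ignore brands it does not recognize. A naive split on ";" or "," falls apart on a brand like "Not A;Brand". The sketch below assumes the shape shown above: a quoted brand followed by ;v="<version>". That is a simplification of the full Structured Field Values grammar (RFC 8941). `parse_sec_ch_ua` and `chromium_major` are hypothetical helpers, and the second sample value is illustrative.

```python
from __future__ import annotations

import re

# One member of the Sec-CH-UA list: a quoted brand, then ;v="<version>".
# Quoted strings may contain ';' or ',' (as "Not A;Brand" does), so the parser
# respects quotes instead of splitting on punctuation.
_MEMBER = re.compile(r'\s*"((?:[^"\\]|\\.)*)"\s*;\s*v="((?:[^"\\]|\\.)*)"\s*(?:,|$)')
_ESCAPE = re.compile(r"\\(.)")


def parse_sec_ch_ua(value: str) -> dict[str, str]:
    """Map every brand to its version without assuming which brands exist."""
    brands = {}
    pos = 0
    while pos < len(value):
        m = _MEMBER.match(value, pos)
        if m is None:
            raise ValueError(f"malformed Sec-CH-UA at offset {pos}")
        brand, version = (_ESCAPE.sub(r"\1", g) for g in m.groups())
        brands[brand] = version
        pos = m.end()
    return brands


def chromium_major(brands: dict[str, str]) -> int | None:
    """Use the brands this code knows and ignore the rest, GREASE included."""
    for known in ("Google Chrome", "Chromium"):
        if known in brands:
            return int(brands[known])
    return None


value = '"Chromium";v="110", "Google Chrome";v="110", "Not A;Brand";v="99"'
print(parse_sec_ch_ua(value))
# {'Chromium': '110', 'Google Chrome': '110', 'Not A;Brand': '99'}
print(chromium_major(parse_sec_ch_ua(value)))  # 110
```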

For the legacy string that everyone still sends, the team froze the leaky fields rather than removing the header. After a staged rollout across 2022 and into 2023, a desktop Chrome user-agent now follows a fixed template, with the OS version and CPU details unified to constant values and the browser version reduced to a major number with the rest zeroed out: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/<major>.0.0.0 Safari/537.36. The minor, build, and patch numbers are gone, replaced by literal 0.0.0. On Android the change was sharper. The OS version froze at 10 and the device model collapsed to the single letter K, so a mountain of distinct device strings became one. If you see Android 10; K in a user-agent in 2026, it is almost certainly not an Android 10 device and almost certainly not a phone called K. It is a modern Chrome reporting the frozen placeholder. The reduction reached its final desktop and mobile shape with the Chrome 113 release in 2023, and an opt-out deprecation trial that let holdout sites keep the old string ran out at the end of May 2023.
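Because the frozen template is fixed, recognizing a reduced string is mostly a pattern match. Here is a minimal sketch covering only the Windows desktop and Android forms described in this section. `reduced_desktop_ua` and `reduction_signals` are hypothetical names. Treat the result as evidence of which template a client claims to use, not proof of which browser sent it, since any HTTP client can send the same bytes.

```python
from __future__ import annotations

import re

# Patterns for the frozen forms described above. Assumption: Chrome on Windows
# desktop and Android only; other platforms freeze to other values.
_REDUCED_VERSION = re.compile(r"Chrome/(\d+)\.0\.0\.0\b")
_FROZEN_ANDROID = "(Linux; Android 10; K)"


def reduced_desktop_ua(major: int) -> str:
    """Build the frozen Windows desktop template for a given Chrome major version."""
    return ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            f"(KHTML, like Gecko) Chrome/{major}.0.0.0 Safari/537.36")


def reduction_signals(user_agent: str) -> dict:
    m = _REDUCED_VERSION.search(user_agent)
    return {
        "major": int(m.group(1)) if m else None,  # None: not the x.0.0.0 form
        "android_placeholder": _FROZEN_ANDROID in user_agent,
    }


print(reduction_signals(reduced_desktop_ua(120)))
# {'major': 120, 'android_placeholder': False}
```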

What actually changed, and what didn’t

Notice what the freeze kept. Mozilla/5.0 is still there. AppleWebKit/537.36 is still there. (KHTML, like Gecko) is still there. Safari/537.36 is still there. Chrome reduced the parts of its string that identify you and left every fossil of the spoofing wars untouched, because those tokens carry no entropy and removing them would still break the long tail of sites that sniff for them. The reduction was surgical about privacy and completely surrendered on honesty. The 2026 Chrome string is shorter and less identifying than the 2019 version, and it is every bit as much a lie. It claims to be Netscape, claims to be like Firefox’s engine, claims to be Safari, and on Android now also claims to be running Android 10 on a device called K.

Firefox and Safari, meanwhile, never shipped Client Hints. As of 2026 the Sec-CH-UA family is a Chromium-only feature, which means a detection system cannot treat the presence of those headers as neutral. A request with no Client Hints at all is consistent with Firefox, with Safari, or with an HTTP client that does not know to send them, and the absence is itself a signal. The replacement that was meant to reduce fingerprinting surface added a new bit of its own: whether you send hints, and whether the hints you send agree with the user-agent string you also send. A client that copies a Chrome user-agent but forgets the matching Sec-CH-UA header has produced a contradiction that is cheap to check on the first request, the same class of tell as a mismatched Accept-header triad or a TLS fingerprint that does not match the browser the string claims to be.
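The contradiction check is cheap enough to sketch in a few lines. This is a hypothetical heuristic, not any vendor’s detection logic, and it rests on two assumptions. First, the request arrived over HTTPS, because Chromium sends User-Agent Client Hints only to secure origins. Second, a Chrome/ token in the User-Agent should be matched by a Sec-CH-UA brand with the same major version.

```python
from __future__ import annotations

import re

_CHROME_MAJOR = re.compile(r"Chrome/(\d+)")
# The major version attached to each quoted brand in a Sec-CH-UA value.
_BRAND_VERSIONS = re.compile(r'"(?:[^"\\]|\\.)*"\s*;\s*v="(\d+)"')


def ua_ch_contradictions(headers: dict[str, str]) -> list[str]:
    """List ways the User-Agent and Sec-CH-UA headers disagree (hypothetical heuristic)."""
    h = {name.lower(): value for name, value in headers.items()}
    claim = _CHROME_MAJOR.search(h.get("user-agent", ""))
    if claim is None:
        return []  # Not claiming Chrome: absent hints are expected (Firefox, Safari, ...)
    hints = h.get("sec-ch-ua")
    if hints is None:
        return ["User-Agent claims Chrome but Sec-CH-UA is absent"]
    if claim.group(1) not in _BRAND_VERSIONS.findall(hints):
        return ["no Sec-CH-UA brand carries the User-Agent's Chrome major version"]
    return []


CHROME_120 = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
print(ua_ch_contradictions({"User-Agent": CHROME_120}))
# ['User-Agent claims Chrome but Sec-CH-UA is absent']
```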

Which is the quiet conclusion of thirty years of this. The user-agent string was meant to identify software, and it never could, because the first thing every browser learned was that telling the truth got it served the worse page. So they all lied, in the same direction, until the lie became a mandatory password with no information in it. When the privacy problem finally forced a cleanup, the cleanup removed the true parts and kept the false ones, because the false parts were load-bearing and the true parts were the leak. The header that opens billions of HTTP requests a day still begins with the name of a browser that died before many of its users were born. The most modern thing about it, the GREASE brand bolted onto its replacement, exists for the sole purpose of making sure no future server can ever again trust what a browser calls itself. The web spent three decades proving that a self-reported identity field cannot be trusted, and then wrote that distrust into the spec.

