Fingerprint stability vs uniqueness: the entropy budget every detector balances

Copyright: MIT

A device fingerprint is only useful if it does two contradictory things at once. It has to be different from other devices, or it identifies nobody. And it has to be the same fingerprint tomorrow as today, or it identifies a different person every visit. Those two requirements pull against each other. Every signal a detector adds to widen the gap between devices also adds a value that can change on its own: a browser update, a new monitor, a font installed, a GPU driver bumped. The more discriminating a signal is, the more often it tends to drift. So a fingerprinting system is never trying to collect the most entropy it can. It is spending a budget, and the question is which bits are worth carrying.

This post is about that budget. Where the entropy number comes from, how Peter Eckersley turned “how unique is your browser” into a measurement in bits in 2010, why the headline uniqueness figures fell from 94 percent to 33 percent over the following decade, and how a system that wants a durable device ID engineers stability back in after the fact. The math is small. The tension it describes is the whole game.

What the entropy number actually measures

Start with the quantity everyone cites and almost nobody defines. When a paper says a fingerprint “carries 18.1 bits of entropy,” it is talking about Shannon entropy over the distribution of fingerprint values across a population. For a single fingerprint value, the relevant figure is the self-information, or surprisal: if a fraction P of the population shares your exact fingerprint, the surprisal is -log2(P) bits. The entropy of the whole distribution is the expected surprisal across all browsers, the standard H = -Σ P log2 P.

Eckersley used exactly this. In the 2010 How Unique Is Your Web Browser? study he defines the surprisal of a fingerprint output as I = -log2 P(fn), measured in bits because the log base is 2, and the entropy as the expected value of that surprisal over the population. The intuition he attaches is the one worth keeping: each bit of surprisal cuts the number of candidate browsers in half. A browser is uniquely recognisable, roughly, when its surprisal exceeds log2 of the population size. If a site is visited by a million browsers, that is about 20 bits. Reach 20 bits of surprisal and you are alone in the crowd of a million.
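The definitions above are small enough to write out. This is a minimal sketch, assuming a fingerprint is any hashable value and that the observed sample stands in for the population; the function names and the toy sample are illustrative, not taken from the study.

```python
import math
from collections import Counter


def surprisal_bits(p: float) -> float:
    """Self-information of a value shared by a fraction p of the population."""
    return -math.log2(p)


def shannon_entropy_bits(fingerprints) -> float:
    """Expected surprisal over the observed distribution of fingerprint values."""
    counts = Counter(fingerprints)
    total = sum(counts.values())
    return sum((c / total) * surprisal_bits(c / total) for c in counts.values())


def uniqueness_threshold_bits(population: int) -> float:
    """Surprisal needed to be alone in a population of the given size."""
    return math.log2(population)


# Toy sample: half the browsers share "a", a quarter share "b",
# and "c" and "d" each appear once.
sample = ["a", "a", "a", "a", "b", "b", "c", "d"]
entropy = shannon_entropy_bits(sample)
threshold = uniqueness_threshold_bits(1_000_000)
```

For a million browsers the threshold comes out just under 20 bits, which is the "about 20 bits" figure in the text.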

*Figure: surprisal in bits versus the number of browsers still sharing your fingerprint, in a population of 1,000,000 browsers. Each bit of surprisal halves the candidates that share your value, from 1M at 0 bits to about 1k at 10 bits and 31 at 15 bits. Around 20 bits is enough to be alone in a million-browser population.*

Two cautions about the number, because they matter for everything that follows. First, entropy is a property of a population, not of you. The same fingerprint is rare or common depending on who else is being measured. A figure computed on the readers of a tech blog who came to test their uniqueness is not the figure for the general web. Eckersley said so plainly: his sample of privacy-conscious visitors was unrepresentative and his entropy numbers were lower bounds.

Second, surprisals from different signals do not simply add. You can sum two components’ bits only if the two are statistically independent, which fingerprint signals almost never are. Knowing the screen resolution tells you something about the platform; knowing the platform tells you something about the font list. Eckersley handled this with conditional self-information, -log2 P(a | b), rather than pretending the bits stack. The practical effect is that a system listing “40 signals” is not collecting forty independent measurements. There is heavy correlation, and the joint entropy is far below the sum of the parts.
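A toy calculation makes the non-additivity concrete. This is a sketch on invented data: the platform and font-set labels are hypothetical, and the chain rule used here is the population-average counterpart of the per-value conditional self-information Eckersley works with.

```python
import math
from collections import Counter


def entropy_bits(values) -> float:
    """Shannon entropy of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical (platform, font_set) observations. The font set is
# strongly tied to the platform, as it tends to be in practice.
observations = [
    ("mac", "fonts-mac"), ("mac", "fonts-mac"), ("mac", "fonts-mac"),
    ("win", "fonts-win"), ("win", "fonts-win"), ("win", "fonts-win-office"),
    ("linux", "fonts-linux"), ("linux", "fonts-linux"),
]

h_platform = entropy_bits([platform for platform, _ in observations])
h_fonts = entropy_bits([fonts for _, fonts in observations])
h_joint = entropy_bits(observations)

# Chain rule: H(fonts | platform) = H(platform, fonts) - H(platform).
h_fonts_given_platform = h_joint - h_platform
```

In this sample the font set determines the platform, so the joint entropy equals the font entropy alone and the platform contributes nothing extra. Summing the two would overstate what the pair actually carries.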

2010: Panopticlick fixes the measurement

The reason the 2010 study is the reference point is that it was the first to take a folk belief (“browsers are probably identifiable”) and put a number on it with a real population. Code at panopticlick.eff.org collected fingerprints from 470,161 browsers between late January and mid-February 2010. Each fingerprint combined eight measurements available at the time: the user-agent string, the HTTP Accept headers, whether cookies were enabled, screen resolution and colour depth, timezone offset, the browser plugin list, the system font list (enumerated through Flash or Java), and a set of supercookie tests.

The distribution carried at least 18.1 bits of entropy. Among browsers with Flash or Java installed, which exposed the font and plugin lists, the average rose to 18.8 bits and 94.2 percent were instantaneously unique. Across the whole sample, including the locked-down browsers, 83.6 percent were unique. The single most discriminating signal was not the user-agent. It was the plugin list, at 15.4 bits in isolation, followed by fonts at 13.9 bits. The user-agent came in around 10 bits. The least useful was the cookies-enabled flag at 0.35 bits, which is the information content of a near-constant: almost everyone has cookies on, so learning that you do tells a tracker almost nothing.
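The near-constant case is worth seeing as arithmetic. The sketch below is the standard binary entropy function; the probabilities passed to it are illustrative and are not the cookie-enabled share measured in the study.

```python
import math


def binary_entropy_bits(p: float) -> float:
    """Entropy of a yes/no signal where a fraction p of browsers answer yes."""
    if p in (0.0, 1.0):
        return 0.0  # a true constant carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


coin_flip = binary_entropy_bits(0.5)       # an even split is worth one full bit
near_constant = binary_entropy_bits(0.99)  # almost everyone gives the same answer
```

A flag only approaches its one-bit ceiling when the population is split evenly. As the split becomes lopsided the entropy falls toward zero, which is why a flag that is on for nearly everyone is worth a fraction of a bit.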

Per-signal entropy in isolation (Panopticlick, 2010), in bits of surprisal with each signal measured alone: plugins 15.4, fonts 13.9, user agent 10.0, HTTP accept 6.09, video / screen 4.83, timezone 3.04, supercookies 2.12, cookies enabled 0.353.

*The plugin and font lists dominated the 2010 fingerprint. Both were later removed or frozen by browsers, which is why the headline uniqueness number fell. The plugin list was the single most identifying signal, and it no longer exists.*

The study also did the thing that makes it relevant to this post. It measured stability. Among 8,833 cookie-accepting users who returned more than 24 hours after their first visit, 37.4 percent showed at least one fingerprint change. That is a high churn rate, and on first reading it looks like protection: if more than a third of returning browsers show a changed fingerprint, the fingerprint cannot be a durable identifier. Then the paper closes the door. A simple heuristic that assumed browser family and OS stay constant, and that versions only increase, could link a changed fingerprint back to its previous value with 99.1 percent of guesses correct and a false positive rate under one percent. The fingerprint changed; the device stayed linkable. That result is the seed of every modern stability engineering effort, and the rest of this post is about what grew from it.

Why more signals means less stability

Here is the mechanism stated plainly. Entropy comes from a signal taking many distinct values across the population. A signal that takes many values is, almost by construction, a signal with many states the same device can pass through. A monitor resolution distinguishes you from someone on a different display, and it changes when you dock a laptop. A font list distinguishes you finely, and it changes when an application installs a font pack. The GPU renderer string in WebGL is one of the strongest single signals available, and it changes on a driver update. The correlation is not perfect, but the direction is reliable: the signals that buy the most bits are disproportionately the signals that drift.

The clean counterexamples prove the rule. The most stable signals are the ones with the least entropy. Hardware concurrency (the CPU core count exposed to JavaScript) almost never changes for a given machine, and it is nearly worthless on its own because cores cluster on a handful of values. Timezone is stable for most people most of the time and carries only a few bits. The cookies enabled flag is rock-solid and worth a third of a bit. Stability and entropy are not strictly inversely proportional, but a detector picking signals finds them sitting on opposite ends of a trade.
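One way to see the trade on real data is to score each signal on both axes at once. The sketch below assumes you have a ground-truth device label for each visit (a first-party cookie, say, as in the returning-visitor analysis above); the signal names and values are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy_bits(values) -> float:
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def signal_report(visits):
    """Score each signal by entropy across devices and change rate across revisits.

    visits: chronological list of (device_id, {signal_name: value}).
    Entropy uses each device's first visit; change rate uses consecutive
    visits by the same device.
    """
    first_values = defaultdict(list)
    changes = Counter()
    revisit_pairs = 0
    last_seen = {}
    for device_id, signals in visits:
        previous = last_seen.get(device_id)
        if previous is None:
            for name, value in signals.items():
                first_values[name].append(value)
        else:
            revisit_pairs += 1
            for name, value in signals.items():
                if previous.get(name) != value:
                    changes[name] += 1
        last_seen[device_id] = signals
    return {
        name: {
            "entropy_bits": entropy_bits(values),
            "change_rate": changes[name] / revisit_pairs if revisit_pairs else 0.0,
        }
        for name, values in first_values.items()
    }


visits = [
    ("d1", {"cores": 8, "gpu": "driver-1.0"}),
    ("d2", {"cores": 8, "gpu": "driver-2.3"}),
    ("d3", {"cores": 4, "gpu": "driver-5.1"}),
    ("d1", {"cores": 8, "gpu": "driver-1.1"}),  # driver update on d1
    ("d2", {"cores": 8, "gpu": "driver-2.3"}),
]
report = signal_report(visits)
```

In this toy sample the GPU string separates all three devices and changes on half the revisits, while the core count never changes and separates almost nothing.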

So a fingerprint built to maximise uniqueness is fragile by definition. Eckersley says as much when he notes that combining a single high-entropy signal with the rest pushes most desktop browsers past the uniqueness threshold while leaving them exposed to change. The later large-scale work confirmed it from the other direction: when a fingerprint is not unique, it tends to be one feature away from becoming unique, which means it is also one feature change away from looking like a brand-new device. There is no free lunch where a signal is both highly discriminating and perfectly durable. If such a signal existed, fingerprinting would be trivial and this would not be an interesting problem.

This is why the practical systems split their signals by stability tier. The open-source FingerprintJS library exposes the idea directly, separating signals that essentially never change from signals that change infrequently, and computing a visitor identifier from a hashed combination of them rather than from the raw maximum-entropy bundle. The library’s own documentation is candid that browser-side fingerprints are less accurate than a server-assisted system and can be spoofed, which is the honest way to put it: the open library is a stability-weighted hash, not a guaranteed identity. The deeper mechanics of that split are in the post on FingerprintJS internals.

2016 to 2018: the headline number falls

Two later studies are worth holding side by side, because together they show the budget shrinking. In 2016 the AmIUnique work collected 17 attributes from 118,934 browsers and reported 90 percent of desktop and 81 percent of mobile fingerprints unique, while adding canvas and WebGL rendering as new high-entropy signals that HTML5 had made available. Canvas in particular became a workhorse, because rendering the same text and shapes to a hidden canvas produces pixel-level differences driven by the GPU, driver, and font stack. The detail of why a single toDataURL call is so discriminating is its own subject, covered in canvas fingerprinting, and the GPU-string story is in WebGL fingerprinting.

Then in 2018 the picture changed. The Hiding in the Crowd study collected 2,067,942 fingerprints from one of the top French websites, and found only 33.6 percent unique. That is not a contradiction of the earlier numbers. It is what happens when you measure a representative population instead of self-selected privacy testers, and when the single biggest signal has started to disappear. By 2018 browser plugins were on their way out, NPAPI was dead, Flash was dying, and the 15-bit plugin list that anchored the 2010 fingerprint was collapsing toward a constant. Remove the dominant signal and the long tail of unique browsers shortens.

The 2018 work introduced the idea the rest of this post leans on: the anonymity set. An anonymity set is the group of browsers sharing identical values across every collected attribute. A set of size one is a unique, trackable browser. A larger set is a crowd to hide in. The study found a sharp desktop-mobile split. Mobile devices, with their constrained and homogeneous hardware and software, landed in large anonymity sets far more often, with roughly 59 percent of mobile fingerprints in sets larger than 50, against only about 8 percent on desktop. Mobile is harder to fingerprint not because mobile hides better but because a million identical iPhones running the same iOS build genuinely look the same. The diversity that powers desktop fingerprinting is missing.
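Anonymity sets are just a group-by over the full attribute tuple. The following is a minimal sketch with invented fingerprints; the attribute values and the threshold are illustrative and do not reproduce the study's data.

```python
from collections import Counter


def anonymity_set_sizes(fingerprints) -> Counter:
    """Map each distinct fingerprint (a hashable tuple) to its anonymity set size."""
    return Counter(fingerprints)


def share_unique(fingerprints) -> float:
    """Fraction of observations whose anonymity set has size one."""
    sizes = anonymity_set_sizes(fingerprints)
    return sum(1 for fp in fingerprints if sizes[fp] == 1) / len(fingerprints)


def share_in_sets_larger_than(fingerprints, threshold: int) -> float:
    """Fraction of observations hidden in a set bigger than the threshold."""
    sizes = anonymity_set_sizes(fingerprints)
    return sum(1 for fp in fingerprints if sizes[fp] > threshold) / len(fingerprints)


# Hypothetical populations: homogeneous phones versus varied desktops.
mobile = [("iPhone", "iOS 17", "Safari")] * 6 + [("Pixel", "Android 14", "Chrome")]
desktop = [
    ("Windows", "1920x1080", "fonts-a"),
    ("Windows", "2560x1440", "fonts-b"),
    ("macOS", "1512x982", "fonts-c"),
    ("Linux", "1920x1200", "fonts-d"),
]
```

With these samples every desktop is a set of one, while six of the seven phones sit in the same set.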

*Figure: anonymity sets, meaning groups of mutually indistinguishable browsers with the same values across every collected attribute. A unique fingerprint is a set of size one and is trackable. Mobile devices land in large sets far more often than desktops because their hardware and software are homogeneous. The detector's enemy is the large crowd.*

By 2023 a US study that surveyed and fingerprinted around 8,400 participants found roughly 60 percent of users had a unique overall fingerprint, with the authors noting that the once-dominant plugin list now returns a hardcoded value in Chromium and that the user-agent and platform have been deliberately trimmed. The trajectory across the four data points (94 percent, then 90/81, then 33.6, then about 60) is not a clean monotone decline, because the populations and signal sets differ. What it shows is that the browser vendors have been actively spending down the entropy that fingerprinting relied on, removing the highest-entropy signals first, and that fingerprinters have responded by reaching for new ones.

Engineering stability back in

If the highest-entropy fingerprint is fragile, and a durable device ID is what a detector actually wants, then the interesting engineering is not in collecting more bits. It is in recovering identity across the drift. There are a few standard moves, and they all trade a little uniqueness for a lot of stability.

The first is signal weighting by tier. Rather than hash all signals into one brittle value, a system groups them: signals that change essentially never (hardware concurrency, device memory, platform), signals that change occasionally (timezone, language, screen geometry), and signals that change often (browser version, certain canvas outputs after a driver update). The durable ID is built primarily from the stable tier, with the volatile tier used as corroboration or as a tiebreaker rather than as a load-bearing component. This is the logic behind FingerprintJS exposing distinct stability levels, where the most stable level deliberately drops volatile signals to keep the identifier constant across updates, accepting that two genuinely different devices are now slightly more likely to collide.
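The tiering idea can be sketched in a few lines. This is not the FingerprintJS API or its algorithm: the tier assignments follow the grouping in the paragraph above, and the function names, signal keys, and the choice of a truncated SHA-256 are all assumptions made for illustration.

```python
import hashlib
import json

# Hypothetical tier assignment, mirroring the grouping described above.
STABLE = ("hardware_concurrency", "device_memory", "platform")
OCCASIONAL = ("timezone", "language", "screen")
VOLATILE = ("browser_version", "canvas_hash")


def tier_hash(signals: dict, names) -> str:
    """Hash only the named signals, in a canonical order."""
    subset = {name: signals.get(name) for name in names}
    payload = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def durable_id(signals: dict) -> str:
    """Identifier built from the stable tier only."""
    return tier_hash(signals, STABLE)


def corroboration(signals: dict) -> dict:
    """Per-tier hashes for the less stable signals, used as tiebreakers."""
    return {
        "occasional": tier_hash(signals, OCCASIONAL),
        "volatile": tier_hash(signals, VOLATILE),
    }
```

A browser update changes the volatile hash and leaves `durable_id` alone. The price is visible in the same code: any two devices that agree on the three stable signals collide, which is why the other tiers are kept around as corroboration.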

The second move is fuzzy matching instead of exact equality. A naive fingerprint is a hash, and a single changed bit produces a completely different hash, so a browser update reads as a new device. A production matcher does not compare hashes. It compares the underlying signal vector and asks whether the new observation is close enough, under rules that encode how signals are allowed to change. The rules in the 2018 FP-Stalker work are a clean example: a browser’s family must stay constant, the OS must stay constant, and the version number may stay equal or increase but never decrease. A fingerprint that matches an old one on everything except a bumped Chrome version is the same device on a Tuesday update, not a new one.
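The three rules translate almost directly into code. What follows is a simplified sketch of rule-based linking, not the FP-Stalker algorithm itself: the `Observation` fields, the single corroborating `canvas` field, and the refuse-on-ambiguity policy are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    os: str
    browser: str
    version: int
    canvas: str


def may_be_same_device(stored: Observation, new: Observation) -> bool:
    """Apply the three rules: same OS, same browser family, version never decreases."""
    if stored.os != new.os:
        return False
    if stored.browser != new.browser:
        return False
    return new.version >= stored.version


def link(known: dict, new: Observation) -> Optional[str]:
    """Return the device id the new observation links to, or None.

    known maps a device id to its most recent stored Observation.
    """
    matches = [
        device_id
        for device_id, stored in known.items()
        if may_be_same_device(stored, new) and stored.canvas == new.canvas
    ]
    # Refuse to link when the rules leave more than one candidate.
    return matches[0] if len(matches) == 1 else None


known = {
    "dev-1": Observation("macOS", "Chrome", 131, "a1f9c"),
    "dev-2": Observation("Windows", "Chrome", 131, "77b02"),
}
updated = Observation("macOS", "Chrome", 132, "a1f9c")
downgraded = Observation("macOS", "Chrome", 130, "a1f9c")
```

Here `updated` links to `dev-1` even though an exact hash of the two observations would differ, while `downgraded` is rejected because versions are not allowed to move backwards.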

Linking a fingerprint that drifted. Stored fingerprint: os = macOS, browser = Chrome, ver = 131, canvas = a1f9c. New observation: os = macOS, browser = Chrome, ver = 132, canvas = a1f9c. The stable fields match and the version only increased, so the verdict is: same device, linked.

*A rule-based matcher accepts a drifted observation as the same device when the immutable fields hold and the version moves the only direction it is allowed to. The version field is the change the rules tolerate.*

The FP-Stalker numbers show how much this buys. Sampling a fingerprint every few days, the linking algorithm tracked a device for an average of around 54 days, and tracked roughly a quarter of devices for more than 100 days, despite the constant churn. The fingerprint was never stable. The identity reconstructed from it was. That gap between signal stability and identity stability is the whole reason fingerprinting survives browser updates, and it is why “my fingerprint changes all the time” is not the defence it sounds like.

The third move pushes the stable signals server-side, beyond the browser’s reach. A client-only fingerprint is collected in JavaScript that the device can inspect and lie to. Signals observed by the server (the TLS ClientHello shape, the HTTP/2 settings, the TCP/IP stack characteristics) are not under the page script’s control and tend to be stable per client stack. They carry their own entropy and their own stability profile, and they cannot be patched away by a stealth plugin running in the page. The TLS side of this is its own deep topic in TLS fingerprinting from JA3 to JA4, and the commercial fingerprinting products lean on exactly this server-plus-client combination to push accuracy past what a browser-only library can promise.
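As one concrete instance of a server-observed signal, here is a sketch of how a JA3-style string is assembled from ClientHello fields. The field order and the MD5 step follow the published JA3 format; the numeric values in the example are made up, and extracting them from a real handshake is assumed to happen elsewhere.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) look like 0x0a0a, 0x1a1a, ... 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Join the five ClientHello fields as decimal values, skipping GREASE."""
    def join(values):
        return "-".join(str(v) for v in values if not is_grease(v))

    return ",".join([
        str(version), join(ciphers), join(extensions),
        join(curves), join(point_formats),
    ])


def ja3_hash(version, ciphers, extensions, curves, point_formats) -> str:
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Hypothetical ClientHello contents; 0x0A0A and 0x1A1A are GREASE placeholders.
hello = dict(
    version=771,
    ciphers=[0x0A0A, 4865, 4866, 4867],
    extensions=[0x1A1A, 0, 23, 65281, 10, 11],
    curves=[0x0A0A, 29, 23, 24],
    point_formats=[0],
)
fingerprint = ja3_hash(**hello)
```

Nothing in this string is set by page JavaScript, which is the property the paragraph above relies on. The GREASE filter matters for stability: those placeholder values vary between connections, so leaving them in would make the hash churn for the same client.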

When the defender spends the budget down

The entropy budget is a lever for whoever holds it, and browser vendors have spent the last decade pulling it the other way. The interesting cases are the two opposite anti-fingerprinting strategies, because they correspond to two different things you can do to the distribution.

The Tor Browser strategy is uniformity. Make every Tor user look identical by forcing the high-entropy signals into shared buckets: a default font bundle so font enumeration returns the same list, canvas and WebGL access blocked or prompted, the content window rounded to fixed steps, the user-agent and headers normalised. The goal is to merge every user into one large anonymity set, so that millions of users share one fingerprint. In entropy terms, uniformity drives the per-user surprisal toward zero by making P(fingerprint) large for everyone. The cost is that any user who deviates from the uniform configuration stands out sharply, so the strategy only works if you do not touch your browser.
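Bucketing is the simplest of these moves to show. The sketch below rounds a window size down to a fixed step; the 100-pixel step and the sample sizes are assumptions for illustration, not a statement of what any particular browser release uses.

```python
def round_down(value: int, step: int) -> int:
    """Round a dimension down to the nearest multiple of step."""
    return (value // step) * step


def bucket_window(width: int, height: int, step: int = 100):
    """Report a coarsened window size instead of the exact one."""
    return round_down(width, step), round_down(height, step)


# Hypothetical exact window sizes from four different users.
exact_sizes = [(1366, 657), (1349, 640), (1399, 699), (1301, 601)]
reported = {bucket_window(width, height) for width, height in exact_sizes}
```

Four distinct exact sizes collapse into one reported value, so the window size stops separating these users from one another.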

The randomisation strategy does the opposite. Instead of making everyone the same, it makes each request different, adding noise to canvas readback or shuffling values so the fingerprint is unstable from session to session. This attacks stability rather than uniqueness. A fingerprint that is different every time cannot be linked across visits even if it is unique within a visit. The risk, well known to the people building these defences, is that the randomisation can itself become a signal: a canvas that returns a slightly different value on every read is not a normal canvas, and the very instability flags the browser as defended. The W3C fingerprinting guidance discusses both and is honest that neither is free; it frames the design goal as minimising the entropy a new web feature adds, clamping precision and using coarse enumerated categories rather than fine-grained values, so the surface never grows in the first place.
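The "instability is itself a signal" point can be shown with a simulation. This sketch stands in for a page script reading the same canvas several times; the two readback functions are invented substitutes for real canvas output, and a counter replaces random noise so the example is deterministic.

```python
import itertools


def looks_randomised(read_canvas, reads: int = 3) -> bool:
    """Flag a readback source whose output differs between identical reads."""
    outputs = {read_canvas() for _ in range(reads)}
    return len(outputs) > 1


def plain_canvas() -> str:
    # An undefended canvas renders the same scene to the same bytes.
    return "a1f9c"


_noise = itertools.count()


def noisy_canvas() -> str:
    # Stand-in for a defence that perturbs every read.
    return f"a1f9c-{next(_noise)}"
```

A check like this only catches per-read noise. A defence that keeps its noise constant within a session would pass it, at the cost of being linkable for the length of that session.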

The User-Agent Client Hints work is the cleanest example of a vendor spending the budget down by design. The old user-agent string leaked OS version, device model, and full browser version to every server passively, around ten bits in 2010. Chrome’s reduction froze most of that string and split the data into low-entropy hints sent by default (Sec-CH-UA brands, Sec-CH-UA-Mobile, Sec-CH-UA-Platform) and high-entropy hints that a site must explicitly request (Sec-CH-UA-Full-Version-List, Sec-CH-UA-Arch, Sec-CH-UA-Model, Sec-CH-UA-Bitness, Sec-CH-UA-Platform-Version). The low-entropy set is what flows for free; the identifying detail requires an opt-in the user agent can see and meter. On top of that sits GREASE, where the browser injects a randomised brand entry with a varying name and version into Sec-CH-UA and shuffles ordering, so a server cannot rely on an exact brand-list match. That is deliberate noise added to a signal precisely to make it less reliable as a fingerprint, the same randomisation logic applied at the protocol level.
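From the server's side, the split and the GREASE entry look roughly like this. It is a sketch under stated assumptions: the header names come from the paragraph above, `Accept-CH` is the response header a site uses to request hints, and the regular-expression parser and the `KNOWN_BRANDS` allow-list are simplifications invented for the example rather than a full structured-header parser.

```python
import re

LOW_ENTROPY_DEFAULT = ("Sec-CH-UA", "Sec-CH-UA-Mobile", "Sec-CH-UA-Platform")
HIGH_ENTROPY = (
    "Sec-CH-UA-Full-Version-List", "Sec-CH-UA-Arch", "Sec-CH-UA-Model",
    "Sec-CH-UA-Bitness", "Sec-CH-UA-Platform-Version",
)

# Hypothetical allow-list of brands this server chooses to recognise.
KNOWN_BRANDS = {"Chromium", "Google Chrome", "Microsoft Edge"}

_BRAND = re.compile(r'"((?:[^"\\]|\\.)*)"\s*;\s*v="([^"]*)"')


def accept_ch_header(requested) -> str:
    """Value for the Accept-CH response header that opts in to extra hints."""
    return ", ".join(requested)


def parse_brands(sec_ch_ua: str) -> dict:
    """Extract brand -> version pairs from a Sec-CH-UA header value."""
    return dict(_BRAND.findall(sec_ch_ua))


def stable_brands(sec_ch_ua: str) -> dict:
    """Drop unrecognised (GREASE) entries and ignore ordering."""
    return {b: v for b, v in parse_brands(sec_ch_ua).items() if b in KNOWN_BRANDS}


first = '"Not A(Brand";v="99", "Chromium";v="131", "Google Chrome";v="131"'
second = '"Google Chrome";v="131", "Not=A?Brand";v="24", "Chromium";v="131"'
```

The two sample headers differ as raw strings, so an exact match fails by design. Comparing them as an unordered set of recognised brands gives the same answer for both, which is the only kind of comparison GREASE leaves intact.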

The budget is the whole story

Strip away the specific signals and the picture is a single constrained optimisation. A fingerprint wants high joint entropy, because that is uniqueness, and high temporal stability, because that is identity, and the signals available pay out one mostly at the expense of the other. The plugin list of 2010 was the dream signal, 15 bits and reasonably stable, and the browsers killed it. What replaced it (canvas, WebGL, audio, client hints) buys bits at a worse exchange rate, because the high-entropy versions drift on driver and version updates and the stable versions have been deliberately coarsened. A detector does not respond by collecting more. It responds by spending smarter: weighting the durable signals, matching fuzzily across the volatile ones, and reaching for the server-observable stack that the page cannot edit.

The number to hold onto is not 18.1 bits. It is the shape of the trade. Every time a browser vendor removes a signal or buckets it or greases it, the uniqueness side of the budget shrinks and the fingerprinter leans harder on stability engineering to keep identities linked across a population that now looks more alike. The headline uniqueness figure has fallen and will keep falling. The linking has not, because a fingerprint that changes 37 percent of the time can still be followed with 99 percent accuracy by a heuristic that knows which way versions move. Fingerprint stability and fingerprint uniqueness are the two axes, and the only thing a detector ever really controls is how it spends across them.

