Device fingerprinting in anti-bot stacks: FingerprintJS, the entropy budget, and stability
A visitor lands on a page with no cookies, a fresh IP, and empty local storage. The site has never seen this browser before, or so the browser believes. Within a few hundred milliseconds a script has read the canvas rendering, the audio stack, the installed fonts, the screen geometry, the timezone, and a few dozen other attributes the browser hands out for free. It concatenates them, hashes the result, and gets back a string. If that same browser comes back next week with cookies cleared and a new IP, there is a good chance the string will be the same. That is device fingerprinting, and it is the part of an anti-bot stack that works when every other identifier has been wiped.
The question this post tries to answer is narrow and practical. How much identifying information does a browser actually leak, where does it come from, and why does a hash computed entirely in JavaScript drift over time while a commercial product can keep the same identifier stable for months? FingerprintJS is the worked example throughout, because its open-source library is the most widely read implementation of the technique and its commercial sibling, Fingerprint Pro, is the clearest public case of how the server-side version closes the gaps the client-side one cannot.
The sections below start with the entropy budget, the information-theoretic idea that the whole field rests on. Then the FingerprintJS signal set, source by source, and how the visitor ID gets computed. Then the stability problem, which is where the open-source and Pro versions diverge hard. Then the server-side payload as measured by independent researchers, the fuzzy matching that fixes drift, and the line between fingerprinting and bot detection, which people conflate constantly and which the same vendor ships as two separate libraries. Finally, what browser defenses do about all of this and where the entropy is heading.
The entropy budget
Browser fingerprinting as a measured discipline starts with one paper. In 2010, Peter Eckersley of the EFF ran the Panopticlick experiment, collected fingerprints from 470,161 browsers, and published the results as How Unique Is Your Web Browser? The headline number is still the one everyone quotes: the fingerprint distribution carried at least 18.1 bits of entropy. That means if you pick a browser at random, you expect at best only one in 286,777 others to share its fingerprint. Among browsers that had Flash or a Java VM enabled, 94.2% were unique outright.
The mechanism is information theory applied to identity. Eckersley borrowed the term surprisal, the self-information of a single outcome, defined as the negative base-2 logarithm of that outcome’s probability. A signal that every browser shares carries zero bits. A signal that splits the population in half carries one bit. The entropy of the whole distribution is the expected surprisal across all browsers. Bits add up. Get enough weakly identifying signals that are statistically independent and their entropy sums until the population of browsers sharing your exact combination shrinks to one. That summation is the entropy budget. Every fingerprinting library is in the business of spending it.
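To make the arithmetic concrete, here is a minimal Python sketch of surprisal and entropy over a toy population. The population and its signal values are invented for illustration; only the formulas come from the text. The sketch also runs the Panopticlick figure backwards. An anonymity set of one in 286,777 corresponds to about 18.13 bits, which rounds to the quoted 18.1.

```python
import math
from collections import Counter
from itertools import product

def surprisal(p):
    """Self-information, in bits, of an outcome that occurs with probability p."""
    return math.log2(1 / p)

def entropy(observations):
    """Shannon entropy in bits: the expected surprisal over the empirical distribution."""
    counts = Counter(observations)
    n = len(observations)
    return sum((c / n) * surprisal(c / n) for c in counts.values())

# Toy population: signal A takes 4 values, signal B takes 2, and every
# combination is equally common, so the two signals are independent.
population = list(product(["a1", "a2", "a3", "a4"], ["b1", "b2"]))
h_a = entropy([a for a, _ in population])      # 2 bits
h_b = entropy([b for _, b in population])      # 1 bit
h_joint = entropy(population)                  # 3 bits: independent bits add

# Going the other way: a 1-in-286,777 anonymity set is log2(286,777) bits.
bits_for_panopticlick = math.log2(286_777)

print(h_a, h_b, h_joint, round(bits_for_panopticlick, 2))
```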
The per-signal contributions from the Panopticlick data are worth sitting with, because the shape has barely changed in fifteen years even though the specific signals have.
*Per-signal entropy from Eckersley's 2010 measurements. Plugins and fonts dominated then. Plugins have since collapsed to near zero as browsers stopped exposing them, which is exactly why canvas and audio readouts took over.*
Two things about that chart matter for everything that follows. First, the signals do not add cleanly, because they correlate. A browser reporting a particular plugin list also tends to report a particular user agent, so the joint entropy is less than the sum of the parts. Second, the biggest contributors from 2010, plugins and the font list pulled through Flash, are mostly gone. Browsers froze the navigator.plugins array, Flash died, and the user agent got deliberately reduced. The entropy budget did not shrink to match. It moved. The signals that replaced the old ones are the rendering-based ones: canvas, audio, and WebGL. That is where the Mowery and Shacham paper picked up the story, two years after Panopticlick.
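The correlation point is easy to see by applying the same entropy formula to a correlated toy population. The browsers below are invented, and the plugin list mostly follows the browser family. Summing the two marginal entropies therefore double-counts the information they share, which is their mutual information. As a result, the joint entropy comes out lower than the naive sum.

```python
import math
from collections import Counter

def entropy(observations):
    """Shannon entropy in bits of the empirical distribution of observations."""
    counts = Counter(observations)
    n = len(observations)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

# Invented population of 8 browsers as (browser family, plugin list).
# The plugin list mostly follows the browser, so the signals are correlated.
population = (
    [("chrome", "pdf")] * 3 + [("chrome", "pdf+flash")]
    + [("firefox", "none")] * 3 + [("firefox", "pdf")]
)
h_ua = entropy([ua for ua, _ in population])
h_plugins = entropy([pl for _, pl in population])
h_joint = entropy(population)
# The overlap a naive sum counts twice.
mutual_information = h_ua + h_plugins - h_joint

print(f"sum of parts {h_ua + h_plugins:.3f} bits, "
      f"joint {h_joint:.3f} bits, overlap {mutual_information:.3f} bits")
```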
In 2012, Keaton Mowery and Hovav Shacham of UC San Diego published Pixel Perfect: Fingerprinting Canvas in HTML5 at the Web 2.0 Security and Privacy workshop. They showed that asking a browser to render text and WebGL scenes into a <canvas> element, then reading the pixels back, produced a fingerprint that was, in their words, consistent, high-entropy, orthogonal to other fingerprints, transparent to the user, and readily obtainable. The entropy was modest on its own. Across 294 browser instances they saw 116 distinct fingerprints from text rendering, about 5.73 bits. The valuable word in their list is orthogonal. Canvas entropy is mostly independent of screen resolution and the JavaScript capability signals, so it adds to the budget rather than overlapping it. That independence is why every serious fingerprinting library since has a canvas source, and why the open-source FingerprintJS ships two canvas images by default.
The reason a canvas readback varies between machines at all is worth understanding, because it is the same reason canvas is hard to defend. When a browser rasterizes text or a 3D scene, the exact bytes that land in the pixel buffer depend on several things:
- the GPU model and the graphics driver version;
- the anti-aliasing and sub-pixel rendering settings;
- the installed fonts the browser falls back to;
- the floating-point behavior of the rendering pipeline.

None of those is something the page asks for directly. They are emergent properties of the hardware and software stack underneath the browser, surfaced through an API that was designed for drawing, not identification. Two machines running the identical browser version on identical operating systems can still differ in their canvas output if they have different GPUs. That is the source of the entropy. It is also why the signal tends to survive browser updates: a software update does not swap your graphics card.
The entropy budget has a shape that changes over time, and the practical lesson is that a fingerprinting library is never finished. Signals decay as browsers patch them and new signals appear as browsers ship new APIs. WebGPU is the clearest current example: it exposes adapter and limit information that, like WebGL before it, leaks the GPU and driver stack, and it arrived after most of the academic literature was written. A library that froze its source list in 2018 would be measurably worse today, not because the old signals stopped working but because the population shifted around them. This is why reading the open-source source tree is the right way to take the current pulse of the field. It is a living list, updated as the browsers move.
The FingerprintJS signal set
FingerprintJS is an MIT-licensed TypeScript library, currently at v5.2.0 (released April 2026), that does exactly what the entropy budget suggests. It queries a fixed list of browser attributes, normalizes them, and hashes them into a visitor identifier. The current source tree has the collection sources broken out into individual files, and reading their names is the fastest way to understand what the library actually spends its budget on.
*The default source list from the FingerprintJS source tree, grouped by what they read. The categories are mine; the source names are verbatim from the library's `src/sources` directory.*
A few of these reward a closer look because they are where the entropy now lives.
- The canvas source renders two images. The library’s own docs note that the geometry-only image is more stable while the text image gives more entropy, and both ship by default.
- The dom_blockers source is a clever one. It injects elements with class names that ad and tracker blockers are known to hide, then checks which ones disappeared. That turns the user’s choice of blocker and filter list into a few bits of signal.
- The math source reads the exact floating-point output of transcendental functions like acos and tan, which vary subtly across CPU and libm implementations.
- The screen_frame source reads the offsets between the screen and the available screen area, which encode taskbar and dock placement.
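The dom_blockers idea reduces to a small decision rule once the page knows which bait elements ended up hidden. The sketch below is hypothetical throughout. The filter-list names, the selectors, and the 60% threshold are placeholders, not the library's actual tables. The DOM probing itself, inserting the baits and checking whether a blocker hid them, can only happen in a page, so here it is represented by a plain input set.

```python
# Hypothetical bait selectors per filter list; not the library's real tables.
FILTER_BAITS = {
    "easyList": ["#ad-banner", ".sponsored-box", "#top-ad-slot"],
    "adGuardBase": [".adsbox", "#adguard-bait", ".ad-slot-300"],
    "fanboyAnnoyances": ["#cookie-nag", ".newsletter-popup", "#social-share-bar"],
}

def active_filter_lists(hidden_selectors, threshold=0.6):
    """Filter lists whose baits were mostly hidden; threshold is an assumed value."""
    active = []
    for name, baits in FILTER_BAITS.items():
        hidden = sum(1 for selector in baits if selector in hidden_selectors)
        if hidden > len(baits) * threshold:
            active.append(name)
    return sorted(active)

# In a page, this set would come from inserting each bait and checking
# whether the blocker hid it; here it is simply given.
hidden = {"#ad-banner", ".sponsored-box", "#top-ad-slot", ".adsbox"}
print(active_filter_lists(hidden))  # ['easyList']
```

The resulting list becomes one component value. A user who switches blockers or filter lists changes that component, and with it the hash.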
What you do not see in that list is anything network-level. No TLS data, no HTTP/2 frame ordering, no IP. The open-source library runs entirely in the page’s JavaScript context, so it can only read what the DOM and the various Web APIs expose. That boundary is the single most important fact about the open-source version, and it is the reason the Pro version exists. For the network side of the same arms race, the companion pieces on TLS fingerprinting from JA3 to JA4 and JavaScript runtime fingerprinting cover signals the in-page library cannot reach.
From sources to a visitor ID
Once every source has returned its value, the library assembles a stable component map, serializes it, and runs the serialized form through a 128-bit MurmurHash variant (the x64 128-bit construction inherited from the older fingerprintjs2 x64hash128). The output is a hexadecimal string, the visitorId. MurmurHash is not cryptographic and is not meant to be. It is a fast non-cryptographic hash chosen because the goal is a stable, well-distributed identifier, not a tamper-proof one. The whole identifier lives and dies on the inputs.
That last point is the crux of the open-source library’s limitation, and the library’s own documentation is candid about it. Because fingerprints are generated and processed in the browser, they are, in the project’s own words, vulnerable to spoofing and reverse engineering, and their accuracy is significantly lower than the commercial version's. The reason is structural: two fingerprints get the same 128-bit hash only if every input matches exactly. Change one source’s output by a single bit and the hash changes completely. That is fine for uniqueness. It is fatal for stability.
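To see why one changed input produces a completely different visitorId, here is a sketch of the two steps. First, the component map is serialized into a canonical string. Second, MurmurHash3 x64 128-bit runs over its bytes. The hash follows the public MurmurHash3_x64_128 reference algorithm from SMHasher. The serialization layout is an assumption loosely modelled on the library: sorted keys, `key:JSON` pairs, joined by `|`. The UTF-8 encoding and seed 0 are also assumptions, so the output should not be expected to reproduce real FingerprintJS visitorIds.

```python
import json

MASK64 = (1 << 64) - 1
C1, C2 = 0x87C37B91114253D5, 0x4CF5AD432745937F

def _rotl64(x, r):
    return ((x << r) | (x >> (64 - r))) & MASK64

def _fmix64(k):
    # Final avalanche step: every input bit influences every output bit.
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    return k ^ (k >> 33)

def _mix_k1(k1):
    return (_rotl64((k1 * C1) & MASK64, 31) * C2) & MASK64

def _mix_k2(k2):
    return (_rotl64((k2 * C2) & MASK64, 33) * C1) & MASK64

def murmur3_x64_128(data: bytes, seed: int = 0) -> str:
    """MurmurHash3 x64 128-bit, returned as 32 hex characters (h1 then h2)."""
    h1 = h2 = seed & 0xFFFFFFFF
    nblocks = len(data) // 16
    for i in range(nblocks):  # body: 16-byte blocks as two little-endian words
        block = data[16 * i : 16 * i + 16]
        h1 ^= _mix_k1(int.from_bytes(block[:8], "little"))
        h1 = (_rotl64(h1, 27) + h2) & MASK64
        h1 = (h1 * 5 + 0x52DCE729) & MASK64
        h2 ^= _mix_k2(int.from_bytes(block[8:], "little"))
        h2 = (_rotl64(h2, 31) + h1) & MASK64
        h2 = (h2 * 5 + 0x38495AB5) & MASK64
    tail = data[16 * nblocks :]  # tail: the remaining 0-15 bytes
    if len(tail) > 8:
        h2 ^= _mix_k2(int.from_bytes(tail[8:], "little"))
    if tail:
        h1 ^= _mix_k1(int.from_bytes(tail[:8], "little"))
    h1 ^= len(data)
    h2 ^= len(data)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1, h2 = _fmix64(h1), _fmix64(h2)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    return f"{h1:016x}{h2:016x}"

def canonical_string(components: dict) -> str:
    # Assumed layout: keys sorted, each "key:<JSON value>", joined with "|".
    return "|".join(
        f"{key}:{json.dumps(components[key], separators=(',', ':'), ensure_ascii=False)}"
        for key in sorted(components)
    )

base = {"timezone": "Europe/Berlin", "screenResolution": [2560, 1440], "hardwareConcurrency": 8}
drifted = dict(base, screenResolution=[1920, 1080])  # same machine, new monitor
id_base = murmur3_x64_128(canonical_string(base).encode("utf-8"))
id_drifted = murmur3_x64_128(canonical_string(drifted).encode("utf-8"))
differing = sum(a != b for a, b in zip(id_base, id_drifted))
print(id_base, id_drifted, f"{differing}/32 hex digits differ")
```

Changing one component typically flips most of the 32 hex digits. That avalanche behavior is what makes the hash well distributed. It is also exactly why an ID built this way cannot tolerate drift.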
The stability problem
Here is the tension at the heart of device fingerprinting. You want the identifier to be unique, which argues for collecting as many high-entropy signals as possible. You also want it to be stable across days and weeks, so the same device produces the same identifier when it returns. Those two goals fight, because the high-entropy signals are exactly the ones most likely to change.
Eckersley measured this in 2010, and the result was sobering even then. Among the 8,833 users who accepted cookies and visited several times over a period of more than 24 hours, 37.4% showed at least one fingerprint change. Common triggers are easy to list:
- A browser update bumps the user agent and the WebGL renderer string.
- A new monitor changes the screen resolution and color depth.
- A font install shifts the font list.

Each of these flips one or more sources, and because the visitor ID is an exact hash of all sources, any single flip produces a brand new ID. The client-side hash has no notion of a small change. Every change is total.
Fingerprint’s own marketing comparison puts numbers on the decay. In their illustration, the open-source FingerprintJS correctly identifies 60,000 devices at 30 days but only 12,960 at 120 days, while Fingerprint Pro holds 99,500 at 30 days and 98,015 at 120 days. The open-source ID lifetime is described as several weeks; Pro’s as months to years. Treat the exact figures as vendor-supplied, but the shape is real and consistent with the academic literature on fingerprint drift. A pure client-side hash decays fast.
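The vendor's four numbers fit a very simple model, which makes the shape easy to reason about. Assume a starting population of 100,000 devices, the base the figures imply. A constant retention of 60% per 30-day period gives 60,000 at 30 days and 12,960 at 120 days. A retention of 99.5% per period gives 99,500 and 98,015. That fit is our own arithmetic, not a model Fingerprint states.

```python
def retained(start, rate_per_30_days, days):
    """Devices still matched after `days`, assuming a constant retention per 30 days."""
    return round(start * rate_per_30_days ** (days / 30))

START = 100_000  # population size implied by the vendor's figures (an assumption)
for label, rate in [("FingerprintJS (open source)", 0.60), ("Fingerprint Pro", 0.995)]:
    print(label, [retained(START, rate, d) for d in (30, 60, 90, 120)])
```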
*The structural reason for the gap. A 128-bit exact hash has no tolerance for partial change. A server-side model can hold an identifier steady while the underlying signals drift, because it never required an exact match in the first place.*
The fix is to stop treating the identifier as an exact hash. That requires moving the decision off the client and keeping state. Which is what Pro does.
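A toy model shows why tolerance matters so much. Suppose each source changes independently with some small monthly probability. The values below are invented, and real sources are neither independent nor equally important. An exact hash survives only if nothing changes, while a matcher that tolerates up to two changed sources survives far more often. The tolerant case is a Poisson-binomial tail, computed here with a short dynamic program.

```python
def survival_exact(change_probs):
    """P(no source changes): the only case in which an exact hash survives."""
    p = 1.0
    for q in change_probs:
        p *= 1.0 - q
    return p

def survival_tolerant(change_probs, max_changed):
    """P(at most max_changed sources change), via the Poisson-binomial recurrence."""
    dist = [1.0]                      # dist[k] = P(exactly k changes so far)
    for q in change_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, pk in enumerate(dist):
            nxt[k] += pk * (1.0 - q)  # this source stayed the same
            nxt[k + 1] += pk * q      # this source changed
        dist = nxt
    return sum(dist[: max_changed + 1])

# Invented monthly change probabilities: a few volatile sources, many stable ones.
probs = [0.10, 0.08, 0.05, 0.05, 0.03] + [0.01] * 20
print(f"exact hash survives:   {survival_exact(probs):.3f}")
print(f"tolerant of 2 changes: {survival_tolerant(probs, 2):.3f}")
```

Counting changed sources is the crudest possible tolerance rule. A real matcher would also care about which sources changed, but even this rule closes most of the gap.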
What moves server-side
Fingerprint Pro keeps the same idea (read browser signals) and changes two things. It collects more signals, including ones the browser cannot see on its own, and it makes the identification decision on a server that remembers every browser it has ever seen. The client-side payload is no longer hashed into the final answer in the browser. It is shipped to a backend, validated, and matched against a state database using a machine-learning model. The visitor ID comes back from the server.
The most rigorous public account of what that payload looks like is a 2026 academic paper, Understanding Server-side Commercial Fingerprinting, by Elisa Luo, Tom Ritter, Stefan Savage, and Geoffrey Voelker, presented at the Web Conference (WWW 2026). The authors instrumented several commercial fingerprinting services, including Fingerprint Pro, and reverse-engineered the payloads. Their description of the Pro payload is the closest thing to a primary source on its contents. It carries screen attributes (colorDepth, colorGamut, contrast, hdr, height, width, resolution), capability flags (cookiesEnabled, domBlockers, forcedColors, indexedDB, invertedColors, localStorage, monochrome, openDatabase, pdfViewerEnabled, reducedMotion, sessionStorage, touchSupport, privateClickMeasurement), navigator fields (cpuClass, deviceMemory, hardwareConcurrency, languages, osCpu, platform, vendor), and a WebGL block including renderer, vendor, and the unmasked vendor and renderer strings. Several values, canvas readouts in particular, are hashed before transmission. The authors note canvas readouts are hashed prior to inclusion for all of the services they studied, likely to keep the payload small.
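As a rough illustration of what "hashed before transmission" buys, the sketch below builds a JSON payload from a few of the field names the paper lists. It replaces the bulky canvas readout with a digest. Two choices are assumptions: grouping the fields into screen and navigator objects, and using SHA-256. The paper does not say which hash each vendor uses, and real payloads are obfuscated and minified.

```python
import hashlib
import json

def build_payload(signals, canvas_readout):
    """Assemble an illustrative fingerprint payload; layout and hash are assumptions."""
    payload = {
        "screen": {k: signals[k] for k in ("colorDepth", "width", "height")},
        "navigator": {k: signals[k] for k in ("hardwareConcurrency", "languages", "platform")},
        # Hash the bulky canvas readout so only a fixed-size digest is sent.
        "canvas": hashlib.sha256(canvas_readout).hexdigest(),
    }
    return json.dumps(payload, separators=(",", ":"), sort_keys=True)

signals = {
    "colorDepth": 24, "width": 2560, "height": 1440,
    "hardwareConcurrency": 8, "languages": ["en-US", "en"], "platform": "MacIntel",
}
canvas_readout = b"data:image/png;base64," + b"A" * 20_000  # stand-in for a real data URL
payload = build_payload(signals, canvas_readout)
print(len(canvas_readout), "bytes of canvas data ->", len(payload), "bytes of payload")
```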
That signal list overlaps heavily with the open-source source files, which is the point. The client-side collection is broadly the same. What differs is everything that happens after the POST. The payload goes to the server as JSON, with varying degrees of obfuscation and minification across vendors, and the server does the work. According to the same paper, Fingerprint Pro adds signals that are entirely invisible to the in-page script, server-observed network attributes among them, and it detects incognito modes, virtualization, VPNs, and privacy-focused settings such as Brave’s defenses. It also offers an IP reputation score and a bot detection system built on the open-source BotD library. Those extra detections are sold as Smart Signals and, per Fingerprint’s own comparison page, the device, network, and behavior signals are gated to the Pro Plus and Enterprise tiers.
Fuzzy matching and the confidence score
The state database is what makes drift survivable. When a payload arrives, the server does not demand an exact match. It finds the closest previously-seen device and decides whether this is the same one. The Luo paper measured this directly with synthetic fingerprints and the result is the clearest public evidence that Pro tolerates partial change. The authors generated 2,000 visits with unique synthetic fingerprints, 1,000 Chrome-on-macOS and 1,000 Chrome-on-Windows, each verified to be unique under the client-side model by confirming a unique open-source FingerprintJS hash. Pro still collapsed many of them together. Only 66.7% of the macOS fingerprints and 76.2% of the Windows fingerprints received unique Pro visitor IDs. The rest landed in cohorts, where multiple distinct client-side hashes mapped to a single Pro VID. The most common cohort size was two: a pair of fingerprints with different exact hashes sharing one Pro identifier.
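A toy matcher makes the cohort result less mysterious. The sketch below is not Fingerprint's algorithm, which is not public. It scores a new visit against every stored device by the fraction of components that agree. It reuses the best match at or above a hypothetical 0.8 cutoff and otherwise mints a new ID. Two visits with different exact hashes end up sharing one ID. So does a genuinely different device that happens to agree on enough components.

```python
import hashlib
import json
from dataclasses import dataclass, field

def exact_hash(components):
    """Stand-in for the client-side hash: any change gives a new value."""
    blob = json.dumps(components, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:16]

@dataclass
class ToyMatcher:
    threshold: float = 0.8                       # hypothetical similarity cutoff
    devices: dict = field(default_factory=dict)  # visitor ID -> last components seen

    @staticmethod
    def similarity(a, b):
        keys = set(a) | set(b)
        return sum(a.get(k) == b.get(k) for k in keys) / len(keys)

    def identify(self, components):
        best_id, best_score = None, 0.0
        for vid, seen in self.devices.items():
            score = self.similarity(components, seen)
            if score > best_score:
                best_id, best_score = vid, score
        if best_id is None or best_score < self.threshold:
            best_id = f"vid-{len(self.devices) + 1}"  # unseen device: mint a new ID
        self.devices[best_id] = dict(components)       # remember the latest state
        return best_id

device_a = {f"source{i}": i for i in range(10)}
device_a_updated = dict(device_a, source3="updated")         # same device, one source drifted
device_b = {f"source{i}": i + 100 for i in range(10)}        # unrelated device
device_c = dict(device_a_updated, source5="x", source7="y")  # different device, 8 of 10 agree

matcher = ToyMatcher()
ids = [matcher.identify(d) for d in (device_a, device_a_updated, device_b, device_c)]
print(ids)  # ['vid-1', 'vid-1', 'vid-2', 'vid-1']
print(exact_hash(device_a) != exact_hash(device_a_updated))  # True
```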
That is fuzzy matching working as designed, and it is a double-edged result. The same tolerance that keeps your identifier stable when your browser updates also means two genuinely different devices can occasionally collapse into one identity. The vendor exposes this uncertainty as a confidence score, a number from 0 to 1 that the documentation defines as confidenceScore = 1 - falsePositiveProbability. Closer to 1 means higher certainty that the returned visitor ID is correct. The intended use is thresholding. Below some cutoff you escalate, with a step-up such as 2FA or a CAPTCHA, rather than trusting the ID outright. The model leans on stable signals where it can; the paper notes that stable IP addresses act as implicit supervisory signals and that mobile devices, which expose properties that stay put longer, identify more accurately than desktop browsers.
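The thresholding step is simple to express. In the sketch below the per-action cutoffs are hypothetical, chosen only to show that riskier actions should demand more certainty. The only documented piece is the definition of the score as one minus the false-positive probability.

```python
# Hypothetical per-action cutoffs: riskier actions demand more certainty.
THRESHOLDS = {"view_content": 0.5, "login": 0.9, "payout": 0.97}

def confidence_score(false_positive_probability):
    """The documented definition: confidenceScore = 1 - falsePositiveProbability."""
    return 1.0 - false_positive_probability

def decide(false_positive_probability, action):
    """Trust the visitor ID above the action's cutoff, otherwise step up (2FA, CAPTCHA)."""
    if confidence_score(false_positive_probability) >= THRESHOLDS[action]:
        return "trust_visitor_id"
    return "step_up"

for action in THRESHOLDS:
    print(action, decide(0.05, action))
```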
There is a subtlety worth stating plainly. The exact weighting the server-side model applies to each signal, the structure of the state database, and the precise matching algorithm are not public. What is public is the payload contents (from the Luo paper and from the open-source sources), the confidence-score formula (from the docs), and the measured behavior of the matcher (from the synthetic-fingerprint experiment). The internal model is inferred from that observed behavior, not from any published spec. Anyone telling you the precise feature weights is guessing.
Fingerprinting is not bot detection
People in the anti-bot space use device fingerprinting and bot detection interchangeably and the conflation causes real confusion, so it is worth being precise. They answer different questions. Fingerprinting asks: is this the same device I saw before? Bot detection asks: is this device a human or an automated agent? You can have a perfectly stable fingerprint for a bot, and a human whose fingerprint changes every visit. The two signals are orthogonal, which is exactly why Fingerprint ships them as two separate libraries.
The fingerprinting library is FingerprintJS. The bot detection library is BotD, also MIT-licensed and also free. BotD runs entirely in the browser and looks for the tells of automation rather than the identity of a device. It detects headless Chrome and headless Firefox and automation frameworks including Selenium, Playwright, PhantomJS, Nightmare, Electron, and SlimerJS. It also flags tools like Browserless and undetected-chromedriver. Where FingerprintJS reads canvas and audio to compute an identity, BotD reads inconsistencies:
- a navigator that claims one OS while another signal says something else;
- a languages array that does not match the rest of the environment;
- the artifacts that the Chrome DevTools Protocol leaves behind.

Fingerprint Pro folds BotD’s verdicts into its Smart Signals so customers get both answers from one integration, but the underlying questions stay separate.
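A few consistency checks of the kind described above can be sketched over a dictionary of values a page script might collect. These rules are illustrative, not BotD's actual detectors. The browser surfaces are real: navigator.webdriver, navigator.platform, navigator.languages, and the HeadlessChrome token that older headless Chrome builds put in the user agent. The rule set and the sample inputs are assumptions.

```python
def inconsistencies(env):
    """Return human-readable findings for an environment dict collected in-page."""
    findings = []
    ua = env.get("userAgent", "")
    platform = env.get("platform", "")
    if "Windows" in ua and not platform.startswith("Win"):
        findings.append("user agent says Windows, navigator.platform does not")
    if "Macintosh" in ua and not platform.startswith("Mac"):
        findings.append("user agent says macOS, navigator.platform does not")
    if "HeadlessChrome" in ua:
        findings.append("user agent advertises HeadlessChrome")
    if env.get("webdriver") is True:
        findings.append("navigator.webdriver is true")
    if not env.get("languages"):
        findings.append("navigator.languages is empty")
    return findings

human = {
    "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                 "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "platform": "Win32", "webdriver": False, "languages": ["en-US", "en"],
}
bot = {
    "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                 "(KHTML, like Gecko) HeadlessChrome/124.0.0.0 Safari/537.36",
    "platform": "Linux x86_64", "webdriver": True, "languages": [],
}
print(inconsistencies(human))
print(inconsistencies(bot))
```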
This matters for an anti-bot stack because the two get combined differently depending on the goal. A fraud team deduplicating accounts cares about the fingerprint, the stable identity, and treats the bot verdict as one more risk factor. A scraping-defense team cares about the bot verdict first and uses the fingerprint to track an adversary across IP rotations. The same FingerprintJS visitor ID that links a returning customer also links a returning scraper who cleared cookies and switched proxies, which is why fingerprinting shows up in scraping defenses at all. For how that decision gets split between the browser and the origin, the piece on server-side versus client-side bot detection is the companion to this one, and the reCAPTCHA Enterprise writeup covers a vendor that fuses both into a single risk score.
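The scraping-defense use can be sketched as a join between visitor IDs and IP addresses. A visitor ID that shows up from many addresses is a returning client rotating proxies. The sketch assumes events already carry a visitor ID and an IP, and it uses a hypothetical cutoff of five addresses. It also omits the time window a real system would apply.

```python
from collections import defaultdict

def rotating_visitors(events, min_ips=5):
    """Return visitor IDs observed from at least min_ips distinct IP addresses.
    events: iterable of (visitor_id, ip) pairs; min_ips is a hypothetical cutoff."""
    ips_by_visitor = defaultdict(set)
    for visitor_id, ip in events:
        ips_by_visitor[visitor_id].add(ip)
    return sorted(v for v, ips in ips_by_visitor.items() if len(ips) >= min_ips)

# Cleared cookies, rotated proxies.
events = [("vid-scraper", f"198.51.100.{i}") for i in range(6)]
# Home and mobile.
events += [("vid-customer", "192.0.2.10"), ("vid-customer", "192.0.2.11")]
print(rotating_visitors(events))  # ['vid-scraper']
```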
What the defenses actually do
The browsers that try to resist fingerprinting take two broadly different routes, and both show up in the FingerprintJS source as edge cases the library has to handle. The first route is uniformity, making every instance of the browser look identical so the anonymity set is the entire user base. The Tor Browser is the canonical example, and Firefox’s resistFingerprinting mode follows the same philosophy: normalize the timezone, round the screen and window dimensions, freeze the user agent, and return a uniform value from the canvas and audio APIs so the readback carries no entropy. If it works, every protected browser reports the same fingerprint, and a fingerprint that everyone shares identifies no one.
The second route is randomization, which Brave calls farbling. Rather than returning a constant, Brave perturbs the output of fingerprinting-prone APIs with a small amount of noise seeded per session and per site. As a result, the canvas readback, the audio samples, and the enumerated fonts differ slightly from one session, or one site, to the next. The effect is to poison the signal: the fingerprint is no longer stable across sessions, so it cannot be used to link visits. The Luo paper notes that Fingerprint Pro explicitly detects privacy-focused settings, Brave’s defenses among them. That tells you the randomization is detectable as randomization even when the underlying values are obscured. A signal that changes every session in a characteristic way is itself a signal.
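Detecting randomization as randomization can be sketched with a canvas whose correct contents are known in advance. Assume the page fills a canvas with one solid color, so no anti-aliasing can explain deviations, and then reads the pixels back. An exact readback means no defense is active. Small scattered deviations suggest injected noise, and large deviations suggest a replaced or blanked readback. The pixel buffers below are simulated. The noise limit of 2 is an assumption, not a description of Brave's actual perturbation.

```python
import random

EXPECTED = (200, 100, 50, 255)  # the RGBA color the page drew

def classify_readback(pixels, expected=EXPECTED, noise_limit=2):
    """Classify a flat RGBA readback of a canvas filled with one known color."""
    deltas = [abs(value - expected[i % 4]) for i, value in enumerate(pixels)]
    if not any(deltas):
        return "exact"     # readback matches the drawing: no defense active
    if max(deltas) <= noise_limit:
        return "noised"    # small scattered deviations: randomization such as farbling
    return "replaced"      # large deviations: blank or placeholder output

clean = list(EXPECTED) * 100            # 100 pixels, 4 channels each
rng = random.Random(7)
noised = clean.copy()
for px in rng.sample(range(100), 10):   # nudge the red channel of 10 pixels by +/-1
    noised[4 * px] += rng.choice((-1, 1))
blank = [255] * 400                     # e.g. an all-white placeholder

print(classify_readback(clean), classify_readback(noised), classify_readback(blank))
# exact noised replaced
```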
Chrome, the dominant browser, does neither. Through 2024 and 2025 Google wound down its Privacy Sandbox effort and shipped no fingerprinting-specific mitigation for the canvas, WebGL, WebGPU, AudioContext, or font vectors. Security researchers have been blunt that Chrome lacks the defenses Firefox and Brave have shipped for years. For the majority of real traffic, then, the entropy budget is fully available to anyone who wants to read it, which is the practical reason device fingerprinting remains a load-bearing part of anti-bot stacks rather than a fading one. The defenses exist; most users are not running them.
Where the entropy is going
The signals that carry the entropy budget keep shifting, and the direction is consistent: the vectors that win are the ones that are high in entropy and slow to change. Plugins are gone. The user agent has been deliberately frozen and reduced across browsers, with the structured userAgentData replacing the old string in Chromium. Into that vacuum stepped the rendering-based signals: canvas first, then WebGL and audio, and increasingly WebGPU. They read the actual GPU and driver stack, and that stack is both unique and slow to change. A browser update will not change your GPU. The open-source FingerprintJS source tree reflects this directly: the rendering and hardware sources are where the modern budget concentrates, while the locale and capability flags are mostly there to break ties.
The context around all of this shifted in 2024 and 2025 in a way that makes fingerprinting more relevant rather than less. Google abandoned its third-party-cookie deprecation in July 2024. Through 2025 it then retired the Privacy Sandbox replacement APIs, shipping no fingerprinting-specific mitigation along the way. Third-party cookies remain fully operational in Chrome with no removal timeline. Meanwhile, Firefox’s resistFingerprinting and Brave’s farbling counter the canvas, WebGL, WebGPU, AudioContext, and font vectors, but those vectors remain wide open in Chrome. The entropy is still there for the taking, and the dominant browser is not trying to take it away.
What the entropy budget actually buys
Strip away the product tiers and the field reduces to one trade. You spend bits of entropy to gain uniqueness, and uniqueness fights stability, because the bits that make a browser unique are the bits most likely to change. The open-source FingerprintJS resolves the trade by ignoring it. It hashes everything exactly and accepts a unique-but-fragile identifier whose lifetime the vendor itself puts at several weeks. Pro resolves it by moving the decision to a stateful server that remembers every device. That server tolerates drift through fuzzy matching and reports its own uncertainty as a confidence score. The Luo measurements put a hard number on the cost of that tolerance: roughly a third of synthetic macOS fingerprints that were unique on the client collapsed into shared identities on the server. Stability is bought with collisions, and the confidence score is where the vendor admits it.
The honest summary for anyone reasoning about an anti-bot stack is that the client-side library is a readable, accurate map of which browser signals carry information, and a poor model of how a production system actually identifies a returning device. The production system keeps state, and state is the one thing a 128-bit hash computed in a browser can never have. Everything else, the canvas readback, the audio latency, the font probe, the WebGL renderer string, has been public and largely unchanged for over a decade. The interesting part was never the signal list. It is what happens to the payload after it leaves the page.
Sources & further reading
- Peter Eckersley (2010), How Unique Is Your Web Browser? — the Panopticlick study; 470,161 browsers, 18.1 bits of entropy, the per-signal surprisal table, and the original definition of fingerprinting entropy.
- Keaton Mowery and Hovav Shacham (2012), Pixel Perfect: Fingerprinting Canvas in HTML5 — the paper that introduced canvas fingerprinting and showed its entropy is orthogonal to older signals.
- Elisa Luo, Tom Ritter, Stefan Savage, Geoffrey Voelker (2026), Understanding Server-side Commercial Fingerprinting — WWW 2026 reverse-engineering of Fingerprint Pro and peers; payload contents, hashed canvas readouts, and the synthetic-fingerprint cohort measurements.
- FingerprintJS (2026), fingerprintjs/fingerprintjs (GitHub) — the MIT-licensed open-source library, v5.2.0; README’s accuracy caveats and the MurmurHash visitorId construction.
- FingerprintJS, Default entropy sources (src/sources) — the verbatim list of collection sources used throughout this post.
- FingerprintJS, Extending the agent (docs) — documents the canvas geometry-vs-text image trade and how to exclude components.
- Fingerprint (2026), Fingerprint Pro vs. FingerprintJS — vendor comparison; ID lifetime, decay illustration, and the Smart Signals tiering for bot, device, and network signals.
- Fingerprint, Identification, accuracy, and confidence score (docs) — the `confidenceScore = 1 - falsePositiveProbability` definition and the deterministic-vs-probabilistic distinction.
- FingerprintJS, BotD (GitHub) — the separate MIT-licensed bot-detection library; the automation frameworks it flags and how it differs from fingerprinting.
- The Register (2026), Google Chrome lacks browser fingerprinting defenses — the 2026 state of Chrome’s missing canvas, WebGL, WebGPU, AudioContext, and font protections.
- Center for Democracy and Technology (2025), Google’s Privacy Sandbox is Dead — the retirement of the Privacy Sandbox APIs and what it means for fingerprinting’s continued relevance.