Canvas fingerprinting: how a single toDataURL call identifies a device
Draw the sentence “How quickly daft jumping zebras vex.” onto a hidden <canvas>, read the pixels back, hash them, and you get a short string. Run it again on the same machine and you get the same string, pixel for pixel. Run it on a different machine, even one with the same browser version and the same operating system, and the string often changes. The text looks identical to a human. The bytes underneath do not. That gap, between what the eye sees and what toDataURL returns, is the whole trick.
The mechanism is almost embarrassingly small. There is no exploit, no permission prompt, no plugin. A page that can run JavaScript can render a few words to an offscreen surface and ask the browser for the resulting bitmap, and the browser hands it over in a fraction of a second with no visible sign that anything happened. The value it gets back is a fingerprint of the graphics stack underneath: the GPU, the driver, the font rasterizer, the anti-aliasing settings, the installed fonts. This post traces where that fingerprint comes from, why the pixels differ, how much it actually identifies you, and what the browsers that tried to kill it have managed to do.
A roadmap
We start with the 2012 paper that named the technique, then walk the actual API calls a script makes. After that we get into the physics of why two machines render the same glyph differently, which is the part that matters and the part most explainers skip. Then the entropy question: how many bits is this worth, and against what population. Then the 2014 measurement that found it deployed on real sites, the move from plain text to emoji and geometry that modern collectors made, and finally the defenses, the randomization arms race, and where the whole thing sits in 2026.
The 2012 origin
The technique has a clear birth. In May 2012, Keaton Mowery and Hovav Shacham, both at UC San Diego, published “Pixel Perfect: Fingerprinting Canvas in HTML5.” Their observation was simple and, in hindsight, obvious. Browsers had started pushing rendering down into the operating system and the GPU for speed. Text went through the host font rasterizer. 3D went through WebGL and the graphics driver. The moment a browser leans on the machine underneath it for rendering, the output starts to depend on that machine, and that dependence is a fingerprint.
They listed five properties that made it dangerous. It is consistent: independent runs on one machine give pixel-identical results. It is high-entropy. It is orthogonal to other fingerprints, because it measures the graphics driver and GPU model, which are independent of things like screen resolution or the plugin list. It is transparent to the user, running offscreen in a fraction of a second with no visual cue. And it is readily obtainable, needing nothing beyond ordinary JavaScript. Those five claims have held up for more than a decade.
The paper described two flavors. One renders text to a 2D canvas context and reads the pixels. The other renders a WebGL scene, in their case 200 polygons approximating a hyperbolic paraboloid with a directional light, textured with a rasterized ISO 12233 resolution chart, and reads that. The text variant turned out to be the one that spread, because it works without WebGL, runs faster, and survives in environments where WebGL is disabled. The WebGL variant grew into its own discipline, which is the subject of a separate post on WebGL fingerprinting. This post is about the 2D canvas.
One detail from the paper is worth keeping, because it explains the choice of test string. They rendered a pangram, “How quickly daft jumping zebras vex.”, specifically to exercise as many distinct letterforms as possible in one short line. A pangram packs every letter into a sentence, so a single render touches a wide spread of glyph shapes, and any rasterizer quirk that affects one letter has a chance to show up. That instinct, pick text that stresses the renderer, carries straight through to the collectors running today.
What the script actually does
Strip the technique to its bones and it is four steps. Get a canvas context. Draw to it. Read it back. Hash the result.
The drawing is plain 2D canvas API. Set a font, set a fill color, call fillText with a string at some coordinates. The original paper’s example is only a few lines:
```javascript
// Assumes `canvas` refers to a <canvas> element, e.g. from document.createElement("canvas").
const ctx = canvas.getContext("2d");
ctx.font = "18pt Arial";
ctx.textBaseline = "top";
ctx.fillText("Hello, user.", 2, 2);
```
The read-back is where the fingerprint leaves the canvas. There are two methods, and the paper used both for different reasons. getImageData(x, y, w, h) returns an ImageData object holding the raw RGBA integer values for every pixel in a rectangle. toDataURL("image/png") returns a data URL, a Base64 string holding a PNG of the entire canvas. The collector hashes whichever one it took. Because the render is deterministic on a given machine, the hash is stable across runs, and because it varies across machines, the hash is a label.
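The four steps can be sketched end to end. This is a minimal illustration, not the paper's code: the names fnv1a32 and canvasFingerprint are made up for this post, and FNV-1a is only a stand-in for whatever hash a real collector uses, since the technique does not depend on the choice. The canvas is passed in so the sketch makes no assumption about where it comes from.

```javascript
// FNV-1a, 32-bit: a small non-cryptographic hash, used here only as a stand-in.
// Assumes ASCII input, which holds for a Base64 data URL.
function fnv1a32(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i) & 0xff;
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return hash;
}

// Steps: 1. get a context, 2. draw, 3. read back, 4. hash.
function canvasFingerprint(canvas) {
  const ctx = canvas.getContext("2d");                        // 1
  ctx.font = "18pt Arial";
  ctx.textBaseline = "top";
  ctx.fillText("How quickly daft jumping zebras vex.", 2, 2); // 2
  const dataUrl = canvas.toDataURL("image/png");              // 3
  return fnv1a32(dataUrl).toString(16).padStart(8, "0");      // 4
}
```

In a browser this would be called as canvasFingerprint(document.createElement("canvas")); the canvas never needs to be attached to the page.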
Two guards matter here, and they shape what a collector can do. First, toDataURL and getImageData honor the same-origin policy. If you draw an image from a different origin onto the canvas, the canvas becomes tainted and both read-back methods throw a SecurityError instead of returning pixels. So a fingerprinting canvas can only contain resources the page controls: its own text, its own shapes, fonts it loaded itself. That is not a real obstacle, because the technique needs nothing cross-origin. It draws its own text.
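The taint guard is easy to see from the calling side. The helper below is a sketch with a made-up name (tryReadback); it only relies on the fact that a tainted canvas makes toDataURL throw an error whose name is SecurityError.

```javascript
// Attempt a read-back and report whether the canvas was tainted.
// Returns { ok: true, dataUrl } or { ok: false, reason: "tainted" }.
function tryReadback(canvas) {
  try {
    return { ok: true, dataUrl: canvas.toDataURL("image/png") };
  } catch (err) {
    if (err && err.name === "SecurityError") {
      // Thrown when cross-origin content was drawn onto the canvas.
      return { ok: false, reason: "tainted" };
    }
    throw err; // anything else is unexpected, so let it surface
  }
}
```

A fingerprinting script never hits the tainted branch, because everything it draws is its own.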
Second, the read-back is the thing browsers learned to gate. The same-origin policy was never meant to stop a page from reading its own canvas, so the early defenses bolted on a new gate: prompt the user, or quietly perturb the output, when a script reads back a canvas it drew. More on that later. The point for now is that the privileged operation is toDataURL/getImageData, not the drawing, which is why every defense and every detector watches those two calls.
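Watching those two calls usually means wrapping them. The sketch below shows the general shape under one assumption: watchMethod is a hypothetical helper written for this post, not an API from any browser or detection product.

```javascript
// Wrap one method on a prototype-like object so each call is reported first.
// Returns a function that restores the original method.
function watchMethod(proto, name, onCall) {
  const original = proto[name];
  proto[name] = function (...args) {
    onCall(name, args);
    return original.apply(this, args);
  };
  return () => {
    proto[name] = original;
  };
}

// In a browser, the two read-back paths could be watched like this:
// watchMethod(HTMLCanvasElement.prototype, "toDataURL", console.log);
// watchMethod(CanvasRenderingContext2D.prototype, "getImageData", console.log);
```

A defense would sit in the same place, but instead of only logging it would prompt, block, or alter the return value.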
Why the pixels differ
This is the part that earns the technique its place, and it is worth slowing down on, because “the GPU is different” is true but does not explain the half of it. Two machines with the same GPU model still diverge. The reasons stack up across the whole rendering path.
Start with the font rasterizer. When you ask a canvas to draw “How quickly daft jumping zebras vex.” in 18pt Arial, the browser does not have a bitmap of that sentence lying around. It has an outline font, a set of vector contours for each glyph, and it has to convert those contours into pixels at the requested size. That conversion is rasterization, and it is full of decisions. Where does the edge of a curved stroke fall when it lands between two pixels? How much of each boundary pixel gets filled? That is anti-aliasing, and different rasterizers compute the coverage differently. The Mowery and Shacham data showed this directly: across 300 samples of the same Arial pangram, they found 50 distinct renderings. Same font, same nominal size, fifty different bitmaps.
Then there is hinting, the process of nudging glyph outlines onto the pixel grid so stems stay crisp at small sizes. On Windows the historically dominant path was ClearType, which uses the physical layout of red, green and blue subpixels on an LCD to gain horizontal resolution, which is why a difference map of two Windows renders of the same text often lights up at the colored fringes of letter edges. The paper noted exactly this: same-platform difference maps traced the outline of each letter, a signature of anti-aliasing or subpixel hinting, and the authors pointed out that a fingerprinter could read display or ClearType settings out of it. The user’s monitor and its subpixel geometry leak into the bitmap.
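A difference map of the kind described above is simple to compute once two renders have been read back with getImageData. The function below is a sketch (differenceMap is a name chosen here) and assumes both buffers are RGBA arrays of the same dimensions.

```javascript
// Compare two RGBA buffers of the same size (the .data of an ImageData).
// Returns a per-pixel mask (1 = differs) and the number of differing pixels.
function differenceMap(a, b) {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error("buffers must be same-sized RGBA arrays");
  }
  const pixels = a.length / 4;
  const mask = new Uint8Array(pixels);
  let differing = 0;
  for (let p = 0; p < pixels; p++) {
    const i = p * 4;
    if (a[i] !== b[i] || a[i + 1] !== b[i + 1] || a[i + 2] !== b[i + 2] || a[i + 3] !== b[i + 3]) {
      mask[p] = 1;
      differing++;
    }
  }
  return { mask, differing };
}
```

Plotting the mask at the canvas width is what produces the letter-outline pattern: interior pixels agree, edge pixels do not.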
Kerning and metrics add another axis. The same text rendered by different stacks comes out at different total widths. The paper’s clearest cross-platform difference was kerning: the sentence rendered at two different lengths depending on platform, so even before you look at individual pixels, the bounding box of the text is informative. A collector that measures where the text ends is reading a coarse version of the same signal.
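Measuring where the text ends needs no hashing at all. In a browser the direct route is ctx.measureText(text).width; the sketch below shows the pixel-level equivalent, scanning a read-back buffer for the first and last columns that contain ink. The name inkExtent is made up here, and the function assumes a transparent background so that any non-zero alpha counts as drawn.

```javascript
// Find the horizontal extent of drawn pixels in an RGBA buffer.
// A pixel counts as ink when its alpha is non-zero (assumes a transparent background).
// Returns { left, right } as column indices, or null if nothing was drawn.
function inkExtent(data, width, height) {
  let left = -1;
  let right = -1;
  for (let x = 0; x < width; x++) {
    for (let y = 0; y < height; y++) {
      if (data[(y * width + x) * 4 + 3] !== 0) {
        if (left === -1) left = x;
        right = x;
        break; // one inked pixel is enough to mark this column
      }
    }
  }
  return left === -1 ? null : { left, right };
}
```

Two stacks that kern the same sentence differently will report different right edges before a single pixel value is compared.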
Font substitution is the last big one, and it is sneaky. Ask for a font the machine does not have, or give the canvas a deliberately invalid font string, and the browser falls back to something else. The fallback choice, and the metrics and shape of the substitute, depend on what is installed and on the OS font-matching logic. The paper’s text_nonsense test set the font to “not even a font spec in the slightest” precisely to probe this fallback behavior, and noted that by probing for fonts this way a fingerprinter can derive a fairly complete list of installed fonts. Some of their Linux samples were not rendering Arial at all but a similar substitute, which split those machines cleanly from the Windows and Mac ones.
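One common way to turn fallback behavior into a font probe is to compare text widths with and without the candidate font in the font stack. The sketch below is that heuristic, not the paper's test: fontSeemsAvailable, the sample string, and the size are all choices made for this post, and a font whose metrics happen to match the fallback would be missed.

```javascript
// Guess whether a font is installed by checking whether naming it changes the
// measured width compared with a generic fallback alone. ctx is a 2D canvas context.
function fontSeemsAvailable(ctx, fontName, sample = "mmmmmmmmmmlli", size = "72px") {
  const generics = ["monospace", "serif", "sans-serif"];
  return generics.some((generic) => {
    ctx.font = `${size} ${generic}`;
    const fallbackWidth = ctx.measureText(sample).width;
    ctx.font = `${size} "${fontName}", ${generic}`;
    // If the font is missing, the browser falls back and the width is unchanged.
    return ctx.measureText(sample).width !== fallbackWidth;
  });
}
```

Running this over a list of candidate names yields an approximate installed-font list without calling any font enumeration API.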
Above the font path sits the GPU and its driver. Modern browsers composite the canvas, and increasingly accelerate 2D drawing operations, on the graphics hardware. That introduces driver-specific rounding in how colors blend and how shapes get rasterized, which is why collectors moved to drawing overlapping translucent shapes: the blend math in the overlap region is a property of the GPU and driver, not the font. The paper anticipated this, quoting a 2010 WebGL mailing-list exchange where Steve Baker bet he could identify most hardware by reading back GPU results, and Benoit Jacob noted that most rendering in the upcoming browser generation would go through the GPU and be subject to GPU and driver rendering differences. They were right.
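The rounding point is easiest to see with one channel of arithmetic. Multiplying two 8-bit values and scaling back to 8 bits rarely lands on an integer, so an implementation has to pick a rounding rule. The sketch below shows two defensible rules producing different bytes; it does not describe any particular GPU or driver, and multiplyChannel is a name invented for the illustration.

```javascript
// Multiply-blend two 8-bit channel values: result = a * b / 255.
// The division rarely lands on an integer, so a rounding rule must be chosen.
function multiplyChannel(a, b, mode) {
  const exact = (a * b) / 255;
  return mode === "floor" ? Math.floor(exact) : Math.round(exact);
}

// 200 * 200 / 255 is about 156.86, so the two rules disagree by one.
const floored = multiplyChannel(200, 200, "floor"); // 156
const rounded = multiplyChannel(200, 200, "round"); // 157
```

A one-unit difference in one channel of one pixel is invisible on screen and is enough to change the hash.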
And the browser version sits on top of all of it. The rasterizer, the default hinting, the way fonts are matched, the canvas color management, all of these change between browser releases. So the canvas hash is not a stable hardware serial number. It is a fingerprint of a particular software-plus-hardware configuration at a particular moment, and it drifts when any of those change. That drift is both the technique’s weakness and the reason a collector pairs it with other signals rather than trusting it alone.
How much entropy is it really worth
Here is where the honest version diverges from the scary version. The canvas fingerprint is strong, but the headline numbers people quote are easy to misread.
The original paper measured 5.73 bits of sample entropy: 294 Mechanical Turk experiments produced 116 unique fingerprint values. Just under six bits is not enormous on its own; it is equivalent to splitting a population into about 53 equally likely buckets (2^5.73 ≈ 53). But the authors were careful about what that number meant. Their sample was lopsided, heavily Windows 7 and Chrome, with little variation in browser and OS, so the entropy they measured was a floor, not a ceiling. In a population with real diversity of hardware and software, the canvas would carry more.
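For readers who want to reproduce this kind of figure on their own data, sample entropy is computed from how often each distinct fingerprint value appears. The sketch below uses the standard Shannon formula; the function names are chosen here, and the paper's per-value counts are not reproduced in this post, so no attempt is made to recompute 5.73 itself.

```javascript
// Shannon entropy, in bits, of a sample, given how many times each distinct value was seen.
function sampleEntropyBits(counts) {
  const total = counts.reduce((sum, c) => sum + c, 0);
  let bits = 0;
  for (const c of counts) {
    if (c === 0) continue; // a value never seen contributes nothing
    const p = c / total;
    bits -= p * Math.log2(p);
  }
  return bits;
}

// Number of equally likely buckets that would carry the same entropy.
function equivalentBuckets(bits) {
  return Math.pow(2, bits);
}
```

Two reference points help with scale: equivalentBuckets(5.73) is about 53, and if all 294 samples had been distinct the entropy would have been log2(294), roughly 8.2 bits.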
The more important property is the one the paper called orthogonality. The canvas measures the graphics and font stack, which is largely independent of the other things a fingerprinter already knows, like the Accept-header triad or the screen resolution. Independent bits add. Five or six bits that overlap with what you already have are nearly worthless; five or six bits that are orthogonal roughly multiply the number of buckets you can tell apart. That is why canvas earns its place in a stack even though, alone, it is not decisive.
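The claim that independent bits add is the standard identity for joint entropy. Writing X for the canvas value and Y for the signals a fingerprinter already holds (symbols chosen here for illustration), with I(X; Y) the information the two share:

```latex
\begin{aligned}
H(X, Y) &= H(X) + H(Y) - I(X; Y) \\
I(X; Y) = 0 \;&\Rightarrow\; 2^{H(X, Y)} = 2^{H(X)} \cdot 2^{H(Y)}
\end{aligned}
```

When the canvas is fully independent of the other signals, the shared term is zero, the entropies add, and the bucket counts multiply. When the canvas is fully predictable from what is already known, I(X; Y) equals H(X) and the canvas contributes nothing new.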
Later large-scale studies put canvas near the top of the signal list. The 2018 “Hiding in the Crowd” study by Gómez-Boix and colleagues, run on a population of more than two million fingerprints from a French commercial site, found canvas among the most distinctive attributes available, with normalized entropy in roughly the 0.35 to 0.42 range, alongside the plugin list, the user agent, and the available fonts. That study is also the cold-water study, and it deserves its weight. On their large and more representative population, only about a third of fingerprints were unique overall, far below the 80-to-90 percent figures from the earlier Panopticlick and AmIUnique datasets, and uniqueness on mobile collapsed. The reason is that mobile devices are homogeneous: many identical phones running an identical stack render the canvas identically, so the canvas stops separating them.
*Uniqueness depends heavily on the population sampled. Early opt-in studies skewed toward unusual, identifiable browsers; the larger 2018 study did not.*
The lesson is not that canvas is weak. It is that uniqueness is a property of the crowd, not of the technique. A canvas hash that is shared by ten thousand identical iPhones is useless for telling them apart and very useful for telling that group from everyone else. A canvas hash that is shared by nobody else pins one rare machine exactly. Same technique, opposite outcomes, decided by who else is in the dataset.
Found in the wild
For two years canvas fingerprinting was a paper. Then in 2014 a team from Princeton and KU Leuven measured the actual web in “The Web Never Forgets,” presented at ACM CCS that year, and found the technique deployed for real. Their crawl of the top 100,000 sites found canvas fingerprinting on more than 5% of them. The bulk of that deployment traced to a single third party, the social-sharing widget provider AddThis, which had begun experimenting with canvas fingerprinting early in 2014. The widget was embedded across a large slice of the web, so one script reached a lot of sites.
That paper changed how the technique was perceived, because it moved canvas fingerprinting from a clever attack to a documented tracking mechanism in active use, and it triggered the response. AddThis stopped after the press coverage. Browser vendors and standards bodies started treating canvas read-back as something to guard. The arms race that followed is the rest of this story.
It is worth noting what the 2014 script actually drew, because it set the template. Reporting on the AddThis script described it rendering a specific phrase, the perfect-pangram style string “Cwm fjordbank glyphs vext quiz”, in a way meant to maximize rendered variety. The choice was the same instinct as the 2012 paper: pick text that stresses the rasterizer, then read the pixels.
From plain text to emoji and geometry
The modern collector renders more than a pangram. Look at an open-source reference like the canvas source in FingerprintJS and you can see how the technique matured, because the code is public and commented.
The current approach splits the work into two separate images, and the split is deliberate. One image is the text fingerprint. The library draws a string built to stress the rasterizer, “Cwm fjordbank gly” followed by a grinning-face emoji, in a mix of fonts and sizes, with overlapping translucent fills in colors like #f60, #069, and a semi-transparent green rgba(102, 204, 0, 0.2). The emoji is the interesting addition. Color emoji are rendered from large, OS-specific emoji fonts, and those differ sharply across platforms: Apple Color Emoji on macOS and iOS, Segoe UI Emoji on Windows, Noto Color Emoji on many Linux builds. A single emoji glyph drags in which platform you are on and which version of that emoji font is installed, which is a lot of signal for one character.
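A text image along these lines can be sketched with nothing but the standard 2D canvas API. This is an illustration of the description above, not a copy of the library's source: the function name, canvas size, coordinates, font choices, and the exact emoji code point are assumptions made for this post.

```javascript
// Draw a text image in the spirit of the description above and return its data URL.
// Sizes, coordinates and fonts are illustrative, not taken from any library.
function drawTextImage(canvas) {
  canvas.width = 240;
  canvas.height = 60;
  const ctx = canvas.getContext("2d");
  ctx.textBaseline = "alphabetic";

  ctx.fillStyle = "#f60"; // opaque orange block that the text will overlap
  ctx.fillRect(100, 1, 62, 20);

  const text = "Cwm fjordbank gly \u{1F603}"; // truncated pangram plus a grinning-face emoji

  ctx.fillStyle = "#069";
  ctx.font = '11pt "Times New Roman"';
  ctx.fillText(text, 2, 15);

  ctx.fillStyle = "rgba(102, 204, 0, 0.2)"; // translucent green pass in a second font and size
  ctx.font = "18pt Arial";
  ctx.fillText(text, 4, 45);

  return canvas.toDataURL();
}
```

The second, translucent pass matters because it forces the renderer to blend anti-aliased glyph edges against what is already on the canvas, which is one more place for stacks to disagree.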
The second image is the geometry fingerprint, and it touches the GPU rather than the font stack. It draws overlapping circles in magenta, cyan and yellow with the canvas blend mode set to multiply via globalCompositeOperation, so the overlap regions are products of the source colors computed by the graphics pipeline. The blend math, and the rounding in it, depends on the GPU and driver. There is also a winding-rule shape, an arc filled with the evenodd rule, plus a capability probe using isPointInPath(x, y, 'evenodd') to record whether the browser even supports that rule. The split between the two images exists for a practical reason the project learned the hard way: the text image is less stable than the geometry image, because text rendering shifts with small environment changes and even with emoji-position quirks, so keeping them separate prevents an unstable text render from poisoning the more stable geometry signal.
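The geometry image and the winding probe can be sketched the same way. As before, this follows the description rather than reproducing any library's code: function names, dimensions, and the specific color values are assumptions, and only standard canvas calls are used.

```javascript
// Draw overlapping multiply-blended circles plus an even-odd ring; return the data URL.
function drawGeometryImage(canvas) {
  canvas.width = 122;
  canvas.height = 110;
  const ctx = canvas.getContext("2d");

  // Three overlapping circles in magenta, cyan and yellow, blended by multiplication.
  ctx.globalCompositeOperation = "multiply";
  const circles = [["#f2f", 40, 40], ["#2ff", 80, 40], ["#ff2", 60, 80]];
  for (const [color, x, y] of circles) {
    ctx.fillStyle = color;
    ctx.beginPath();
    ctx.arc(x, y, 40, 0, Math.PI * 2, true);
    ctx.closePath();
    ctx.fill();
  }

  // Two concentric arcs in one path, filled with the even-odd rule,
  // so the inner circle is left as a hole.
  ctx.fillStyle = "#f9c";
  ctx.beginPath();
  ctx.arc(60, 60, 60, 0, Math.PI * 2, true);
  ctx.arc(60, 60, 20, 0, Math.PI * 2, true);
  ctx.fill("evenodd");

  return canvas.toDataURL();
}

// Capability probe: does isPointInPath honour the even-odd rule?
function supportsEvenOddWinding(ctx) {
  ctx.beginPath();
  ctx.rect(0, 0, 10, 10);
  ctx.rect(2, 2, 6, 6);
  // (5, 5) is inside both rectangles: even-odd calls that outside, nonzero calls it inside.
  return !ctx.isPointInPath(5, 5, "evenodd");
}
```

Hashing this image separately from the text image is what keeps a font update from changing the GPU-side signal.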
This is also where canvas stopped being a privacy curiosity and became a bot-detection signal. The same hash that lets an ad network track a person lets an anti-bot system notice an automated browser. A headless Chrome on a Linux server in a datacenter has a canvas hash drawn from a software renderer or a virtual GPU, which looks nothing like the hash from a consumer machine with a real discrete GPU. A canvas value that says “Linux software rasterizer” attached to a User-Agent that claims “Chrome on Windows” is a contradiction, and contradictions across signals are exactly what detection vendors hunt for. That is why canvas appears in the telemetry every major vendor collects, alongside the broader JavaScript runtime fingerprint, and why anti-detect browsers spend so much effort spoofing it.
The defenses, and the randomization arms race
There are two honest ways to defend a canvas, and the browsers split along that line.
The first is to refuse, or to ask. The Tor Browser took this route early: it treats a canvas read-back as a request that needs consent, notifying the user when a script tries to extract image data and offering to return blank data instead. Firefox in its resist-fingerprinting mode does something similar, gating canvas read-back behind a permission so a site cannot silently pull pixels. The cost is friction and breakage. Plenty of legitimate pages read their own canvas for real reasons, so a hard prompt is a blunt instrument, which is why this approach stayed mostly inside privacy-focused configurations rather than shipping on by default to everyone.
The second is to lie, quietly. Instead of blocking the read, perturb it so the value is no longer a stable identifier. Brave calls its version farbling: it adds subtle, deterministic randomization to fingerprinting surfaces including canvas, seeded per session and per origin, so the same site reading the same canvas in the same session gets a consistent value, but that value differs across sessions and across sites and no longer pins the device. Brave credited the academic PriVaricator and FPRandom projects as precedent and described it as the first time that randomization approach shipped in a mainstream browser. The appeal is that nothing breaks: a page still gets a canvas, still gets pixels, just slightly different ones each time.
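The shape of per-session, per-origin noise can be shown in a few lines. This is a toy model of the idea only and is not Brave's algorithm: the seed derivation, the PRNG (mulberry32), the flip probability, and every name below are assumptions made for the sketch.

```javascript
// Hash a string to a 32-bit seed (FNV-1a, a stand-in chosen for this sketch).
function seedFrom(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i) & 0xff;
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// mulberry32: a tiny seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Return a copy of an RGBA buffer with the lowest bit of some colour channels flipped.
// The same (sessionKey, origin) pair always yields the same perturbation.
function perturbPixels(rgba, sessionKey, origin) {
  const rand = mulberry32(seedFrom(sessionKey + "|" + origin));
  const out = Uint8ClampedArray.from(rgba);
  for (let i = 0; i < out.length; i++) {
    if (i % 4 === 3) continue;     // leave alpha untouched
    if (rand() < 0.1) out[i] ^= 1; // flip the least significant bit
  }
  return out;
}
```

Within one session a site sees a stable value, so pages that read their own canvas keep working; change the session key or the origin and the hash changes with it.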
Randomization is elegant, and it has a weakness that researchers have since pressed hard. If the noise is random, you can average it out. A 2025 paper presented at The Web Conference, “Breaking the Shield,” analyzed canvas defenses in the wild and showed that the deployed randomization schemes, Brave’s among them, could be attacked. The core idea is statistical: read the canvas many times, and the underlying true rendering shows through the noise like a signal recovered from many noisy samples, by majority vote or by modeling the noise distribution. Brave acknowledged the pressure and moved to limit how many canvas read-backs a page can perform between user interactions, precisely to deny an attacker the many samples that averaging needs. So the defense and the attack are now tuned against each other on the same axis: the defender adds noise and caps the number of reads, the attacker collects reads and votes the noise away.
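The voting idea is simple enough to sketch directly. The function below illustrates the general statistical principle and is not the paper's attack: it assumes every read carries fresh, independent noise and that the true value is the most frequent one at each position. A scheme that repeats identical noise on every read would not be undone by this loop alone.

```javascript
// Given several reads of the same canvas (equal-length byte arrays),
// return the most common value at each position.
function majorityVote(reads) {
  const length = reads[0].length;
  const result = new Uint8ClampedArray(length);
  for (let i = 0; i < length; i++) {
    const tally = new Map();
    let best = reads[0][i];
    let bestCount = 0;
    for (const read of reads) {
      const count = (tally.get(read[i]) || 0) + 1;
      tally.set(read[i], count);
      if (count > bestCount) {
        best = read[i];
        bestCount = count;
      }
    }
    result[i] = best;
  }
  return result;
}
```

The number of reads needed grows with the noise rate, which is exactly why capping read-backs between user interactions raises the attacker's cost.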
There is also a subtler problem with randomization that is easy to miss. Noise is itself a signal. A canvas value that is internally inconsistent, or that changes between two reads within one session when it should not, tells a detector that it is looking at a browser that perturbs canvas, which is a small population. A defense meant to hide you in the crowd can mark you as a member of the much smaller crowd of people running that defense. The 2018 large-scale study made the general point that homogeneity protects and oddity exposes, and a noisy canvas is, in this narrow sense, an oddity. Apple took the more conservative line in Safari, which limits canvas and other surfaces and, in private and locked-down modes, leans toward returning a uniform value rather than a per-user random one, on the theory that everyone looking the same beats everyone looking uniquely random.
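One way a detector might look for internal inconsistency is to draw something whose read-back is predictable, such as an opaque solid fill, and count the values that come back wrong. The sketch below covers only the counting step; countDeviations is a hypothetical name, and the approach assumes the browser would otherwise return the fill color exactly, which a real detector would need to confirm for its target browsers.

```javascript
// Count how many channel values in an RGBA buffer differ from an expected solid colour.
// expected is [r, g, b, a]; a clean solid fill should give zero deviations.
function countDeviations(rgba, expected) {
  let deviations = 0;
  for (let i = 0; i < rgba.length; i++) {
    if (rgba[i] !== expected[i % 4]) deviations++;
  }
  return deviations;
}
```

A non-zero count does not identify the user, but it does place the browser in the small group that perturbs canvas reads.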
Where it stands in 2026
Canvas fingerprinting is fourteen years old and still works, which is the most interesting thing about it. The mechanism Mowery and Shacham described in 2012 is unchanged in its essentials: draw, read back, hash. What changed is everything around it. The text grew an emoji. The single image split into a text half and a GPU half. The browsers that cared added prompts or noise, the noise got attacked, and the attack got countered by rate-limiting the read. None of that touched the core, because the core is not a bug to be patched. It is a direct consequence of browsers rendering through the operating system and the GPU for speed, and no one is going to give up that speed.
The honest summary is that canvas is a strong but contingent signal. It carries real, orthogonal bits, more than the 5.7 the first paper measured on a homogeneous sample, but its power is entirely a function of who else shares your render. On a rare desktop with an unusual font set and a discrete GPU it can pin you almost alone. On a current iPhone it puts you in a crowd of millions and tells a tracker almost nothing on its own, which is why it is never used on its own. It is one input to a scoring stack that also reads the network layer, the TLS handshake, and the rest of the JavaScript runtime, and its job there is partly to catch the contradiction: a canvas that says one machine paired with a User-Agent that claims another.
For anyone building a browser meant to blend in, the canvas is one of the hardest surfaces to fake convincingly, because faking it well means reproducing a coherent font-and-GPU stack, not just returning a plausible-looking hash. A static spoofed value is itself detectable, because real canvas hashes drift a little as the environment changes and a frozen one does not. The deepest irony is the one in the 2018 data: the most effective canvas defense ever measured was not a clever algorithm at all. It was a billion people buying the same phone.
Sources & further reading
- Mowery, K. and Shacham, H. (2012), Pixel Perfect: Fingerprinting Canvas in HTML5 — the originating paper; defines the text and WebGL canvas fingerprint, reports 5.73 bits of entropy and pixel-stable repeatability.
- Acar, G., Eubank, C., Englehardt, S., Juarez, M., Narayanan, A. and Diaz, C. (2014), The Web Never Forgets: Persistent Tracking Mechanisms in the Wild — first large-scale measurement; found canvas fingerprinting on more than 5% of the top 100,000 sites, much of it via AddThis.
- Wikipedia, Canvas fingerprinting — overview of the technique, its 2012 origin, the 2014 deployment, and browser mitigations.
- Wikipedia, Device fingerprint — canvas as one signal among many, with entropy figures and the major measurement studies.
- Gómez-Boix, A., Laperdrix, P. and Baudry, B. (2018), Hiding in the Crowd: an Analysis of the Effectiveness of Browser Fingerprinting at Large Scale — two-million-fingerprint study; ranks canvas among the most distinctive attributes while showing overall uniqueness far lower than earlier work, especially on mobile.
- FingerprintJS, canvas.ts source — a public, commented reference implementation; shows the text-plus-emoji image, the multiply-blended geometry image, and the winding-rule probe.
- Brave (2020), Fingerprinting Protections: Randomization — describes farbling, Brave’s per-session, per-origin canvas randomization, citing PriVaricator and FPRandom.
- Brave (2020), Fingerprinting Defenses 2.0 — follow-up on the randomization approach and its scope across canvas, WebGL and Web Audio.
- Laperdrix, P. et al. / Nguyen, H. et al. (2025), Breaking the Shield: Analyzing and Attacking Canvas Fingerprinting Defenses in the Wild — shows deployed randomization defenses can be statistically defeated by repeated read-backs and voting.
- Laperdrix, P., Rudametkin, W. and Baudry, B. (2019), Browser Fingerprinting: A Survey — survey placing canvas in the wider set of fingerprinting vectors and defenses.
- Eckersley, P. / EFF, Panopticlick — the original browser-uniqueness experiment whose entropy framing the later canvas work built on.
Frequently asked questions
Why do two computers with the same browser and OS produce different canvas fingerprints?
The difference comes from the whole rendering path, not just the GPU. The font rasterizer decides how curved edges fall between pixels and how anti-aliasing fills boundary pixels, hinting nudges glyphs onto the pixel grid using settings like ClearType and the monitor's subpixel layout, and font substitution and kerning change metrics and widths. The GPU and driver add rounding when blending colors. Each layer can diverge, so the bitmap looks identical but hashes differently.
What stops a canvas fingerprinting script from reading pixels, and why isn't it an obstacle?
Two guards apply. The same-origin policy taints a canvas if you draw a cross-origin image onto it, making toDataURL and getImageData throw a SecurityError. This does not block fingerprinting because the technique draws its own text and shapes and needs nothing cross-origin. The second guard is the read-back itself, which browsers learned to gate with prompts or perturbation, which is why every detector and defense watches toDataURL and getImageData rather than the drawing.
How much entropy does a canvas fingerprint actually carry?
The 2012 paper measured 5.73 bits from 294 experiments yielding 116 unique values, splitting a population into roughly 50 buckets. The authors treated that as a floor because their sample was lopsided toward Windows 7 and Chrome. What matters more is orthogonality: canvas measures the graphics and font stack, largely independent of signals like screen resolution, so its bits add rather than overlap. Its real power depends entirely on who else shares your render.
Why do modern collectors render emoji and overlapping shapes instead of just text?
Modern collectors split the canvas into two images. The text image stresses the font rasterizer and adds an emoji, because color emoji come from large OS-specific fonts that differ sharply across Apple, Windows and Linux, leaking platform and font version from one character. The geometry image draws overlapping circles blended with the multiply composite operation, so the overlap math probes the GPU and driver. They are kept separate so an unstable text render does not poison the more stable geometry signal.
How did browsers try to defend against canvas fingerprinting, and how were those defenses attacked?
Two approaches emerged. Tor Browser and Firefox in resist-fingerprinting mode gate read-back behind consent or return blank data, but the friction kept this mostly in privacy-focused configurations. Brave instead perturbs the value, called farbling, adding per-session, per-origin randomization. A 2025 paper showed randomization can be defeated statistically by reading the canvas many times and voting away the noise, so Brave moved to limit read-backs between user interactions to deny attackers the needed samples.