Media capabilities and codec support as a device fingerprint
Ask a browser a simple question. Can you play this video? Not “will you let me,” not “is this allowed,” just the mechanical question of whether the bytes that arrive in a particular codec, at a particular resolution, with a particular DRM wrapper, can be turned into pixels and sound on this machine. The browser answers honestly, because it has to. A streaming site needs to know whether to send AV1 or fall back to H.264 before it wastes bandwidth on a stream the device cannot decode. So the platform hands the page four different ways to ask, and each one returns a precise, machine-shaped answer.
That answer is a fingerprint. The set of codecs a device admits to decoding, the resolutions at which it claims smooth playback, the DRM systems whose modules load, and the security levels those modules report, together describe the hardware video block, the OS media framework, the browser build, and sometimes the exact GPU. None of it requires a permission prompt. Most of it is stable for the life of an OS install. This post is about that surface: the four APIs that expose it, why the answers differ across otherwise-identical-looking browsers, how much it narrows a device, and what the two browsers that took it seriously have done to blunt it.
A roadmap
We start with the four APIs, because they are not interchangeable and the differences matter. HTMLMediaElement.canPlayType is the old ternary, MediaSource.isTypeSupported is the boolean that streaming actually uses, MediaCapabilities.decodingInfo is the modern call that adds the smooth and power-efficient bits, and requestMediaKeySystemAccess is the EME probe that reaches all the way down to the DRM module. Then the mechanism: why a codec query returns different bytes on macOS than on a Linux server pretending to be macOS, which is a story about hardware decoders and the platform media framework underneath the browser. Then DRM, where the surface gets sharper and, in the Widevine case, has been shown to leak a stable device identifier outright. Then the entropy question, the move from research curiosity to a standard line in every commercial fingerprinting SDK, and the defenses that Firefox and the Tor Browser ship today.
Four ways to ask the same question
The oldest of the four is canPlayType, defined on every <audio> and <video> element. You hand it a MIME type, optionally with a codecs parameter, and it returns one of three strings: "probably", "maybe", or the empty string. The ternary is deliberate. The empty string means no. "maybe" means the container looks playable but the browser will not commit without the actual codecs string. "probably" means the browser is as confident as it is willing to be in advance. That three-valued answer is itself a small signal. Two browsers that both “support” a format can disagree on whether the answer is "maybe" or "probably" for the same query, and that disagreement is stable per build.
MediaSource.isTypeSupported is the one that matters for streaming. Media Source Extensions is how adaptive players (the YouTube, Netflix, and Twitch front ends of the world) feed segmented media into a <video> element from JavaScript, and isTypeSupported is the static method a player calls to decide which representations it can request. It drops the ternary and returns a plain boolean. The MIME string it takes is the same shape: a container plus a fully specified codec string, like video/mp4; codecs="avc1.640028" for H.264 High profile level 4.0, or video/webm; codecs="vp09.00.10.08" for a specific VP9 profile. Because real players probe a long list of these at startup, the natural collector simply replays the same list and hashes the vector of booleans.
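The codec strings above are compact encodings, not opaque labels. As a minimal sketch, assuming the common six-hex-digit form of an avc1 string (two digits each for profile, constraint flags, and level), the H.264 example can be unpacked as follows. The function name and the small profile table are illustrative, not part of any browser API.

```python
# Illustrative helper: decode the six hex digits of an "avc1.PPCCLL" codec string.
# PROFILE_NAMES lists only a few common profile_idc values; it is not exhaustive.
PROFILE_NAMES = {66: "Baseline", 77: "Main", 100: "High"}


def parse_avc1(codec: str) -> dict:
    """Split an avc1/avc3 codec string into profile, constraint flags, and level."""
    prefix, _, hexpart = codec.partition(".")
    if prefix not in ("avc1", "avc3") or len(hexpart) != 6:
        raise ValueError(f"not an avc1/avc3 codec string: {codec!r}")
    profile_idc = int(hexpart[0:2], 16)       # e.g. 0x64 = 100 = High
    constraint_flags = int(hexpart[2:4], 16)  # bit flags, 0 when none are set
    level_idc = int(hexpart[4:6], 16)         # e.g. 0x28 = 40 = level 4.0
    return {
        "profile": PROFILE_NAMES.get(profile_idc, f"profile_idc {profile_idc}"),
        "constraint_flags": constraint_flags,
        "level": level_idc / 10,
    }


print(parse_avc1("avc1.640028"))
# {'profile': 'High', 'constraint_flags': 0, 'level': 4.0}
```

Each distinct profile and level combination is a separate question a collector can put to isTypeSupported, which is why the probe lists grow long.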
The modern call is MediaCapabilities.decodingInfo. Where the older two answer only “can it play,” decodingInfo returns a promise that resolves to an object with three booleans: supported, smooth, and powerEfficient. The W3C Working Draft, edited by Apple’s Jean-Yves Avenard and Google’s Mark Foltz, defines the input as a full VideoConfiguration or AudioConfiguration with fields for contentType, width, height, bitrate, framerate, and a set of HDR-related fields: hdrMetadataType, colorGamut, and transferFunction. The audio side carries channels, samplerate, and spatialRendering. The two extra booleans are the new entropy. supported tells you the codec exists; smooth and powerEfficient tell you whether this specific device decodes that specific resolution in hardware. A laptop with a fixed-function AV1 block answers powerEfficient: true for 4K AV1. A device that would fall back to a software decoder answers false. That single bit separates hardware generations.
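To make the hardware split concrete, here is a small sketch that reduces one captured decodingInfo result to a coarse label. It assumes the three booleans have already been collected in a browser and handed over as a plain dict. The labels are hypothetical, and powerEfficient is treated only as a hint that a hardware decoder is in use, not as proof.

```python
# Hypothetical post-processing of one decodingInfo() result captured elsewhere.
def decode_path(info: dict) -> str:
    """Map the three decodingInfo booleans to a coarse decode-path label."""
    if not info["supported"]:
        return "unsupported"
    if info["powerEfficient"]:
        return "hardware-likely"
    return "software-likely"


# Two illustrative answers to the same 4K AV1 query from different devices.
laptop_with_av1_block = {"supported": True, "smooth": True, "powerEfficient": True}
older_laptop = {"supported": True, "smooth": False, "powerEfficient": False}

print(decode_path(laptop_with_av1_block))  # hardware-likely
print(decode_path(older_laptop))           # software-likely
```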
The fourth API is the one that reaches deepest, and it gets its own section below. Navigator.requestMediaKeySystemAccess is the entry point to Encrypted Media Extensions. It does not ask about a codec. It asks whether a named Content Decryption Module, with a named robustness level, is present and willing. The answer pulls in the OS, the browser vendor, and the hardware security path all at once.
Why the answers differ
A codec query feels like it should be a property of the browser. Chrome supports VP9; Safari supports HEVC; surely the answer is fixed once you know which browser is asking. It is not, and the reason is that the browser does not decode video by itself. It asks the platform.
When Firefox evaluates whether it can play a type, the call descends through its own decoder dispatch into a platform decoder module factory, and that factory enumerates the actual decoders the host offers. On the Camoufox issue tracker, the maintainers traced this exact path: MP4Decoder::IsSupportedType() and MatroskaDecoder::IsSupportedType() call PDMFactory::Supports(), which enumerates the system decoders present on the machine. The consequence they flagged is the one that matters for anyone trying to disguise a browser. A Linux server running FFmpeg 6.x that spoofs a macOS user-agent still reports a Linux-server codec matrix, because the codec answer comes from the host’s FFmpeg build, not from the user-agent string. Chrome’s bundled FFmpeg, Firefox’s, the system VideoToolbox on macOS, the Media Foundation decoders on Windows, the MediaCodec stack on Android: each of these has a different set of codecs, and the browser faithfully reports whichever one is underneath it.
That is the core insight. The codec surface is a property of the media framework, not the browser chrome the user-agent claims. HEVC is the cleanest example. Apple platforms ship HEVC decoding through VideoToolbox by default, so Safari and Chrome on a Mac both report it. On Windows and Linux, HEVC support is patchy, gated behind a paid OS codec pack or simply absent, so the same Chrome version answers differently depending on the OS underneath. AV1 splits along a different axis: it indicates a recent browser or a recent OS, and on decodingInfo the powerEfficient bit splits further between devices that have a hardware AV1 block and devices that fall back to dav1d in software. FLAC, Opus, the various AAC profiles, the alphabet soup of VP9 and AV1 profile-level-tier strings, each adds another column that varies across the population.
This is also why the surface is hard to fake without breaking playback. If a stealth browser claims support for a codec the host cannot actually decode, the next step (a real MediaSource append, a real <video> load) fails, and the failure is itself observable. The Camoufox maintainers, who patch Firefox at the source level to control its fingerprint, discussed exactly this trade-off: a proposed media:spoof_codecs mode would short-circuit the support check to return true for any valid codec string without consulting the platform factory, at the cost of then claiming support for codecs that may not decode. Detect-and-spoof here is a corner you can paint yourself into.
There is a quieter signal underneath the support booleans. Browsers occasionally accept malformed or mismatched codec strings in ways that differ by engine. Firefox bug 1758589 records a case where MediaCapabilities accepted VP8 codec strings that were validated against VP9 parameter rules, an internal parsing quirk that produces an engine-specific answer for an input no real player would send. A collector that probes a few deliberately odd strings can read the parser’s personality, not just the codec list, and the parser is a property of the engine and its version. The 797-MIME-type probe used by one academic demonstration is large precisely because the long tail of edge cases carries as much distinguishing power as the common formats.
The DRM probe goes deeper
Codec support describes the decode path. DRM support describes something more specific: which Content Decryption Module is installed, which OS it runs on, and how good the hardware security around it is. requestMediaKeySystemAccess takes a key-system string and a list of candidate configurations, and resolves only if the platform can satisfy one of them.
The key-system strings are themselves a near-perfect OS classifier. com.widevine.alpha is Google’s Widevine, present in Chrome, Firefox, and Chromium-based Edge. com.apple.fps, and the versioned com.apple.fps.1_0, is Apple FairPlay, present only in Safari. PlayReady, com.microsoft.playready and its hardware variant, belongs to the Microsoft stack. A device that accepts FairPlay and rejects Widevine is a Safari device; one that accepts Widevine and PlayReady hardware is Windows; one that accepts only Widevine is most likely Linux or Android. Before a single codec is named, the set of CDMs that respond already partitions the population by platform.
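The partition described above is simple enough to write down. The sketch below assumes a collector has already recorded which key-system strings resolved. The function name and return labels are hypothetical, and the rules are only the coarse ones stated in the paragraph, so real populations will have exceptions.

```python
from typing import Iterable


def guess_platform(accepted_key_systems: Iterable[str]) -> str:
    """Coarse platform guess from the set of key systems that resolved."""
    accepted = set(accepted_key_systems)
    widevine = "com.widevine.alpha" in accepted
    fairplay = any(k.startswith("com.apple.fps") for k in accepted)
    playready = any(k.startswith("com.microsoft.playready") for k in accepted)

    if fairplay and not widevine:
        return "Safari on an Apple platform"
    if widevine and playready:
        return "Windows"
    if widevine:
        return "most likely Linux or Android"
    return "unknown"


print(guess_platform({"com.apple.fps", "com.apple.fps.1_0"}))
print(guess_platform({"com.widevine.alpha", "com.microsoft.playready"}))
print(guess_platform({"com.widevine.alpha"}))
```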
The configuration object sharpens it further. Each candidate carries initDataTypes, audioCapabilities, videoCapabilities, a distinctiveIdentifier requirement, a persistentState requirement, and, inside the capability entries, a robustness string. For Widevine that robustness string runs from SW_SECURE_CRYPTO at the bottom, through SW_SECURE_DECODE, up to HW_SECURE_CRYPTO, HW_SECURE_DECODE, and HW_SECURE_ALL at the top. Whether a device can satisfy HW_SECURE_ALL answers whether it has a hardware-backed media path, a trusted execution environment doing the decode behind the OS. That bit separates a cheap Android tablet from a flagship phone, and a generic desktop from one with a working hardware DRM path. The EME specification, in its privacy section, names these exact fields (initDataTypes, audioCapabilities, videoCapabilities, distinctiveIdentifier, persistentState, robustness) as things that “vary by implementation” and can distinguish users when combined across origins.
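A collector typically walks the robustness ladder and keeps the highest rung that resolves. As a sketch, assuming the set of Widevine robustness strings that resolved on a device has already been gathered, the reduction looks like this. The helper names are illustrative.

```python
from typing import Iterable, Optional

# Widevine robustness levels, ordered from weakest to strongest.
WIDEVINE_LADDER = [
    "SW_SECURE_CRYPTO",
    "SW_SECURE_DECODE",
    "HW_SECURE_CRYPTO",
    "HW_SECURE_DECODE",
    "HW_SECURE_ALL",
]


def highest_robustness(resolved: Iterable[str]) -> Optional[str]:
    """Return the strongest ladder level present in `resolved`, or None."""
    resolved_set = set(resolved)
    best = None
    for level in WIDEVINE_LADDER:  # later entries are stronger
        if level in resolved_set:
            best = level
    return best


def has_hardware_media_path(resolved: Iterable[str]) -> bool:
    """The single bit that separates hardware-backed DRM paths from the rest."""
    return "HW_SECURE_ALL" in set(resolved)


print(highest_robustness({"SW_SECURE_CRYPTO", "SW_SECURE_DECODE"}))  # SW_SECURE_DECODE
print(has_hardware_media_path(WIDEVINE_LADDER))                      # True
```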
This is not theoretical. On 8 July 2020, the iter.ca writeup documented Reddit’s site running a script that enumerated four DRM systems (Widevine, PlayReady, ClearKey, and Adobe Primetime) as part of a broader fingerprint that also read installed extensions and floating-point math behavior. The script was attributed to White Ops, the bot-mitigation firm that has since rebranded to HUMAN. Merely probing for DRM availability was enough to trigger Firefox’s DRM permission prompt for some users, which is how the behavior got noticed in the first place. The point of interest is that the probe checked availability without ever playing protected content. Capability detection alone was the signal.
The Widevine case gets worse than capability detection. A 2023 paper at the Privacy Enhancing Technologies Symposium, “Your DRM Can Watch You Too” by Gwendal Patat, Mohamed Sabt, and Pierre-Alain Fouque of IRISA in Rennes, showed that browsers diverge in how strictly they follow EME’s privacy guidelines, and that several of them “gladly give away the identifying Widevine Client ID with no or little explicit consent.” The Client ID is not a capability bit. It is device-specific cryptographic material that the CDM includes in its license request. A companion proof-of-concept catalogues the fields that ride along in that request: Application Name, Company Name, Model Name, Architecture Name, Device Name, Product Name, Build Info, the Widevine CDM Version, OEM Crypto Build Info, OEM Crypto SPL, and a device certificate serial number. On Android in particular, that material is stable and unique enough to function as a hardware-backed super-cookie, and persistent EME sessions can be opened on the OS file system to carry state across visits. This is the far end of the surface, where “what can you decode” turns into “who exactly are you,” and it is the reason the EME spec insists that implementations must never expose a Distinctive Permanent Identifier to applications, even encrypted. The paper’s finding is that some implementations did anyway.
How much does it narrow a device
The honest answer is moderate on its own, large in combination, and unusually stable. Public estimates put the codec-support vector at roughly four to eight bits of entropy. That is less than canvas or WebGL, which can reach into the high teens of bits, and on its own it is enough to sort the population into somewhere between sixteen and a few hundred buckets rather than to single anyone out.
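The arithmetic behind a bit count is worth spelling out: b bits of entropy distinguish at most 2^b equally likely values, so four bits is 16 buckets and eight bits is 256. The sketch below also shows how an entropy figure is computed from observed fingerprints. The sample list is made up for illustration and is not measurement data.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def buckets_for_bits(bits: float) -> float:
    """Upper bound on distinguishable values for a given entropy in bits."""
    return 2 ** bits


def shannon_entropy_bits(observations: Iterable[Hashable]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of observations."""
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(buckets_for_bits(4), buckets_for_bits(8))  # 16 256

# Four hypothetical codec-vector hashes, each seen once: exactly 2 bits.
print(shannon_entropy_bits(["h1", "h2", "h3", "h4"]))  # 2.0
```

Real populations are skewed toward a few common configurations, so the measured entropy is lower than the logarithm of the number of distinct vectors.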
On those estimates, codec support sits in the same band as the other hardware tells, such as hardware concurrency and device memory, and well below the WebGL vector that dominates a detector’s budget.
What makes those bits valuable is not their count but their character. Three properties matter. First, stability. Codec and DRM support does not change unless the OS, browser, or hardware changes, so the vector is constant across sessions, across the cache being cleared, across a switch from one IP to another. A fingerprinting system spends its entropy budget on signals that stay put, and this one barely moves. Second, independence. The codec matrix correlates with the GPU, the OS, and the browser build, but it is not redundant with the user-agent string, because it is read from a different layer (the media framework) and disagrees with a spoofed UA in exactly the way that exposes the spoof. Third, the hardware bits. The powerEfficient flag and the Widevine HW_SECURE_ALL answer reach past the software stack into the silicon, splitting device models that are otherwise indistinguishable in JavaScript. A commercial detector folds the codec vector into a larger feature set, where its job is less to identify than to cross-check: if the UA claims Safari on macOS but the codec matrix is a Linux FFmpeg build and FairPlay is absent, the inconsistency itself is the verdict. That cross-check logic is the same one detectors run on the navigator object, and combining a few weakly correlated signals into a stable identity is the entropy-budget problem every detector has to balance.
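The cross-check can be sketched as a consistency test between the claimed user-agent and the media answers. Everything here is a simplification for illustration: the user-agent test is a crude substring heuristic, the function name is hypothetical, and a real detector would weigh many more signals than these two.

```python
from typing import Iterable, List


def ua_media_mismatches(
    claimed_ua: str, accepted_key_systems: Iterable[str], hevc_supported: bool
) -> List[str]:
    """List the ways the media surface contradicts a 'Safari on macOS' claim."""
    reasons = []
    # Chrome's UA string also contains "Safari", so exclude it explicitly.
    claims_safari_on_mac = (
        "Macintosh" in claimed_ua
        and "Safari" in claimed_ua
        and "Chrome" not in claimed_ua
    )
    if claims_safari_on_mac:
        if not any(k.startswith("com.apple.fps") for k in accepted_key_systems):
            reasons.append("FairPlay absent")
        if not hevc_supported:
            reasons.append("HEVC not reported")
    return reasons


spoofed_ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15"
)
# A Linux host behind the spoofed UA: Widevine only, no HEVC.
print(ua_media_mismatches(spoofed_ua, {"com.widevine.alpha"}, hevc_supported=False))
# ['FairPlay absent', 'HEVC not reported']
```

An empty list does not prove the claim is genuine. It only means these two checks found nothing to contradict it.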
The MediaCapabilities editors were aware of the trade. The spec’s privacy section argues that most of what decodingInfo exposes “can already be discovered via experimentation,” so the marginal entropy of the API over the older canPlayType is small. It also concedes the sharper edge: adding colorGamut, transferFunction, and hdrMetadataType “has the potential to add significant entropy,” because HDR capability is far more variable across devices than baseline codec support, and the same caution applies to audio channels and the HDCP and speaker-configuration details that live nearby. The mitigation the spec floats is for user agents to “fake a given set of capabilities” rather than always answering yes or no, to consider limiting the APIs to top-level browsing contexts, and to throttle calls under a privacy budget. Those are suggestions to implementers, not requirements, and most browsers have not taken them up.
What this looks like to a collector
A collector does not call these APIs once. It calls them dozens to hundreds of times, walking a fixed list of codec strings and key-system configurations, and reduces the answers to a hash. The structure is identical to the canPlayType demonstration that hashes 797 MIME-type results into one value: the individual answers are not interesting, the vector is. In pseudocode the shape is unremarkable.
probes = [
    "video/mp4; codecs=\"avc1.640028\"",       # H.264 High 4.0
    "video/webm; codecs=\"vp09.00.10.08\"",    # VP9 profile 0
    "video/mp4; codecs=\"hev1.1.6.L93.B0\"",   # HEVC
    "video/mp4; codecs=\"av01.0.05M.08\"",     # AV1
    "audio/mp4; codecs=\"mp4a.40.2\"",         # AAC-LC
    "audio/ogg; codecs=\"opus\"",              # Opus
    "audio/flac",                              # FLAC
    ...                                        # plus a long edge-case tail
]

vector = []
for type in probes:
    vector.append(audio_or_video.canPlayType(type))       # "" | "maybe" | "probably"
    vector.append(MediaSource.isTypeSupported(type))      # bool
    info = await MediaCapabilities.decodingInfo({...})    # supported, smooth, powerEfficient
    vector.append([info.supported, info.smooth, info.powerEfficient])

for keysystem in ["com.widevine.alpha", "com.apple.fps", "com.microsoft.playready"]:
    vector.append(probe_key_system(keysystem, robustness_ladder))  # which levels resolve

device_media_id = hash(vector)

The decodingInfo calls return promises, so a real collector fires them in parallel and awaits the batch, but the principle holds: a deterministic list in, a stable hash out. The same device produces the same hash on every visit, and a different device with a different media stack produces a different one. There is nothing exotic in the code. The signal lives entirely in the platform’s honest answers.
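The final hashing step can be made concrete. This is a minimal sketch, assuming the probe answers have already been collected into a dict keyed by probe string. The choice of SHA-256 over canonical JSON is an assumption made for the example, not something the collectors described here are known to use.

```python
import hashlib
import json


def media_fingerprint(answers: dict) -> str:
    """Hash a dict of probe answers into a stable hex digest.

    Keys are sorted and whitespace is fixed so that the same answers always
    serialize to the same bytes, regardless of collection order.
    """
    canonical = json.dumps(answers, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical answers from one device; values mirror the three API shapes.
device_a = {
    'video/mp4; codecs="avc1.640028"': ["probably", True, [True, True, True]],
    'video/mp4; codecs="hev1.1.6.L93.B0"': ["", False, [False, False, False]],
    "com.widevine.alpha": ["SW_SECURE_CRYPTO", "SW_SECURE_DECODE"],
}
# Same answers, collected in a different order.
device_a_again = dict(reversed(list(device_a.items())))

print(media_fingerprint(device_a) == media_fingerprint(device_a_again))  # True
```

Canonical serialization matters more than the hash function: without it, two visits that collect promises in a different order would yield different identifiers for the same device.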
The reason this works as cleanly as it does is that the APIs were built to be honest. A player genuinely needs to know whether to send AV1 or H.264, and decodingInfo was designed to give it a truthful, detailed answer including the hardware-acceleration bits. The same truthfulness that lets YouTube pick the right stream lets a collector read the device. You cannot make the API lie to the collector without also making it lie to the legitimate player, which is the bind every defense in this space runs into.
What the defenders did
Two browsers treat this as a problem worth code. Firefox, under privacy.resistFingerprinting, intercepts both the codec and the capability calls. Bug 1461454, fixed in Firefox 82 in late 2020, standardizes the MediaCapabilities answers: for any type the browser actually supports, RFP mode reports smooth: true and powerEfficient: false, regardless of the real hardware. The reasoning in the bug is exact. The browser reports accurately whether something can be played, because lying there breaks playback, but always reports that it plays smoothly and not power-efficiently, because the smooth and powerEfficient bits are the ones that leak the hardware video block. It even follows the same code path so the timing matches a generic machine, closing a side channel that a naive early-return would have opened. The companion work clamps canPlayType and the codec list to a fixed allowlist so the supported-codec vector collapses to a single common value across all RFP users.
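The clamp described above amounts to a small pure function over the real answer. The sketch below models it in Python and is not Firefox's actual code. It assumes that an unsupported type reports all three booleans as false, and the function name is hypothetical.

```python
def rfp_decoding_info(real: dict) -> dict:
    """Model of the resist-fingerprinting clamp on a decodingInfo result.

    `supported` passes through truthfully, because lying there breaks
    playback. For supported types the two hardware-revealing bits are fixed.
    """
    if not real["supported"]:
        # Assumption: unsupported types report false across the board.
        return {"supported": False, "smooth": False, "powerEfficient": False}
    return {"supported": True, "smooth": True, "powerEfficient": False}


hardware_device = {"supported": True, "smooth": True, "powerEfficient": True}
software_device = {"supported": True, "smooth": False, "powerEfficient": False}

# Two different machines become indistinguishable on this axis.
print(rfp_decoding_info(hardware_device) == rfp_decoding_info(software_device))  # True
```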
The Tor Browser inherits this and goes further by shipping a uniform platform profile, so that the whole population of Tor users presents one codec matrix rather than a per-machine one. The strategy in both cases is the same: not to refuse the question but to give every protected user the same answer, so that this axis no longer distinguishes one protected user from another. It costs something. A device that cannot actually decode a clamped-in codec will fail a real playback attempt, and a site that wanted hardware-accelerated 4K gets told the device is software-only. For the threat model these browsers serve, that cost is acceptable.
Chrome has not done the equivalent. Its position, consistent with the MediaCapabilities spec’s own reasoning, is that the marginal entropy over already-discoverable information is small and the cost to legitimate streaming is real, so the API answers truthfully. The Privacy Budget proposal that would have throttled high-entropy surfaces stalled, and there is no general clamp on codec answers in shipping Chrome. The result is that the codec and DRM surface remains fully readable in the most common browser, and the defense, where it exists, lives in the privacy-focused minority.
The EME side has a partial structural defense that does not depend on the browser opting in. The spec requires per-origin isolation of any identifier the CDM exposes, requires that those identifiers clear with browsing data, and forbids exposing a permanent device identifier to the page. When implemented correctly, that turns the Widevine Client ID from a global super-cookie into a per-origin one, which is a real reduction in cross-site tracking power. The 2023 IRISA paper’s contribution was to show which browsers implemented it correctly and which did not, and the gap between the two is exactly the gap between a contained identifier and a leaked one.
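The effect of per-origin isolation can be illustrated with a toy model. This is not how Widevine or any other CDM derives identifiers. It is a generic sketch, using an HMAC over a clearable salt and the origin, of the two properties the spec asks for: different origins see unrelated values, and clearing browsing data resets them. All names are hypothetical.

```python
import hashlib
import hmac
import secrets


class OriginScopedIds:
    """Toy model of per-origin, clearable identifiers."""

    def __init__(self, device_secret: bytes):
        self._device_secret = device_secret    # never exposed to pages
        self._salt = secrets.token_bytes(16)   # replaced when data is cleared

    def identifier_for(self, origin: str) -> str:
        """Derive the value a given origin would see."""
        message = self._salt + origin.encode("utf-8")
        return hmac.new(self._device_secret, message, hashlib.sha256).hexdigest()

    def clear_browsing_data(self) -> None:
        """Rotate the salt so every origin's identifier changes."""
        self._salt = secrets.token_bytes(16)


ids = OriginScopedIds(device_secret=b"hypothetical-device-secret")
a = ids.identifier_for("https://video.example")
b = ids.identifier_for("https://tracker.example")
print(a != b)                                          # True: no cross-site join
ids.clear_browsing_data()
print(ids.identifier_for("https://video.example") != a)  # True: reset on clear
```

A leaked permanent identifier corresponds to handing every origin the same unsalted value, which is the failure mode the paragraph above describes.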
Closing
The codec and DRM surface is a fingerprint that the web platform cannot easily give up, because the thing that makes it a fingerprint is the thing that makes it useful. A streaming site has to ask the device what it can decode, the device has to answer truthfully or waste bandwidth, and that truthful answer describes the hardware video block, the OS media framework, the browser build, and at the DRM layer sometimes the exact device. Four bits to eight bits on its own, but stable across cache clears and IP changes, read from a layer the user-agent string cannot fake without contradicting itself, and reaching at the high end into a hardware-backed identifier that the EME spec went out of its way to forbid and that some browsers leaked anyway.
The part worth holding onto is the asymmetry between the two ends of the surface. At the codec end, the defense is tractable: clamp the answers, accept that hardware-accelerated playback degrades for protected users, and collapse the whole protected population onto one shared value, which is what Firefox and Tor do. At the DRM end, where the Client ID lives, there is no clamp that helps, because the identifier is not a capability bit you can standardize away but device-specific cryptographic material the module emits to do its job. The fix there is structural and silent: per-origin isolation done correctly, which the user never sees and cannot verify. A 2023 measurement found that several browsers did not do it correctly. That is the uncomfortable shape of this vector. The easy half is defended in the browsers almost nobody runs, and the hard half depends on an implementation detail the user has no way to check.
Sources & further reading
- W3C (2026), Media Capabilities, W3C Working Draft 9 June 2026 — defines decodingInfo, the VideoConfiguration/AudioConfiguration fields, and the security and privacy considerations including the HDR-entropy caveat.
- W3C (2026), Encrypted Media Extensions, W3C Working Draft 9 June 2026 — the EME spec and its privacy section naming initDataTypes, robustness, distinctiveIdentifier, and the per-origin isolation requirements.
- MDN (2025), HTMLMediaElement: canPlayType() method — the “probably”/“maybe”/"" ternary and its meaning.
- MDN (2025), MediaSource: isTypeSupported() static method — the boolean codec probe used by adaptive-streaming players.
- Patat, Sabt, Fouque (2023), Your DRM Can Watch You Too: Exploring the Privacy Implications of Browsers’ (mis)Implementations of Widevine EME — PETS 2023; shows browsers leaking the Widevine Client ID and catalogues the identifying fields.
- Avalonswanderer (2023), widevine_eme_fingerprinting — proof-of-concept enumerating the Widevine Client ID fields and persistent-session tracking.
- iter.ca (2020), Reddit’s website uses DRM for fingerprinting — documents the White Ops/HUMAN script probing Widevine, PlayReady, ClearKey, and Adobe Primetime.
- daijro / Camoufox (2024), canPlayType() and isTypeSupported() leak system codec libraries (issue 558) — traces the PDMFactory::Supports() path and the spoof-vs-playback trade-off.
- Mozilla (2020), Bug 1461454: Support Resist Fingerprinting in canPlayType and Media Capabilities APIs — the Firefox 82 fix that spoofs smooth=true, powerEfficient=false under RFP.
- Mozilla (2022), Bug 1758589: MediaCapabilities accepts VP8 codec strings based on valid VP9 parameters — an engine-specific codec-string parsing quirk.
- LRZ Privacy Check, Fingerprinting canPlayType — a live demonstration hashing 797 MIME-type support results into a single fingerprint.
- Scrapfly, Media Codec & MIME Type Fingerprint Test — interactive tester covering HTMLMediaElement, MediaSource, MediaRecorder, MediaCapabilities, and WebRTC codec surfaces.