
How DataDome uses HTTP/2 and network fingerprints as a signal

Copyright: MIT

A request carries two kinds of identity. There is the one it announces, in the User-Agent header and the client hints, where it can claim to be Chrome 138 on Windows. And there is the one it leaks, in the shape of the bytes it puts on the wire before any header is parsed. The TLS ClientHello, the TCP options on the SYN, the first HTTP/2 frames a client sends after the connection opens. These are not chosen by the script author. They are chosen by the network stack the script links against, and they are extremely hard to change without rewriting that stack. DataDome, like every serious bot-detection vendor in 2026, reads both kinds of identity and asks a single question: do they agree?

That question is the whole game at the network layer. A real Chrome browser produces a Chrome TLS fingerprint, Chrome HTTP/2 SETTINGS, and a Chrome pseudo-header order, all at once, because they all come from the same binary. A scraper that sets User-Agent: ...Chrome/138... but speaks HTTP/2 through Python’s httpx or Go’s net/http produces a fingerprint that has never belonged to any Chrome that ever shipped. The header says one thing, the frames say another, and the gap between them is a high-confidence automation signal that costs the server nothing to compute. This post is about that gap specifically, on the HTTP/2 and TLS/JA layer, and how DataDome turns it into a score.

The road map. First, why the network layer is worth reading at all, and where DataDome sits in the request path. Then the HTTP/2 fingerprint itself, frame by frame, using the Akamai format that the whole industry standardized on. Then the TLS side, JA3 through JA4, and why extension randomization changed the rules in 2023. Then the part that matters most, the cross-layer consistency check, where the claimed user agent is held up against everything the wire revealed. A short tour of the TCP/IP layer underneath. And finally, what has changed by 2026 and what that means for anyone trying to read or write these fingerprints.

Where the network layer sits

DataDome runs as a module at the edge, in front of the application. The integration is a small piece of code in your CDN, reverse proxy, or server (there are modules for Nginx, Cloudflare Workers, Fastly, and a long list of others), plus a JavaScript tag on the page. The module forwards a description of each request to DataDome’s API, which returns a decision in single-digit milliseconds: allow, challenge, or block. The detailed mechanics of that round trip are covered in DataDome’s server-side scoring pipeline; what matters here is what the module can see.

It can see the headers, obviously. But because it sits at TLS termination, it can also see the raw ClientHello and, with the right deployment, the TCP-level properties of the connection and the HTTP/2 frames. DataDome’s own integration docs ask the module to collect and forward TLS JA3 and JA4 fingerprints from incoming requests, alongside the full set of HTTP request headers, which tells you these signals are first-class inputs to the model and not an afterthought. The network fingerprint is computed where the connection lands and travels to the scoring API as part of the request description.

This placement is the reason network fingerprinting works as a signal at all. The data is free. The client already sent a ClientHello and an HTTP/2 SETTINGS frame to establish the connection; reading them adds no round trips and no JavaScript. A client that has disabled JavaScript, or never executes the DataDome JS tag, still hands over a complete network fingerprint just by connecting. For the large share of abusive traffic that comes from raw HTTP clients rather than headless browsers, the network layer is often the only layer that produces a signal, and it produces one on the very first request before any cookie has been issued.

[Figure: the client stack emits a TCP SYN (TTL, window, options), a TLS ClientHello (JA3 / JA4), and HTTP/2 frames (SETTINGS, WINDOW_UPDATE); the edge module reads all three and forwards them to scoring, in front of the app.]

*The three network-layer fingerprints are emitted during connection setup, before the application sees anything. The edge module reads them for free and ships them to the scoring API alongside the headers.*

The HTTP/2 fingerprint, frame by frame

HTTP/2 is where the network layer gets its richest signal, and the reason is historical. HTTP/1.1 was a line-oriented text protocol; a request was a few lines of ASCII, and almost everything about it could be forged byte for byte. HTTP/2 is binary and stateful. A connection opens with a fixed preface, then both sides exchange a SETTINGS frame describing their parameters, and the client begins sending HEADERS frames, WINDOW_UPDATE frames for flow control, and historically PRIORITY frames. Each of those is emitted by the client’s HTTP/2 implementation according to choices baked into that implementation. Chrome’s network stack, Firefox’s, Safari’s, Go’s net/http, Python’s h2, nghttp2 behind curl: every one of them makes slightly different choices, and those choices are stable across requests.

This was first written up in detail by two Akamai researchers, Elad Shuster and Ory Segal, in a 2017 Black Hat Europe paper titled Passive Fingerprinting of HTTP/2 Clients. They extracted fingerprints from more than ten million HTTP/2 connections covering over 40,000 distinct user agents, and they proposed a compact text format that the rest of the industry adopted more or less wholesale. That format is the lingua franca of HTTP/2 fingerprinting in 2026, and DataDome’s network-layer signal speaks it whether or not the company uses that exact serialization internally.

The format concatenates four pieces, separated by pipes:

[Figure: the four fields]
SETTINGS: id:value;id:value
WINDOW_UPDATE: increment or 0
PRIORITY: frames or 0
PSEUDO ORDER: m,a,s,p

A recent Chrome, roughly: 1:65536;2:0;4:6291456;6:262144|15663105|0|m,a,s,p
(Exact values drift across Chrome versions; the structure is the stable part.)

*The four fields of the Akamai HTTP/2 fingerprint. The example values are representative of recent Chrome; treat the structure as the durable signal and the literal numbers as version-dependent.*
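To make the four-field layout concrete, here is a minimal parsing sketch. It assumes the exact serialization shown above (semicolons between SETTINGS pairs, pipes between fields); other tools serialize the same data slightly differently. The names `H2Fingerprint` and `parse_akamai_h2` are hypothetical and are not part of any DataDome or Akamai API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class H2Fingerprint:
    settings: tuple      # ((id, value), ...) in the order the client sent them
    window_update: int   # connection-level increment, 0 if none was sent
    priority: str        # raw PRIORITY field, "0" if no frames were sent
    pseudo_order: tuple  # e.g. ("m", "a", "s", "p")


def parse_akamai_h2(text: str) -> H2Fingerprint:
    """Split an Akamai-format HTTP/2 fingerprint into its four fields."""
    parts = text.strip().split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 pipe-separated fields, got {len(parts)}")
    settings_raw, window_raw, priority_raw, pseudo_raw = parts

    settings = []
    for pair in filter(None, settings_raw.split(";")):
        ident, value = pair.split(":")
        settings.append((int(ident), int(value)))

    return H2Fingerprint(
        settings=tuple(settings),
        window_update=int(window_raw),
        priority=priority_raw,
        pseudo_order=tuple(pseudo_raw.split(",")),
    )


fp = parse_akamai_h2("1:65536;2:0;4:6291456;6:262144|15663105|0|m,a,s,p")
print(fp.settings)      # order is preserved, because order is part of the signal
print(fp.pseudo_order)
```

Keeping the SETTINGS pairs as an ordered tuple rather than a dict is deliberate: two clients that send the same values in a different order are different fingerprints.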

The SETTINGS frame

The first and most discriminating field is the SETTINGS frame. When an HTTP/2 client opens a connection it sends a list of parameters, each an ID-value pair, and the fingerprint records them in the order sent. RFC 9113, the current HTTP/2 specification, defines six settings with these identifiers: SETTINGS_HEADER_TABLE_SIZE (0x01), SETTINGS_ENABLE_PUSH (0x02), SETTINGS_MAX_CONCURRENT_STREAMS (0x03), SETTINGS_INITIAL_WINDOW_SIZE (0x04), SETTINGS_MAX_FRAME_SIZE (0x05), and SETTINGS_MAX_HEADER_LIST_SIZE (0x06).

A client does not have to send all six. It sends the ones it cares to override, in whatever order its implementation emits them, and leaves the rest at the protocol default. This is where the discrimination comes from. Chrome and Firefox both set HEADER_TABLE_SIZE to 65536 and disable push with ENABLE_PUSH: 0, but they part ways on flow control: Chrome advertises an INITIAL_WINDOW_SIZE of 6291456 (six megabytes), while Firefox uses 131072 and additionally pins MAX_FRAME_SIZE to 16384. Safari produces yet another profile, sending MAX_CONCURRENT_STREAMS and a much larger window. The point that researchers keep making is that the absent parameters carry as much information as the present ones. A client that sends MAX_CONCURRENT_STREAMS at all is already not behaving like recent Chrome, which dropped that setting from its frame.
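A short sketch of the "absent parameters carry information" point, using the six RFC 9113 identifiers listed earlier. The two example frames are the representative values quoted in this post, not authoritative captures, and `describe_settings` is a hypothetical helper.

```python
# The six settings defined by RFC 9113, keyed by identifier.
SETTING_NAMES = {
    0x1: "HEADER_TABLE_SIZE",
    0x2: "ENABLE_PUSH",
    0x3: "MAX_CONCURRENT_STREAMS",
    0x4: "INITIAL_WINDOW_SIZE",
    0x5: "MAX_FRAME_SIZE",
    0x6: "MAX_HEADER_LIST_SIZE",
}


def describe_settings(pairs):
    """Return (sent, absent): named settings in wire order, and the ones left at default."""
    sent = [(SETTING_NAMES.get(i, f"UNKNOWN_0x{i:x}"), v) for i, v in pairs]
    sent_ids = {i for i, _ in pairs}
    absent = [name for i, name in SETTING_NAMES.items() if i not in sent_ids]
    return sent, absent


# Representative values from the text above; real values drift by version.
chrome_like = [(1, 65536), (2, 0), (4, 6291456), (6, 262144)]
firefox_like = [(1, 65536), (2, 0), (4, 131072), (5, 16384)]

for label, frame in (("chrome-like", chrome_like), ("firefox-like", firefox_like)):
    sent, absent = describe_settings(frame)
    print(label, "sent:", sent)
    print(label, "absent:", absent)
```

Both profiles omit MAX_CONCURRENT_STREAMS, but they differ in which of the other two optional settings they leave out, which is exactly the kind of difference the fingerprint records.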

Now hold a default HTTP client up against this. The defaults are wrong in ways that are obvious to anyone who has seen a real browser’s frame. Older curl through nghttp2 historically advertised a one-gigabyte window (INITIAL_WINDOW_SIZE near 1073741824), a value no browser would ever send. Go’s standard library has its own constant. Python’s h2 has another. None of these match any Chrome build, so a request that says User-Agent: ...Chrome... and sends a non-Chrome SETTINGS frame has already failed the consistency check on the first field of the fingerprint, before flow control or pseudo-headers even enter the picture.

Flow control and the window update

The second field captures flow control. After the SETTINGS exchange, browsers typically send a connection-level WINDOW_UPDATE on stream 0 to enlarge the receive window beyond the 65,535-octet default that RFC 9113 mandates. The increment is implementation-specific. Chrome’s value works out to a roughly 15-megabyte total window (the increment recorded as 15663105 in the Akamai format), and Firefox uses a different target that serializes to 12517377. A client that sends no WINDOW_UPDATE records a 0 here, which is itself a tell, because nearly every modern browser enlarges its window immediately.
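The arithmetic behind those two increments is worth seeing once. The total connection window is the RFC 9113 default plus the WINDOW_UPDATE increment; the sketch below just adds them, using the increments quoted above.

```python
DEFAULT_CONNECTION_WINDOW = 65_535  # initial connection window per RFC 9113
MIB = 1024 * 1024


def total_connection_window(increment: int) -> int:
    """Connection-level receive window after one WINDOW_UPDATE on stream 0."""
    return DEFAULT_CONNECTION_WINDOW + increment


for name, increment in (("chrome", 15_663_105), ("firefox", 12_517_377)):
    total = total_connection_window(increment)
    print(f"{name}: {total} bytes = {total / MIB} MiB")
```

Both increments land on a round number of mebibytes (15 and 12), which is why the raw values look so arbitrary: they are a round target minus the 65,535-octet default.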

Flow control is a quiet signal but a sticky one. It reflects buffer-sizing decisions that live deep in a network stack, the kind of thing nobody overriding a User-Agent header thinks to also patch. To match Chrome’s window behavior you have to reproduce both the SETTINGS window and the connection-level update with the exact increments Chrome uses, and keep them consistent with whatever Chrome version your header claims to be.

Priority, and why it mostly went away

The third field is the most interesting in 2026 because it is largely a fossil. HTTP/2 as originally specified in RFC 7540 had an elaborate stream-prioritization scheme, a dependency tree with weights and exclusivity flags, and clients signaled it with PRIORITY frames. Akamai’s 2017 fingerprint captured those frames in detail, and they were wonderfully discriminating. Firefox in that era sent a fixed scaffold of five PRIORITY frames building a dependency tree (the classic sequence with stream IDs 3, 5, 7, 9 and 11 at specific weights), a pattern so distinctive you could spot Firefox from priority alone.

Then the scheme collapsed under its own complexity. RFC 9113, which replaced RFC 7540 in 2022, states plainly that it “deprecates the priority signaling defined in RFC 7540” and notes that “the prioritization signaling in RFC 7540 was not successful.” Browsers moved to the simpler scheme in RFC 9218, which carries priority as an HTTP header (priority:, with urgency and incremental hints) rather than as frames. So modern Chrome sends no separate PRIORITY frames and records a 0 in this field. Firefox kept sending its frame-based priorities for years longer, which means the presence or absence of PRIORITY frames still separates browser families. The field did not stop being a signal; it became a signal about which prioritization era a client was built in, and a client emitting RFC 7540 frame priorities while claiming to be a current Chrome is making a version-inconsistent claim.

Pseudo-header order

The fourth field is the one that survives every transcoding proxy and still nails the client: the order of the HTTP/2 pseudo-headers. Every HTTP/2 request begins with four pseudo-headers, :method, :scheme, :authority, and :path, and RFC 9113 requires only that all pseudo-header fields appear before any regular field. It does not mandate an order among the four. So each implementation picked one and stuck with it.

Chrome sends them method, authority, scheme, path, abbreviated m,a,s,p. Firefox sends method, path, authority, scheme: m,p,a,s. Safari sends method, scheme, path, authority: m,s,p,a. These orders are stable, they are baked into HPACK header encoding in the respective stacks, and they are nearly impossible to change from application code because most HTTP libraries serialize pseudo-headers internally in a fixed order you do not control. Default curl emits an order that matches no browser at all. A request whose pseudo-header order is m,p,s,a was not produced by Chrome, Firefox, or Safari, and if it claims to be one of them in the User-Agent, that is a contradiction the model can read with certainty.
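As a rough illustration of that check, the sketch below abbreviates the leading pseudo-headers of a request and compares the result to the three browser orders given above. The function names are hypothetical, and a real system would treat this as one feature among many rather than a hard rule.

```python
# Orders taken from the paragraph above.
PSEUDO_ORDER_FAMILY = {
    "m,a,s,p": "chrome",
    "m,p,a,s": "firefox",
    "m,s,p,a": "safari",
}
ABBREV = {":method": "m", ":authority": "a", ":scheme": "s", ":path": "p"}


def pseudo_order(header_names):
    """Abbreviate the leading pseudo-headers in the order they were received."""
    letters = []
    for name in header_names:
        if not name.startswith(":"):
            break  # pseudo-headers must precede all regular fields
        letters.append(ABBREV.get(name, "?"))
    return ",".join(letters)


def order_contradicts_claim(header_names, claimed_family):
    """True when the observed order is not the one the claimed family sends."""
    observed_family = PSEUDO_ORDER_FAMILY.get(pseudo_order(header_names))
    return observed_family != claimed_family


chrome_request = [":method", ":authority", ":scheme", ":path", "user-agent"]
library_request = [":method", ":path", ":scheme", ":authority", "user-agent"]

print(pseudo_order(chrome_request), order_contradicts_claim(chrome_request, "chrome"))
print(pseudo_order(library_request), order_contradicts_claim(library_request, "chrome"))
```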

There is a fifth signal that the Akamai format does not always serialize but that DataDome and its peers absolutely read: the order of the ordinary, non-pseudo headers. Browsers emit accept, accept-encoding, accept-language, sec-ch-ua, user-agent and the rest in a consistent sequence per version. HTTP libraries emit them in insertion order or alphabetically. Header order is the HTTP/1.1-era fingerprint that carried straight into HTTP/2, and it is captured by JA4H on the FoxIO side. Reproducing it means controlling exactly which headers you send, in exactly which case, in exactly which order, matched to the browser version in your User-Agent. Get any of that wrong and you have manufactured another inconsistency.
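Two simple header-order checks can be sketched in a few lines. The first flags the alphabetical ordering some libraries produce; the second compares the relative order of the headers that are present against a reference sequence. The reference list here is a placeholder for illustration only, not a real Chrome header order, and both function names are hypothetical.

```python
def is_alphabetical(names):
    """True when the regular (non-pseudo) headers arrive in sorted order."""
    lowered = [n.lower() for n in names if not n.startswith(":")]
    # With only one or two headers, sorted order proves nothing.
    return len(lowered) > 2 and lowered == sorted(lowered)


def relative_order_matches(observed, expected):
    """Compare only the headers both lists share, preserving each list's order."""
    observed_l = [n.lower() for n in observed]
    common = set(observed_l) & set(expected)
    return [n for n in observed_l if n in common] == [n for n in expected if n in common]


# Placeholder reference sequence (assumption, NOT an actual browser capture).
REFERENCE_ORDER = ["sec-ch-ua", "user-agent", "accept", "accept-encoding", "accept-language"]

library_headers = ["accept", "accept-encoding", "accept-language", "user-agent"]
print(is_alphabetical(library_headers))
print(relative_order_matches(library_headers, REFERENCE_ORDER))
```

Comparing relative order rather than exact equality tolerates optional headers that a browser sends only on some requests, at the cost of being a weaker test.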

The TLS side: JA3, JA4, and what randomization broke

Before a single HTTP/2 frame is sent, the client has already revealed itself in the TLS ClientHello. This is the older and more famous half of network fingerprinting, and DataDome’s integration explicitly asks edge modules to capture JA3 and JA4 from it. The mechanics of the ClientHello are covered in depth in TLS fingerprinting: from ClientHello bytes to JA4; the short version is what matters here.

JA3, published by Salesforce engineers in 2017, builds a fingerprint by concatenating five fields from the ClientHello (the TLS version, the cipher suite list, the extension list, the supported elliptic curves, and the curve formats) into a string and hashing it with MD5. For years a JA3 hash was a reliable label: a given Chrome build on a given OS produced a stable hash, and Python’s requests over OpenSSL produced a different, equally stable one. Map the hash to a client type and a contradicting User-Agent lights up.

That broke on a schedule everyone in the field remembers. Starting with Chrome 110 in early 2023, and Firefox 114 soon after, browsers began randomizing the order of TLS extensions in the ClientHello on every connection, a deliberate anti-fingerprinting move borrowed from the GREASE philosophy. Because JA3 hashes the extension list in the order it appears, randomized order means a fresh JA3 hash on nearly every connection from the same browser. A signal that changes every request is not a signal. JA3 as a static label was effectively dead for current browsers overnight.
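The mechanics are easy to reproduce. The sketch below builds a JA3 string the way the format describes (five comma-separated fields, values joined with hyphens) and hashes it with MD5. The ClientHello values are made up for illustration and do not correspond to any real browser; GREASE filtering, which real JA3 implementations perform, is omitted.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Five ClientHello fields, comma-separated, values hyphen-joined in wire order."""
    fields = [
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    return ",".join(fields)


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    # MD5 is used as a label here, not for security (flag needs Python 3.9+).
    return hashlib.md5(raw.encode("ascii"), usedforsecurity=False).hexdigest()


# Made-up example values (assumption, not a real browser's ClientHello).
version, ciphers = 771, [4865, 4866, 4867]
extensions = [0, 23, 65281, 10, 11]
curves, point_formats = [29, 23, 24], [0]

first = ja3_hash(version, ciphers, extensions, curves, point_formats)
# Same client, same extensions, different order on the next connection.
second = ja3_hash(version, ciphers, list(reversed(extensions)), curves, point_formats)

print(first)
print(second)
print("stable label:", first == second)
```

Nothing about the client changed between the two hashes except the order of one list, and the label is already useless for matching.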

JA4 was John Althouse’s answer, published through FoxIO in September 2023 as part of the broader JA4+ suite. It fixes the randomization problem by sorting the cipher and extension lists before hashing, so a reordered ClientHello still yields the same fingerprint. The format is deliberately legible: three sections separated by underscores, a_b_c. The first section is human-readable metadata: transport (t for TCP, q for QUIC), TLS version, whether SNI is present, the cipher and extension counts, and the ALPN value, so t13d1516h2 reads as TCP, TLS 1.3, SNI present, 15 ciphers, 16 extensions, ALPN of h2. The second and third sections are truncated SHA-256 hashes of the sorted ciphers and the sorted extensions plus signature algorithms. The suite extends past TLS: JA4H fingerprints the HTTP client including header order, and JA4T fingerprints the TCP layer, which becomes relevant in a moment.
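Both ideas in that paragraph fit in a short sketch: reading the first section as plain metadata, and hashing a sorted list so that reordering does not change the result. This is a simplification under stated assumptions. The real JA4 specification has further rules (GREASE removal, which extensions are excluded, how signature algorithms are appended) that are not reproduced here, and the function names are hypothetical.

```python
import hashlib


def parse_ja4_a(section: str) -> dict:
    """Decode the 10-character first section of a JA4 fingerprint."""
    if len(section) != 10:
        raise ValueError("JA4 first section should be 10 characters")
    return {
        "transport": {"t": "TCP", "q": "QUIC"}.get(section[0], section[0]),
        "tls_version": section[1:3],          # "13" means TLS 1.3
        "sni_present": section[3] == "d",
        "cipher_count": int(section[4:6]),
        "extension_count": int(section[6:8]),
        "alpn": section[8:10],
    }


def sorted_list_hash(values) -> str:
    """Sort-before-hash: order on the wire no longer affects the result."""
    joined = ",".join(sorted(f"{v:04x}" for v in values))
    return hashlib.sha256(joined.encode("ascii")).hexdigest()[:12]


print(parse_ja4_a("t13d1516h2"))

extensions = [0, 23, 65281, 10, 11]  # made-up values for illustration
print(sorted_list_hash(extensions) == sorted_list_hash(list(reversed(extensions))))
```

Because the first section is readable without any lookup table, a detector can compare fields such as ALPN or the extension count against other layers directly.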

The practical effect for detection is that the network fingerprint is no longer a single brittle hash to be matched or evaded. It is a structured object with parts that can be queried independently. You can ask whether the TLS-1.3-and-h2 metadata in JA4’s first section is even consistent with a client that, two layers up, sends Firefox-shaped HTTP/2 SETTINGS. That decomposition is what makes cross-layer reasoning possible.

[Timeline]
2017: JA3 published; Akamai H/2 paper
2022: RFC 9113 deprecates H/2 priority frames
Early 2023: Chrome 110 randomizes TLS extensions; JA3 fades
Sep 2023: JA4+ released, sort-stable hashing

The brittle single-hash era ended in 2023; structured, decomposable fingerprints replaced it.

*The shift that reshaped TLS fingerprinting. Extension randomization in early 2023 killed JA3 as a static label, and JA4’s sort-before-hash design plus its readable metadata section made cross-layer consistency checks the durable approach.*

The consistency check is the actual signal

Here is the part that gets lost in every “list of fingerprints” explainer. No single network fingerprint is a verdict on its own. A JA4 hash is just a label; plenty of legitimate, rare clients carry unusual labels, and plenty of bots now carry the exact JA4 of a real Chrome because they route through a real Chrome or a faithfully impersonated TLS stack. The signal that holds up is not “this fingerprint is bad.” It is “these fingerprints, and the headers, do not describe the same client.”

DataDome’s public material is consistent on this point. The company describes its server-side detection as resting on HTTP fingerprints derived from headers such as the user agent and accepted compression algorithms, TCP fingerprints from packet-level differences, TLS fingerprints from supported cipher suites, and server-side behavioral features like request frequency and browsing patterns. The model is trained on what a coherent client looks like across all of those, and it scores the gaps. The detection model as a whole, and the full inventory of first-request signals, is laid out in DataDome’s detection model; the network layer contributes the cross-checks below.

Walk the contradictions a single request can contain. The User-Agent claims Chrome 138 on Windows. The TLS ClientHello produces a JA4 whose metadata and cipher hash belong to OpenSSL, not BoringSSL, the library Chrome actually uses. Two layers up, the HTTP/2 SETTINGS frame advertises a one-gigabyte initial window and a pseudo-header order of m,p,s,a, which is neither Chrome nor any browser. The header order is alphabetical, which no browser does. Each of these, alone, might be a weird-but-real client. Together they describe a Python or Go HTTP client wearing a Chrome costume, and the joint probability that a genuine Chrome 138 produced this combination is essentially zero.

Claimed: User-Agent says Chrome 138 / Windows

| Layer | Genuine Chrome | Spoofing HTTP client |
| --- | --- | --- |
| TLS / JA4 | BoringSSL profile | OpenSSL profile |
| H/2 SETTINGS | window 6291456 | window ~1 GB |
| Pseudo order | m,a,s,p | m,p,s,a |
| Header order | Chrome sequence | alphabetical |
| TCP / TTL | ~128 (Windows) | ~64 (Linux host) |

*No single row convicts. The model scores the joint improbability that one genuine Chrome on Windows produced every value in the right-hand column at once.*
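The comparison in the table can be written down as a toy check. This is only a sketch of the idea: DataDome describes a trained model, not a list of equality tests, and the `WireProfile` type, its field names, and the string labels are all hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class WireProfile:
    tls_library: str        # library family inferred from the JA4
    h2_initial_window: int  # SETTINGS_INITIAL_WINDOW_SIZE
    pseudo_order: str       # Akamai-style abbreviation
    header_order: str       # coarse label for the regular-header sequence
    initial_ttl: int        # inferred initial TTL, not the decremented value seen


# What the User-Agent claim commits the client to (left column of the table).
GENUINE_CHROME_WINDOWS = WireProfile("BoringSSL", 6_291_456, "m,a,s,p", "chrome-sequence", 128)


def contradictions(claimed: WireProfile, observed: WireProfile) -> list:
    """Names of the layers where the wire disagrees with the claim."""
    expected, seen = asdict(claimed), asdict(observed)
    return [field for field in expected if expected[field] != seen[field]]


# Right column of the table: a generic HTTP client behind a Chrome User-Agent.
spoofed = WireProfile("OpenSSL", 1_073_741_824, "m,p,s,a", "alphabetical", 64)

print(contradictions(GENUINE_CHROME_WINDOWS, spoofed))
print(contradictions(GENUINE_CHROME_WINDOWS, GENUINE_CHROME_WINDOWS))
```

Returning the list of disagreeing layers, rather than a single boolean, mirrors the point of the table: the count and combination of contradictions matter more than any one of them.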

This is why the framing of “user-agent rarity” has taken over. The question a 2026 system asks is not “is this JA4 on a blocklist” but “does this JA4 normally claim to be Chrome 138?” A JA4 hash that has only ever been seen on Python clients, arriving with a Chrome user agent, is rare in a specific and suspicious way. The same logic runs across every pair of layers. If the TCP fingerprint says the host OS is Linux but the user agent and JA4 say iOS, that combination does not occur on a real iPhone, so the request is almost certainly automated or proxied. The fingerprints do not need to be individually forbidden. They need to be mutually impossible.
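One way to picture the rarity question is as a conditional frequency: of all the traffic seen with this JA4, what share claimed this user-agent family? The sketch below is a toy counter under that assumption. The class name, the method names, and the JA4 strings are placeholders, not real fingerprints or a real API.

```python
from collections import Counter


class CooccurrenceModel:
    """Tracks how often each JA4 arrives with each claimed user-agent family."""

    def __init__(self):
        self.pair_counts = Counter()
        self.ja4_counts = Counter()

    def observe(self, ja4: str, ua_family: str) -> None:
        self.pair_counts[(ja4, ua_family)] += 1
        self.ja4_counts[ja4] += 1

    def claim_share(self, ja4: str, ua_family: str):
        """Share of this JA4's traffic that claimed ua_family, or None if unseen."""
        total = self.ja4_counts[ja4]
        if total == 0:
            return None
        return self.pair_counts[(ja4, ua_family)] / total


model = CooccurrenceModel()
for _ in range(99):
    model.observe("ja4-placeholder-python", "python-requests")
model.observe("ja4-placeholder-python", "chrome")

print(model.claim_share("ja4-placeholder-python", "chrome"))
print(model.claim_share("ja4-never-seen", "chrome"))
```

A low share is not a blocklist hit. It says this pairing is unusual, which is the signal the paragraph above describes.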

There is a subtle asymmetry worth stating plainly. Matching one layer perfectly does not help if you miss another, because the model scores the worst inconsistency, not the best match. A scraper that nails the TLS JA4 of Chrome 138 through a precise impersonation library, then speaks HTTP/2 through a generic library that sends the wrong SETTINGS and the wrong pseudo-header order, has spent enormous effort on the TLS layer and given the whole thing away on the HTTP/2 layer. The fingerprints are joined at the version. Claiming Chrome 138 commits you to Chrome 138’s TLS profile and Chrome 138’s HTTP/2 frames and Chrome 138’s header order and a plausible OS underneath, all simultaneously and all consistent with each other. That is a high bar, and it is the bar by design.
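The asymmetry shows up clearly if you compare two ways of combining per-layer scores. The sketch assumes each layer yields an agreement score between 0 and 1, which is an illustrative simplification and not a description of DataDome's actual scoring; the function name and the numbers are made up.

```python
def overall_consistency(layer_scores: dict) -> float:
    """Weakest-link aggregation: a claim is only as good as its worst layer."""
    if not layer_scores:
        raise ValueError("need at least one layer score")
    return min(layer_scores.values())


# Hypothetical scores: perfect TLS impersonation, generic HTTP/2 library.
scores = {"tls": 1.0, "h2_settings": 0.1, "pseudo_order": 0.0, "tcp": 0.8}

average = sum(scores.values()) / len(scores)
print("average:", average)                       # looks tolerable
print("weakest link:", overall_consistency(scores))  # gives the client away
```

Under an average, effort spent perfecting one layer buys credit. Under a weakest-link rule it buys nothing until every other layer is also correct.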

The TCP/IP layer underneath

Below TLS sits the layer that the application can least control: the TCP/IP stack of the operating system. The first packet of any connection, the SYN, carries an initial TTL, a TCP window size, a maximum segment size, and a specific arrangement of TCP options. These are set by the OS kernel, not by the browser and certainly not by the script. Passive OS fingerprinting, the technique that p0f pioneered two decades ago and that modern tools like Zardaxt revived, reads those fields to infer the operating system without sending a single probe.

DataDome treats this as another input to the same consistency engine, and it published an analysis of Zardaxt that lays out the reasoning. The classic tell is the initial TTL. Windows hosts start TTL near 128; Linux hosts start near 64. A connection that arrives with a TTL consistent with Linux, carrying a User-Agent that claims Windows, has a problem, the same kind p0f historically flagged with its us_vs_os mismatch code: the OS inferred from the packets disagrees with the OS claimed in the headers. TCP options compound it. Windows and Linux differ in which options they include and in what order, with the timestamp option being a common discriminator. JA4T in the FoxIO suite formalizes this same TCP SYN fingerprint.
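The TTL check can be sketched as follows. Observed TTLs have been decremented once per hop, so the usual trick is to round up to the nearest common initial value. The table of defaults below reflects widely seen values (64 for Linux, macOS, iOS, and Android; 128 for Windows), but hosts can be configured differently, and the function names are hypothetical.

```python
# Common default initial TTLs by claimed OS (defaults can be changed by admins).
EXPECTED_INITIAL_TTL = {
    "windows": 128,
    "linux": 64,
    "macos": 64,
    "android": 64,
    "ios": 64,
}


def infer_initial_ttl(observed_ttl: int) -> int:
    """Round an observed TTL up to the nearest common initial value."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    for initial in (64, 128, 255):
        if observed_ttl <= initial:
            return initial


def ttl_contradicts_claim(observed_ttl: int, claimed_os: str) -> bool:
    """True when the inferred initial TTL does not fit the OS in the User-Agent."""
    expected = EXPECTED_INITIAL_TTL.get(claimed_os.lower())
    if expected is None:
        return False  # unknown OS claim: this check has no opinion
    return infer_initial_ttl(observed_ttl) != expected


print(infer_initial_ttl(52), ttl_contradicts_claim(52, "Windows"))
print(infer_initial_ttl(117), ttl_contradicts_claim(117, "Windows"))
```

Rounding up assumes fewer than 64 hops between client and edge, which holds for nearly all real paths but is one more reason to treat this as a weak, supporting signal.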

The honest caveat is that the TCP layer is the noisiest of the three. TTLs get rewritten by routers and decrement over hops, NAT and proxies muddy the window and MSS values, and a datacenter proxy fronting a residential exit changes the picture. So the TCP fingerprint is a weaker individual signal than TLS or HTTP/2, and DataDome weights it accordingly. But weak signals that are cheap and hard to forge are still worth collecting, especially when their job is not to convict alone but to add one more axis along which a spoofed client can contradict itself. A bot operator who perfectly reproduces Chrome’s TLS and HTTP/2 from a Linux box still ships Linux TCP options under a Windows user agent unless they have also patched the kernel.

Where this stands in 2026

Two things have changed the shape of network fingerprinting since the Akamai paper, and both cut against the simple version of the technique. The first is that the single-hash era is over. JA3 stopped being a usable static label when browsers randomized extensions in 2023, the original HTTP/2 priority frames that were so discriminating for Firefox were deprecated by RFC 9113 and are fading from real traffic, and the surviving fingerprints are structured objects you reason about in parts rather than opaque hashes you match. A blocklist of bad fingerprints would be nearly useless today. A model of which fingerprint combinations are mutually consistent is not.

The second change is that impersonation got good. Libraries that reproduce a target browser’s TLS ClientHello and HTTP/2 frame profile faithfully are widely available and widely used, which means a clean match on any single layer no longer tells you much. The detection edge moved entirely to consistency. It is comparatively easy to copy Chrome’s JA4. It is much harder to copy Chrome’s JA4 and Chrome’s exact HTTP/2 SETTINGS and Chrome’s pseudo-header order and Chrome’s ordinary header order and a coherent OS-level TCP fingerprint, all pinned to the same Chrome version, on every request, while also clearing the JavaScript and behavioral layers that come after. Every layer you forget is a contradiction, and DataDome’s model is built to find the one you forgot.

What that leaves, for anyone reading these systems rather than fighting them, is a clear hierarchy. The user agent is the least trustworthy thing a request carries, because it is the one field fully under the client’s control. The network fingerprints below it are trustworthy in inverse proportion to how easy they are to change, which is why the most valuable signal in the whole stack is not any fingerprint at all but the distance between what a client says it is and what its bytes reveal it to be. That distance is computed before the first byte of HTML is sent, it costs the server nothing, and a client that has it cannot talk its way out of it.

