
How Cloudflare uses TLS and HTTP/2 fingerprints in bot scoring

Copyright: MIT

Before a request to a Cloudflare-protected site sends a single byte of its User-Agent header, it has already told Cloudflare a great deal about itself. The TCP SYN carries options in a particular order. The TLS ClientHello lists cipher suites and extensions in a particular shape. The first HTTP/2 frames after the connection opens carry SETTINGS values and a pseudo-header order that the application never picked. None of this is chosen by the person writing the scraper. It is chosen by the network stack their code links against, and it is hard to change without rewriting that stack. Cloudflare reads all of it at the edge, turns it into a few compact strings, and asks one question: does the client that connected look like the browser it claims to be?

That question sits underneath the 1-99 bot score, and it is cheap to ask. The data arrives for free during connection setup, before any JavaScript runs and before any cookie exists. This post is about the network-fingerprinting layer specifically: how Cloudflare computes JA3 and JA4 from the ClientHello, what the HTTP/2 fingerprint captures, how those values surface in the rules language as cf.bot_management fields, how they feed the machine-learning score and the heuristic engine, and where the whole approach started to bend in 2023 when Chrome began randomizing the very bytes JA3 depended on.

The road map. First, where the fingerprint gets computed and why placement matters. Then JA3, the original TLS fingerprint, and the construction that made it popular and then fragile. Then JA4, FoxIO’s 2023 redesign, and the specific change that fixed what Chrome broke. Then the HTTP/2 fingerprint, frame by frame, using the Akamai format the industry standardized on. Then the part that actually decides the score: how these signals reach the bot-management engines and the cross-layer consistency check. And finally, JA4 Signals, the aggregate layer Cloudflare added in 2024, and what the current state looks like.

Where the fingerprint gets computed

Cloudflare terminates TLS at its edge. That single fact is what makes network fingerprinting possible as a signal. The edge server is the first machine to see the raw ClientHello, the TCP options, and the opening HTTP/2 frames, and it sees them on every connection regardless of what the client does afterward. A client that runs no JavaScript, accepts no cookies, and ignores every challenge still hands over a complete network fingerprint just by completing the handshake. For the large share of automated traffic that comes from raw HTTP libraries rather than headless browsers, this is often the only layer that produces a usable signal at all, and it produces one on the very first request.

The fingerprints are computed during the handshake and then attached to the request as metadata. Cloudflare’s documentation is explicit about the consequence: because JA3 and JA4 are calculated during the TLS handshake, “they will not be present for non-encrypted HTTP traffic.” They are also absent in a few other cases worth knowing, including when a Worker makes a subrequest to a third-party origin and when TLS session resumption skips the full handshake. The fingerprint is a property of the handshake, not of the request, so anything that bypasses or reuses the handshake changes what is available.

JA3 and JA4 are gated. Cloudflare states plainly that “JA3 and JA4 fingerprints are only available to Enterprise customers who have purchased Bot Management.” That gating matters for how you read the rest of this post. The fingerprints are always computed for scoring purposes on plans that run the bot-management engines, but they are only exposed as queryable fields, the ones you can write a rule against, on the Enterprise Bot Management tier.

It helps to picture the request path. A connection lands at the nearest Cloudflare data center, where TLS terminates. The edge server reads the ClientHello and computes the JA3 and JA4 strings before it has decrypted a single byte of the HTTP request, because the ClientHello is sent in the clear at the very start of the handshake. Once the handshake completes and the client sends its HTTP/2 preface, the edge reads the opening control frames and derives the HTTP/2 fingerprint. By the time the first request line is parsed, three layers of identity already exist as attached metadata: the TCP-level shape of the connection, the TLS fingerprints, and the HTTP/2 frame profile. Bot scoring runs against the request with all of that in hand, and a Worker or a custom rule executing afterward can read the exposed fields directly. Nothing about this added a round trip; the fingerprints are a byproduct of bytes the client had to send anyway.

JA3: the original TLS fingerprint

JA3 was published by Salesforce engineers in 2017 and named after the initials its three authors share. The idea is simple enough to implement in an afternoon. Take the ClientHello, pull out five fields in a fixed order, write their decimal values into one string with commas between fields and dashes between values, then MD5-hash the result into a 32-character hex string.

The five fields are the TLS version, the list of accepted cipher suites, the list of extensions, the supported elliptic curves (the named groups), and the elliptic-curve point formats. The exact concatenation order is TLSVersion,Ciphers,Extensions,EllipticCurves,EllipticCurvePointFormats, each inner list dash-delimited. One detail keeps the fingerprint stable across modern clients: GREASE values are stripped before hashing. GREASE, defined in RFC 8701, is a deliberate mechanism where browsers inject random reserved values into cipher and extension lists to keep middleboxes from ossifying on a fixed set. If you left those random values in, every connection would hash differently, so any correct JA3 implementation drops them.
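The construction above is short enough to sketch directly. This is a minimal illustration in Python, assuming the ClientHello has already been parsed into lists of integers; the function names and the sample values are made up for the example and are not taken from any particular capture.

```python
import hashlib


def is_grease(value: int) -> bool:
    """RFC 8701 GREASE values have the form 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Build the pre-hash JA3 string: commas between fields, dashes inside them."""

    def join(values):
        # GREASE is dropped; the client's original order is kept as sent.
        return "-".join(str(v) for v in values if not is_grease(v))

    return ",".join(
        [str(version), join(ciphers), join(extensions), join(curves), join(point_formats)]
    )


def ja3_hash(version, ciphers, extensions, curves, point_formats) -> str:
    """MD5 of the JA3 string, as a 32-character hex digest."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Hypothetical parsed ClientHello, with one GREASE value in each list.
sample = dict(
    version=771,
    ciphers=[0x0A0A, 4865, 4866, 4867],
    extensions=[0x1A1A, 0, 23, 65281],
    curves=[0x2A2A, 29, 23, 24],
    point_formats=[0],
)
```

Calling `ja3_string(**sample)` gives `771,4865-4866-4867,0-23-65281,29-23-24,0`, and `ja3_hash(**sample)` is the MD5 of that string. Because the lists are joined in the order received, swapping two extensions produces a different hash.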

Why does this work as a signal? Because the cipher list, the extension set, and the curve preferences are baked into the TLS library, and different libraries ship different defaults. OpenSSL, BoringSSL, the Go standard library, Rust’s rustls, Python’s ssl module, and the NSS library inside Firefox each produce a recognizably different ClientHello. A browser’s hash is shared by millions of installs of that exact build. A scripting library’s hash is shared by everyone using that library. So a request that announces Chrome/138 in its header but produces the JA3 of Go’s net/http has told you, before you read a single cookie, that the header is lying.

The classic example is the Chrome hash cd08e31494f9531f560d64c695473da9, which for years was the single most common JA3 seen by large networks because it covered Chrome across most platforms. That stability was JA3’s strength and, as it turned out, the seed of its weakness.

Figure: ClientHello fields in fixed order (version, ciphers, extensions, curves, EC point formats), GREASE values stripped, joined with commas and dashes as 771,4865-4866-...,0-23-65281-...,29-23-24,0, then MD5 -> cd08e31494f9531f560d64c695473da9. JA3 concatenates five ClientHello fields in a fixed order and MD5-hashes them. The cipher and extension lists are kept in the order the client sent them, which is exactly what broke in 2023.

What Chrome broke in 2023

JA3 preserves the order the client sent its extensions. That was fine while browsers sent extensions in a stable order. It stopped being fine on 20 January 2023, when Chrome 110 began permuting the order of TLS extensions in the ClientHello on every connection. The change shipped to make the ecosystem harder to ossify, since RFC 8446 already allows extensions in any order except pre_shared_key, which must come last. The side effect was immediate and visible. Fastly, watching its own network, reported that the share of Chrome clients arriving with the old stable Chrome JA3 dropped sharply right at that date, because the extension order, and therefore the hash, now changed connection to connection.

This did not kill TLS fingerprinting. It killed the assumption that one client equals one hash. Suddenly Chrome produced a large family of JA3 values, all equally valid, none stable. Any detection rule keyed on a single Chrome JA3 string started missing real Chrome traffic, and any allow-list of known-good JA3 hashes became a maintenance problem that grew with every browser release. The industry needed a fingerprint that captured the same library-level identity without depending on a field the browser had decided to shuffle.

JA4: the redesign that survived randomization

JA4 is the answer, published by John Althouse at FoxIO on 26 September 2023. Althouse was one of the original JA3 authors, so this is the same lineage learning from its own first attempt. JA4 is part of a wider suite, JA4+, that fingerprints other layers too (JA4S for the server hello, JA4H for HTTP, JA4T for TCP, and several more), but the piece Cloudflare exposes for TLS clients is plain JA4.

The format is deliberately readable, unlike JA3’s opaque MD5. A JA4 fingerprint has three sections joined by underscores, written as a_b_c, and that structure is the point: you can match on a alone, or b and c, or any combination, instead of being forced to match the whole hash. The example FoxIO gives is t13d1516h2_8daaf6152771_e5627efa2ab1.

The first section, a, is human-readable and packs six facts. It opens with the transport (t for TLS over TCP, q for QUIC, d for DTLS), then the two-digit TLS version (13 for 1.3, 12 for 1.2), then a single letter for SNI (d when the SNI extension is present, indicating a domain, i when it is absent), then a two-digit count of cipher suites, a two-digit count of extensions, and finally a two-character code derived from the first ALPN value. So t13d1516h2 reads as: TLS over TCP, version 1.3, SNI present, 15 ciphers, 16 extensions, ALPN h2. That h2 is the first protocol the client offers in its ALPN extension, here HTTP/2, not the protocol eventually negotiated, which is why a hint about the HTTP layer lives right inside the TLS fingerprint.
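Because section a is positional, it can be unpacked with a single regular expression. The sketch below is one way to do that in Python; the `Ja4A` type and `parse_ja4_a` function are names invented for this example, and the version table covers only the two values mentioned above.

```python
import re
from dataclasses import dataclass

# transport, TLS version, SNI flag, cipher count, extension count, ALPN code
_JA4_A = re.compile(
    r"^(?P<transport>[tqd])(?P<version>[0-9a-z]{2})(?P<sni>[di])"
    r"(?P<ciphers>\d{2})(?P<extensions>\d{2})(?P<alpn>[0-9a-z]{2})$"
)

_TRANSPORTS = {"t": "TLS over TCP", "q": "QUIC", "d": "DTLS"}
_VERSIONS = {"13": "1.3", "12": "1.2"}  # only the values named in the text


@dataclass(frozen=True)
class Ja4A:
    transport: str
    tls_version: str
    sni_present: bool
    cipher_count: int
    extension_count: int
    alpn_code: str


def parse_ja4_a(ja4: str) -> Ja4A:
    """Decode the readable first section of a JA4 string (a_b_c)."""
    section_a = ja4.split("_", 1)[0]
    match = _JA4_A.match(section_a)
    if match is None:
        raise ValueError(f"not a JA4 section a: {section_a!r}")
    return Ja4A(
        transport=_TRANSPORTS[match["transport"]],
        # Unknown version codes are passed through unchanged.
        tls_version=_VERSIONS.get(match["version"], match["version"]),
        sni_present=match["sni"] == "d",
        cipher_count=int(match["ciphers"]),
        extension_count=int(match["extensions"]),
        alpn_code=match["alpn"],
    )
```

Passing the FoxIO example `t13d1516h2_8daaf6152771_e5627efa2ab1` yields TLS over TCP, version 1.3, SNI present, 15 ciphers, 16 extensions, and ALPN code h2, matching the reading given in the text.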

The second and third sections are the part that survives Chrome’s shuffling. Section b is a 12-character truncated SHA-256 of the cipher suite list, and section c is a 12-character truncated SHA-256 of the extension list plus the signature algorithms. The decisive change from JA3 is that JA4 sorts the cipher list and the extension list into hexadecimal order before hashing, rather than preserving the order the client sent them. GREASE values are still excluded, the SNI and ALPN extensions are pulled out of the c hash because they are already accounted for in a, and the signature algorithms are kept in their original order because that order carries real information. Sorting is the whole trick. Once you sort, a permuted extension order collapses back to one canonical value, and Chrome’s randomization no longer changes the fingerprint.
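The effect of sorting is easy to see in code. The following is a simplified Python sketch of sections b and c as described above, next to an order-preserving hash for contrast. It assumes already-parsed integer lists, and it does not reproduce every detail of the published JA4 specification (how empty lists are encoded, for instance); the helper names are hypothetical.

```python
import hashlib

SNI, ALPN = 0x0000, 0x0010  # extension IDs already represented in section a


def is_grease(value: int) -> bool:
    """RFC 8701 GREASE values have the form 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _truncated_sha256(text: str) -> str:
    return hashlib.sha256(text.encode("ascii")).hexdigest()[:12]


def ja4_b(ciphers) -> str:
    """Section b: sorted cipher suites as 4-digit hex, hashed and truncated."""
    items = sorted(f"{c:04x}" for c in ciphers if not is_grease(c))
    return _truncated_sha256(",".join(items))


def ja4_c(extensions, signature_algorithms) -> str:
    """Section c: sorted extensions (minus SNI/ALPN) plus sigalgs in sent order."""
    exts = sorted(
        f"{e:04x}" for e in extensions if not is_grease(e) and e not in (SNI, ALPN)
    )
    sigs = [f"{s:04x}" for s in signature_algorithms if not is_grease(s)]
    text = ",".join(exts)
    if sigs:
        text += "_" + ",".join(sigs)
    return _truncated_sha256(text)


def order_preserving_hash(extensions) -> str:
    """JA3-style contrast: same values, hashed in the order they arrived."""
    return _truncated_sha256(",".join(f"{e:04x}" for e in extensions))
```

Shuffling the extension list changes `order_preserving_hash` but leaves `ja4_c` untouched, which is the whole fix in miniature. Reordering the signature algorithms does change `ja4_c`, since that order is deliberately preserved.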

Figure: the example t13d1516h2_8daaf6152771_e5627efa2ab1 split into a (readable), b (sorted ciphers), and c (sorted extensions plus signature algorithms), with section a decoded as t = TLS over TCP, 13 = TLS 1.3, d = SNI present (domain), 15 = 15 cipher suites, 16 = 16 extensions, h2 = ALPN HTTP/2. JA4's first section is readable on sight; the second and third sort their lists before hashing. Sorting is what makes the fingerprint immune to Chrome's per-connection extension shuffle.

Cloudflare exposes JA4 through the field cf.bot_management.ja4 and still keeps the older cf.bot_management.ja3_hash for clients and rules that depend on it. Both are documented as helping to “profile specific SSL/TLS clients across different destination IPs, ports, and X509 certificates,” which is the polite way of saying they identify the client stack independent of where it connects. The deeper history of this evolution, from the original ClientHello bytes through to the JA4+ suite, is its own subject, covered in TLS fingerprinting: from ClientHello bytes to JA4.

The HTTP/2 fingerprint

TLS is the first thing a client reveals, but not the only thing. Once the encrypted connection is up and ALPN has negotiated h2, the client immediately sends its HTTP/2 connection preface and a few control frames, and the shape of those frames is its own fingerprint. The format the industry uses came out of Akamai research presented at Black Hat Europe 2017, built from more than ten million HTTP/2 connections observed across Akamai’s edge. Cloudflare reads the same kind of signal; the Akamai format is the lingua franca for describing it.

The fingerprint concatenates several frame-level properties. The SETTINGS frame comes first, written as id:value pairs in the order the client sent them. HTTP/2 defines six settings: SETTINGS_HEADER_TABLE_SIZE, SETTINGS_ENABLE_PUSH, SETTINGS_MAX_CONCURRENT_STREAMS, SETTINGS_INITIAL_WINDOW_SIZE, SETTINGS_MAX_FRAME_SIZE, and SETTINGS_MAX_HEADER_LIST_SIZE. A client need not send all of them, and the ones it sends, their values, and their order are all stack-specific. Next is the increment from any initial WINDOW_UPDATE frame, which is how the client opens the connection-level flow-control window. Then any PRIORITY frames, each encoded as stream id, exclusivity bit, dependent stream, and weight. And finally the pseudo-header order, the sequence in which :method, :authority, :scheme, and :path appear, written with the single-letter codes m, a, s, p.

A concrete example from a Firefox 52 build on Windows looks like this:

[1:65536;4:131072;5:16384]|12517377|3:0:0:201,5:0:0:101,...|m,p,a,s

Read it left to right. Firefox declares header table size 65536, initial window size 131072, max frame size 16384. It opens the connection window with a WINDOW_UPDATE increment of 12517377. It sends a small tree of PRIORITY frames, which is a very Firefox thing to do. And it orders its pseudo-headers m,p,a,s. Chrome’s profile differs at almost every position: different SETTINGS values, no PRIORITY tree of that shape, and the pseudo-header order m,a,s,p. The values themselves are not secret or clever. They are just defaults that the browser engineers picked once and never need to change, which is exactly what makes them a stable fingerprint.
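Reading the string left to right is mechanical enough to automate. Here is a small Python parser for the four-segment layout described above; the `H2Fingerprint` type and the function name are invented for the example, and the brackets around the SETTINGS segment are treated as optional. The demo string reuses the SETTINGS, window, and pseudo-header values from the Firefox example, with the PRIORITY segment set to 0 (none) purely to keep it short.

```python
from dataclasses import dataclass
from typing import List, Tuple

# The six settings defined by HTTP/2, keyed by identifier.
SETTINGS_NAMES = {
    1: "HEADER_TABLE_SIZE",
    2: "ENABLE_PUSH",
    3: "MAX_CONCURRENT_STREAMS",
    4: "INITIAL_WINDOW_SIZE",
    5: "MAX_FRAME_SIZE",
    6: "MAX_HEADER_LIST_SIZE",
}


@dataclass(frozen=True)
class H2Fingerprint:
    settings: List[Tuple[str, int]]                # in the order the client sent them
    window_update: int                             # increment, 0 if none was sent
    priorities: List[Tuple[int, int, int, int]]    # stream, exclusive, depends_on, weight
    pseudo_header_order: Tuple[str, ...]           # e.g. ("m", "a", "s", "p")


def parse_h2_fingerprint(text: str) -> H2Fingerprint:
    settings_part, window_part, priority_part, pseudo_part = text.split("|")

    settings = []
    for pair in settings_part.strip("[]").split(";"):
        if not pair:
            continue
        ident, value = pair.split(":")
        name = SETTINGS_NAMES.get(int(ident), f"UNKNOWN_{ident}")
        settings.append((name, int(value)))

    priorities = []
    if priority_part not in ("", "0"):
        for frame in priority_part.split(","):
            stream, exclusive, depends_on, weight = (int(x) for x in frame.split(":"))
            priorities.append((stream, exclusive, depends_on, weight))

    return H2Fingerprint(
        settings=settings,
        window_update=int(window_part),
        priorities=priorities,
        pseudo_header_order=tuple(pseudo_part.split(",")),
    )


demo = parse_h2_fingerprint("[1:65536;4:131072;5:16384]|12517377|0|m,p,a,s")
```

The parser keeps SETTINGS in wire order rather than sorting them, because the order itself is part of what distinguishes one stack from another.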

The pseudo-header order is the single most quietly diagnostic field. RFC 9113 leaves the order of pseudo-headers up to the implementation, so each stack picks one and sticks with it. Chrome sends m,a,s,p. Firefox sends m,p,a,s. Most HTTP libraries send whatever their author happened to write, and a surprising number send an order that no shipping browser uses. A request whose User-Agent says Chrome but whose pseudo-headers arrive in Firefox order, or in an order belonging to no browser at all, has contradicted itself on the wire. DataDome reads the same field for the same reason, which we walk through in how DataDome uses HTTP/2 and network fingerprints; the mechanism is industry-standard, not vendor-specific.
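As a rough illustration of that check, the Python sketch below compares the family a User-Agent claims with the family its pseudo-header order implies. The table holds only the two orders named above, so "unknown-order" here means "not in this small table"; a production table would be larger, and the User-Agent matching is deliberately naive. All names are hypothetical.

```python
from typing import Optional, Sequence

# Only the two orders this post names; a real table would cover more stacks.
KNOWN_ORDERS = {
    ("m", "a", "s", "p"): "chrome",
    ("m", "p", "a", "s"): "firefox",
}


def claimed_family(user_agent: str) -> Optional[str]:
    """Very rough User-Agent classification, for illustration only."""
    ua = user_agent.lower()
    if "firefox/" in ua:
        return "firefox"
    if "chrome/" in ua:
        return "chrome"
    return None


def pseudo_header_verdict(user_agent: str, order: Sequence[str]) -> str:
    observed = KNOWN_ORDERS.get(tuple(order))
    claimed = claimed_family(user_agent)
    if observed is None:
        return "unknown-order"      # matches no browser in the table
    if claimed is not None and claimed != observed:
        return "contradiction"      # header says one browser, wire says another
    return "consistent"
```

A request with `Chrome/138` in its User-Agent and the order m,p,a,s comes back as a contradiction, while the same header with m,a,s,p is consistent.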

Figure: an HTTP/2 fingerprint in the Akamai format, four segments joined by |: SETTINGS id:value pairs, the WINDOW_UPDATE increment, PRIORITY frames, and the pseudo-header order. The pseudo-header order on its own is enough to tell Chrome (m,a,s,p) from Firefox (m,p,a,s), and to flag a client whose order belongs to no browser at all.

Akamai’s own conclusion is worth holding onto, because it sets the right expectation. The HTTP/2 fingerprint by itself does not carry enough entropy to track an individual user. What it does is reveal the implementation: the vendor, often the operating system, sometimes the version. That is precisely what bot detection wants. The job is not to single out a person but to decide whether the client is the browser it claims to be, and for that, implementation-level identity is exactly enough.

There is a quieter property of the HTTP/2 fingerprint that makes it valuable alongside JA4: it is harder to spoof from a generic HTTP library than the ClientHello is. A growing number of scraping toolkits ship with TLS-mimicking layers that reproduce a real browser’s ClientHello byte for byte, because the ClientHello is constructed once at handshake time and the libraries to forge it are mature. The HTTP/2 frames are different. They are emitted by the HTTP/2 engine continuously as the connection runs: the SETTINGS the engine declares, the flow-control window it advertises, the priority tree it builds, the order in which its HPACK encoder serializes pseudo-headers. Matching all of that means matching the behaviour of the browser’s HTTP/2 state machine, not just a static blob it sent at connection open. That asymmetry is part of why Cloudflare reads both layers rather than treating TLS as sufficient.

When ALPN negotiates h3 instead of h2, the same logic moves down to QUIC. HTTP/3 runs over QUIC rather than TCP, so there is no TCP handshake to fingerprint, but QUIC carries its own transport parameters, its own initial packet structure, and its own version negotiation, and these are as implementation-specific as the HTTP/2 SETTINGS. Cloudflare folds HTTP/3 into the same picture, which is why the JA4 Signals layer tracks an h2h3_ratio_1h that combines HTTP/2 and HTTP/3 rather than treating them separately. The fingerprintable surface follows the protocol stack wherever it goes; switching from TCP to QUIC changes which fields exist, not whether the client leaks an implementation identity.

How the signals reach the score

Cloudflare runs several detection engines, and the network fingerprints feed more than one of them. The two that matter most here are the machine-learning engine and the heuristics engine.

The machine-learning engine produces the bot score directly. The score is “an integer between 1-99 that indicates Cloudflare’s level of certainty that a request comes from a bot,” and the polarity catches people out: a score of 1 means Cloudflare is quite certain the request was automated, while 99 means it is quite certain the request came from a human. The documentation groups the range into buckets, with 1 flagged as automated, 2 through 29 as likely automated, and 30 through 99 as likely human. The model takes a wide feature vector built from request properties, and the network fingerprints sit inside that vector as features. They are not a single override switch; they are inputs whose weight the model learned from labelled traffic. How the score is shaped, surfaced, and acted on is the larger subject of Cloudflare bot management scoring.
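Since the polarity is easy to get backwards, it can help to encode the documented buckets once and reuse them. A minimal Python sketch, with a function name chosen for the example:

```python
def score_bucket(score: int) -> str:
    """Map a 1-99 bot score to the documented grouping (low means automated)."""
    if not 1 <= score <= 99:
        raise ValueError(f"bot score out of range: {score}")
    if score == 1:
        return "automated"
    if score <= 29:
        return "likely automated"
    return "likely human"
```

Here `score_bucket(1)` is "automated", `score_bucket(29)` is "likely automated", and `score_bucket(30)` is "likely human".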

The heuristics engine works differently and more bluntly. It runs on all requests and matches them against a database of known-bad fingerprints, producing detection IDs rather than a probability. When a heuristic fires, the request gets a score of 1 and the matching rule shows up in cf.bot_management.detection_ids, the field that lists which heuristic detections fired. This is where a network fingerprint can act as a near-deterministic signal. If a JA4 or an HTTP/2 profile is known to belong only to a particular bot toolkit, a heuristic can match it and score the request automated without consulting the model at all. Cloudflare has said its analysts wrote 50 such heuristics in a recent push, “using a variety of signals, including but not limited to HTTP/2 fingerprints and Client Hello extensions.” That sentence names exactly the two layers this post is about.

The split between the two engines matters for how the fingerprints get used in practice. The machine-learning score is probabilistic and continuous, so it absorbs a fingerprint as one weighted feature among many; a slightly-off TLS profile nudges the score without dominating it, which keeps false positives down when a legitimate but unusual client appears. The heuristics are categorical and they win outright when they match, which is why they are reserved for fingerprints with very low collision against real browser traffic. A toolkit that ships one unmistakable JA4 string is a clean heuristic target. A library whose fingerprint overlaps with some real client population is better left to the model, where it contributes evidence rather than a verdict. The two paths exist so that high-confidence fingerprints can be acted on immediately while ambiguous ones still get a vote. An operator reading cf.bot_management.detection_ids on a request is reading which of those high-confidence rules fired, and a network-fingerprint detection ID is among the cheapest a request can trip, because it needed no challenge and no script execution to produce.
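The division of labour between the two engines can be pictured as a short-circuit. The Python sketch below is a toy model of that idea only: the known-bad table, the detection IDs, and the `model` callable are all hypothetical stand-ins, and nothing here reflects how Cloudflare actually implements either engine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Verdict:
    score: int
    detection_ids: List[int] = field(default_factory=list)
    source: str = "ml"


def score_request(
    ja4: str,
    h2_fingerprint: str,
    known_bad: Dict[str, int],            # fingerprint -> hypothetical detection ID
    model: Callable[[str, str], int],     # stand-in for the learned scorer
) -> Verdict:
    # Heuristics are categorical: any match wins outright with a score of 1.
    fired = [known_bad[fp] for fp in (ja4, h2_fingerprint) if fp in known_bad]
    if fired:
        return Verdict(score=1, detection_ids=fired, source="heuristics")
    # Otherwise the fingerprints are just features feeding a probabilistic score.
    raw = model(ja4, h2_fingerprint)
    return Verdict(score=max(1, min(99, raw)))
```

The design point is the asymmetry: only fingerprints with very low collision against real browser traffic belong in `known_bad`, because a match there bypasses the model entirely.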

The cross-layer consistency check

The strongest use of these fingerprints is not any one of them alone. It is the comparison between them. A real Chrome browser emits a Chrome TLS fingerprint, Chrome HTTP/2 SETTINGS, a Chrome pseudo-header order, and a Chrome JavaScript surface, all at once, because they all come from one binary. They are consistent by construction; you cannot ship the Chrome TLS stack without also shipping the Chrome HTTP/2 stack. An automated client assembles its identity from parts. It might borrow a real Chrome TLS fingerprint by linking a TLS-mimicking library, then speak HTTP/2 through a different library with a different SETTINGS profile, then claim Chrome in its User-Agent, then expose a headless JavaScript surface, or no JavaScript at all.

Each of those layers might individually pass. The TLS fingerprint matches a real Chrome. The header looks right. But the combination has never existed. No shipping Chrome produces that TLS fingerprint with that HTTP/2 profile with that pseudo-header order. The mismatch across layers is the signal, and it is much harder to defeat than any single fingerprint, because fixing it means making every layer agree simultaneously, which in practice means actually being the browser rather than impersonating it. This is the same logic Cloudflare applies on the client side with Turnstile and across challenge types; the network layer just provides the cheapest version of the consistency check, the one that runs before any challenge.
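A toy version of the consistency check makes the logic concrete. This Python sketch assumes each layer has already been reduced to a family label by some upstream lookup (that lookup is the hard part and is not shown); the type, the labels, and the rule that an unobserved JavaScript surface is skipped rather than counted are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LayerObservations:
    user_agent_family: str          # what the client claims to be
    tls_family: str                 # family implied by the TLS fingerprint
    h2_family: str                  # family implied by the HTTP/2 fingerprint
    js_family: Optional[str] = None  # None: no JavaScript surface observed yet


def contradictions(obs: LayerObservations) -> List[str]:
    """List every observed layer that disagrees with the claimed identity."""
    claimed = obs.user_agent_family
    layers = {
        "tls": obs.tls_family,
        "http2": obs.h2_family,
        "javascript": obs.js_family,
    }
    return [
        f"{name}: claimed {claimed}, observed {seen}"
        for name, seen in layers.items()
        if seen is not None and seen != claimed
    ]


real = LayerObservations("chrome", "chrome", "chrome", "chrome")
assembled = LayerObservations("chrome", "chrome", "go-net-http", "headless")
```

`contradictions(real)` is empty, while `contradictions(assembled)` reports the HTTP/2 and JavaScript layers, even though the TLS layer on its own would have passed.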

Figure: a real browser (TLS, HTTP/2, User-Agent, and JS surface all Chrome; all layers agree) beside an assembled client (mimicked Chrome TLS, Go net/http HTTP/2, Chrome User-Agent, headless or absent JS surface; layers contradict and the client is flagged). A real browser's layers are consistent because they share one binary. An assembled client borrows each layer from a different source, and the contradictions are what the cross-layer check reads.

JA4 Signals: the aggregate layer

A single fingerprint tells you what a client is. It does not tell you how that client has been behaving across the rest of the network in the last hour. Cloudflare closed that gap in August 2024 with JA4 Signals, an aggregate layer computed per JA4 fingerprint over a rolling window. The scale it draws on is the point: Cloudflare says it analyzes “over 15 million unique JA4 fingerprints generated from more than 500 million user agents and billions of IP addresses” on a daily basis, and the signals themselves are “inter-request features computed based on the last hour of all traffic that Cloudflare sees globally.”

What that buys you is context. A JA4 fingerprint is no longer just a static string; it carries a profile of recent behaviour. The documented signals include browser_ratio_1h, the share of requests for that fingerprint that came from browser-like user agents in the last hour, and h2h3_ratio_1h, the share that used HTTP/2 or HTTP/3 rather than HTTP/1.1. There are ranking and quantile signals too, such as reqs_quantile_1h, which places a fingerprint among all fingerprints by request volume. The intuition is straightforward. A fingerprint that claims to be a browser but shows a low browser_ratio_1h across the global network, or that suddenly spikes into a high request quantile from a narrow set of IPs, is behaving unlike the browser it resembles, and that behavioural mismatch becomes a feature the same way the cross-layer mismatch does.
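To make that intuition concrete, here is a hedged Python sketch that turns a dictionary of signals into a list of reasons for suspicion. The signal names are the documented ones mentioned above, but the assumption that each is a value between 0 and 1, the thresholds, and the function itself are illustrative choices, not anything Cloudflare publishes.

```python
from typing import Dict, List


def unbrowserlike_reasons(
    signals: Dict[str, float],
    claims_browser: bool,
    min_browser_ratio: float = 0.5,    # hypothetical threshold
    min_h2h3_ratio: float = 0.5,       # hypothetical threshold
    max_reqs_quantile: float = 0.99,   # hypothetical threshold
) -> List[str]:
    """Explain why a JA4's hourly behaviour looks unlike the browser it claims to be."""
    reasons: List[str] = []
    if not claims_browser:
        return reasons  # nothing to contradict

    browser_ratio = signals.get("browser_ratio_1h")
    if browser_ratio is not None and browser_ratio < min_browser_ratio:
        reasons.append("few requests for this JA4 carry browser-like user agents")

    h2h3_ratio = signals.get("h2h3_ratio_1h")
    if h2h3_ratio is not None and h2h3_ratio < min_h2h3_ratio:
        reasons.append("mostly HTTP/1.1 traffic for a fingerprint claiming a modern browser")

    quantile = signals.get("reqs_quantile_1h")
    if quantile is not None and quantile > max_reqs_quantile:
        reasons.append("request volume is in the extreme upper quantile")

    return reasons
```

Missing signals are skipped rather than treated as suspicious, so the function degrades quietly when only some of the aggregate values are available.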

These signals surface in the rules language and in Bot Analytics, so an Enterprise customer can write rules against them directly, not just consult the bot score they helped produce. It is the network fingerprint graduating from an identity into a small reputation, computed continuously across the whole of Cloudflare’s traffic and refreshed every hour. The cross-customer telemetry idea here echoes what other vendors built, for instance HUMAN’s collective signal network; the shared lesson is that a fingerprint is far more useful when you can see how it behaved everywhere else first.

Where this leaves things in 2026

The arc of this layer is a slow tightening. JA3 was a clean idea that worked until the thing it measured, extension order, turned out to be something Chrome was willing to change. JA4 fixed that by sorting, and bought back the stability, at the cost of being one more thing to compute and store. The HTTP/2 fingerprint added a second wire-level identity that has to agree with the first. JA4 Signals added a behavioural reputation on top of the static identity. Each step did not replace the previous one. It added a layer that an automated client now also has to satisfy, and the layers reinforce each other precisely because they are hard to fake all at once.

The honest limit is worth stating. None of this is a verdict on its own. Cloudflare’s own documentation places JA3 and JA4 inside the feature vector of a machine-learning model and inside a database of heuristics, not at the top of a decision tree. A network fingerprint can be borrowed; TLS-mimicking libraries exist and are good. The defensive value is not that any single fingerprint is unforgeable. It is that forging one in isolation is easy and forging all of them in agreement, the TLS stack and the HTTP/2 stack and the pseudo-header order and the JavaScript surface and the hourly behaviour of that fingerprint across the network, is close to the work of actually running the browser. The exact weights Cloudflare assigns to each of these inside its scoring model are not public, and this post has stayed on the side of what the vendor documents and what the published fingerprint formats specify rather than guessing at the internal layout. What is public is enough to see the shape of the thing: the cheapest, earliest signal Cloudflare has is the one the client gives away for free before it says a word.

