
ALPN and NPN in fingerprinting: how protocol negotiation leaks the client

Copyright: MIT

There is a short list of bytes inside almost every HTTPS ClientHello that announces, before any HTTP is spoken, what the client intends to speak. For a modern browser it reads h2,http/1.1. For an old curl it might read http/1.1 alone, or nothing at all. The list looks innocuous. It is two or three strings naming protocols the client supports. But it sits in the clear, it is ordered by preference, and the choices and their order are stable enough per client that a detection system can read them as a label. Then the same system watches what the connection actually does after the handshake completes, and checks whether the protocol the client negotiated matches the protocol the client speaks.

That second check is the interesting one. The negotiation layer does not just describe the client. It makes a promise that the rest of the stack has to keep. A request that negotiated h2 and then sends an HTTP/1.1 request line has contradicted itself across two layers of the same connection, and self-contradiction is one of the cleanest signals a bot detector can ask for. This post is about that list of bytes: where it came from, how it is encoded on the wire, how fingerprints read it, and why the agreement between the offered protocols and the HTTP that follows is harder to fake than it looks.

The sections below run roughly in order. First the extension itself, ALPN, and the wire format that RFC 7301 nailed down. Then NPN, the predecessor Google shipped for SPDY, and why it worked the other way around and got dropped. Then how JA3 and JA4 treat the protocol list, and what each captures. Then the cross-layer story: ALPN as a promise the HTTP/2 layer has to honor, and the mismatches that give clients away. A closing section covers what the negotiation layer can and cannot tell a defender in 2026.

The extension: ALPN on the wire

ALPN is a TLS extension. It rides inside the ClientHello, the first message the client sends, and it carries a list of application protocols the client is willing to speak once TLS finishes. The server reads the list, picks one it also supports, and echoes the single chosen protocol back in its ServerHello (or, in TLS 1.3, its EncryptedExtensions). RFC 7301 standardized this in July 2014, authored by Stephan Friedl, Andrei Popov, Adam Langley, and Emile Stephan. The extension code point is 16. That number is worth remembering, because it shows up directly in the extension list that fingerprints hash.

The encoding is small. The extension data is a ProtocolNameList: a two-byte length, then a sequence of protocol names, each of which is a single length byte followed by that many bytes of opaque protocol identifier. The RFC’s notation is terse:

opaque ProtocolName<1..2^8-1>;

struct {
    ProtocolName protocol_name_list<2..2^16-1>
} ProtocolNameList;

So h2 is one byte of length (0x02) followed by the two ASCII bytes h, 2. http/1.1 is 0x08 followed by eight bytes. The names are registered with IANA and are, by spec, opaque byte strings rather than human-readable text; the common ones are h2 for HTTP/2 over TLS, http/1.1 for HTTP/1.1, and h3 for HTTP/3 over QUIC. The string h2 is the identifier RFC 9113 assigns to HTTP/2 when it runs over TLS, and that RFC states that HTTP/2 over TLS MUST use ALPN to negotiate.
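A minimal Python sketch of that encoding follows. It assumes ASCII protocol names (the spec treats them as opaque bytes), and the helper names are hypothetical. It builds the full extension, including the 2-byte type and 2-byte length that precede the ProtocolNameList:

```python
import struct

ALPN_EXTENSION_TYPE = 16  # extension code point assigned by RFC 7301


def encode_protocol_name_list(protocols):
    """Serialize names into an RFC 7301 ProtocolNameList.

    Each name is a 1-byte length followed by its bytes; the whole list is
    prefixed with a 2-byte big-endian length. Order is preserved, because
    the order is the client's preference.
    """
    body = b""
    for name in protocols:
        raw = name.encode("ascii")  # assumption: ASCII names like "h2"
        if not 1 <= len(raw) <= 255:
            raise ValueError(f"protocol name must be 1..255 bytes: {name!r}")
        body += bytes([len(raw)]) + raw
    if len(body) < 2:
        raise ValueError("ProtocolNameList must hold at least one name")
    return struct.pack("!H", len(body)) + body


def encode_alpn_extension(protocols):
    """Wrap the list as a TLS extension: 2-byte type, 2-byte length, data."""
    data = encode_protocol_name_list(protocols)
    return struct.pack("!HH", ALPN_EXTENSION_TYPE, len(data)) + data
```

For the browser list, encode_alpn_extension(["h2", "http/1.1"]).hex(" ") gives 00 10 00 0e 00 0c 02 68 32 08 68 74 74 70 2f 31 2e 31. Passing the same names in the other order gives different bytes, which is the property a fingerprinter relies on.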

One detail in RFC 7301 does most of the fingerprinting work. The client sends its protocols “in descending order of preference.” A browser that prefers HTTP/2 lists h2 first and http/1.1 second. The order is not cosmetic. It tells the server which protocol the client would rather speak, though the server makes the final call: RFC 7301 expects the server to keep its own preference-ordered list and says it SHOULD select the most highly preferred protocol that it supports and that the client also advertised, which leaves it free to let the client’s order decide. On no overlap it must abort with a fatal no_application_protocol alert, code 120. That ordered list, sitting in cleartext in the ClientHello, is the raw material everything downstream reads.
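The server side of that exchange can be sketched in a few lines. This is an illustration of the selection rule described above, not any TLS library’s API: select_protocol, the NoApplicationProtocol exception, and the honor_client_order flag are hypothetical names, the flag standing in for a server that chooses to defer to the client’s ordering.

```python
class NoApplicationProtocol(Exception):
    """Stands in for the fatal no_application_protocol alert."""

    alert_code = 120


def select_protocol(server_prefs, client_offer, honor_client_order=False):
    """Pick one protocol from the overlap of the two lists.

    By default the server walks its own preference-ordered list and takes the
    first entry the client advertised. With honor_client_order=True it walks
    the client's list instead, letting the client's preference decide.
    """
    if honor_client_order:
        supported = set(server_prefs)
        candidates = [p for p in client_offer if p in supported]
    else:
        offered = set(client_offer)
        candidates = [p for p in server_prefs if p in offered]
    if not candidates:
        raise NoApplicationProtocol(f"no overlap: {server_prefs} vs {client_offer}")
    return candidates[0]
```

A server that lists http/1.1 ahead of h2 picks http/1.1 for a browser offering h2,http/1.1 unless it honors the client’s order, in which case it picks h2. Either way, the offered list itself is what the fingerprinter reads.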

Figure: the ALPN extension inside the ClientHello (extension type 16), bytes 00 10 | 00 0e | 00 0c | 02 68 32 | 08 68 74 74 70 2f 31 2e 31, read as ext type, ext len, list len, len 2 "h2", len 8 "http/1.1" (h = 0x68, 2 = 0x32). The first name, h2, is the client’s most preferred protocol; order is by descending preference, and the whole list is cleartext. These are the exact extension bytes for the list h2,http/1.1.
*The ALPN extension is a length-prefixed list of length-prefixed protocol names. The first name is the client's first choice, and the server is allowed to honor that order.*
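Reading those bytes back is what a passive observer does. A minimal parser sketch (parse_alpn_extension is a hypothetical name) walks the length fields and returns the names in wire order, which is the client’s preference order. It rejects any length that does not add up, since a fingerprinter should not guess at malformed input:

```python
import struct


def parse_alpn_extension(ext):
    """Parse a full ALPN extension (type, length, ProtocolNameList).

    Returns the protocol names as bytes, in the order the client sent them.
    Raises ValueError on any inconsistent length field.
    """
    if len(ext) < 4:
        raise ValueError("truncated extension header")
    ext_type, ext_len = struct.unpack("!HH", ext[:4])
    if ext_type != 16:
        raise ValueError(f"not an ALPN extension: type {ext_type}")
    data = ext[4:]
    if len(data) != ext_len or ext_len < 2:
        raise ValueError("extension length mismatch")
    (list_len,) = struct.unpack("!H", data[:2])
    blob = data[2:]
    if list_len != len(blob) or list_len < 2:
        raise ValueError("ProtocolNameList length mismatch")
    names, i = [], 0
    while i < len(blob):
        n = blob[i]  # 1-byte length of the next name
        if n == 0:
            raise ValueError("empty protocol name")
        name = blob[i + 1 : i + 1 + n]
        if len(name) != n:
            raise ValueError("protocol name runs past end of list")
        names.append(name)
        i += 1 + n
    return names
```

Applied to the figure’s bytes, it returns [b"h2", b"http/1.1"].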

For a deeper byte-by-byte walk through where this extension sits relative to the cipher list, SNI, and the rest of the hello, see the TLS ClientHello field reference. What matters here is that ALPN is one field among many, and like the others, the choice of values and their order is set by the client’s TLS stack, not by the user.

NPN: the predecessor that worked backwards

Before ALPN there was NPN, Next Protocol Negotiation. Google introduced it in January 2010 as an IETF draft, and it existed to do one job: let Chrome and Google’s servers agree to speak SPDY over port 443 without an extra round trip. SPDY was Google’s experimental binary replacement for HTTP/1.1, and HTTP/2 grew directly out of it. NPN was the negotiation glue.

NPN and ALPN solve the same problem and disagree on almost every design choice. The direction is reversed. Under NPN the server advertised the list of protocols it supported, and the client picked one and confirmed it. Under ALPN the client advertises and the server picks. The IETF flipped the direction deliberately when it standardized ALPN, to bring it into line with how other TLS negotiations work, where the client offers and the server chooses.

The more consequential difference is where the choice lives. NPN was built so the client’s selection was encrypted. The mechanism, from Adam Langley’s draft-agl-tls-nextprotoneg, put a next_protocol handshake message after the ChangeCipherSpec and before Finished, which means it traveled under the negotiated TLS keys rather than in the cleartext hello. The draft even specified padding on that message: the padding length followed the rule 32 - ((len(selected_protocol) + 2) % 32), which makes the message length a multiple of 32 so the ciphertext length would not leak the name of the chosen protocol. The stated goal was to keep middleboxes from discriminating on which protocol a connection had picked. The draft names Tor explicitly as the worst case, a server whose mere advertisement of a capability could get its connections killed by a hostile network.
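The padding rule is easy to check numerically. The sketch below assumes the NextProtocol message body is a 1-byte length plus the selected protocol, then a 1-byte length plus the padding, and fills the padding with zero bytes; the function names are hypothetical.

```python
def npn_padding_len(selected_protocol: bytes) -> int:
    """Padding length from the rule in draft-agl-tls-nextprotoneg."""
    return 32 - ((len(selected_protocol) + 2) % 32)


def encode_next_protocol(selected_protocol: bytes) -> bytes:
    """Build a NextProtocol body: length-prefixed name, length-prefixed padding.

    The two length bytes plus the name plus the padding always total a
    multiple of 32, so different names produce the same length on the wire.
    """
    pad = npn_padding_len(selected_protocol)
    return (
        bytes([len(selected_protocol)]) + selected_protocol
        + bytes([pad]) + b"\x00" * pad  # zero-filled padding (assumption)
    )
```

Both b"spdy/3" and b"http/1.1" encode to exactly 32 bytes, so an observer who sees only ciphertext length cannot tell which one the client chose.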

That privacy property did not survive the move to ALPN. ALPN puts the whole exchange in the clear. The client’s preference list rides in the ClientHello before any key exchange, and the server’s single choice rides back in plaintext too (in TLS 1.2; in TLS 1.3 the server’s half moves into the encrypted EncryptedExtensions, but the client’s offered list stays exposed in the hello regardless). The IETF’s TLS working group traded NPN’s confidentiality for ALPN’s simplicity and its alignment with normal TLS negotiation flow. For a fingerprinter that trade is a gift. The protocol list a client offers is now a stable, readable, cleartext attribute of every TLS connection it makes.

Figure: NPN (2010, SPDY) versus ALPN (RFC 7301, 2014). Under NPN the server lists its protocols in cleartext, and the client’s pick travels encrypted and padded, hidden from middleboxes. Under ALPN the client lists its protocols in cleartext in the hello, and the server picks one and echoes it back, so the offered list is visible to anyone on path.
*NPN encrypted the client's choice and padded it so the length would not leak the name. ALPN moved the offered list into the cleartext ClientHello, which is exactly what makes it fingerprintable.*

NPN is gone now. SPDY’s deprecation was announced in February 2015 once HTTP/2 was ratified, with full removal targeted for 2016, and NPN went with it. OpenSSL added ALPN support in version 1.0.2, released January 2015, and NPN faded from the major TLS stacks over the following years. You will not see next_protocol_negotiation in a current browser’s hello. The point of the history is the design lesson it carries: the one negotiation scheme that tried to keep the protocol choice private lost to the one that put it in the clear, and fingerprinting inherited the result.

How JA3 and JA4 read the protocol list

A TLS fingerprint is a hash over the stable, client-chosen fields of the ClientHello. JA3, the older scheme from Salesforce, concatenated the TLS version, the cipher list, the extension list, the elliptic curves, and the curve formats, then MD5’d the string. JA3 never read the ALPN protocol names. The extension code point 16 showed up in JA3’s extension list (the extension’s presence and position counted), but JA3 never looked inside the extension. The offered protocols were invisible to it.
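A minimal sketch of how the JA3 string is assembled makes the blind spot concrete. It assumes the values have already been parsed out of the hello as decimal integers and drops GREASE values, as JA3 implementations do; the function name is hypothetical.

```python
import hashlib

# GREASE values (RFC 8701): 0x0a0a, 0x1a1a, ..., 0xfafa. JA3 drops them.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3(version, ciphers, extensions, curves, point_formats):
    """Return (JA3 string, MD5 hex digest) from decimal ClientHello values.

    No extension contents are inputs: ALPN contributes only its type code,
    16, to the extensions field, and its protocol names never appear.
    """

    def field(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    ja3_string = ",".join(
        [str(version), field(ciphers), field(extensions), field(curves), field(point_formats)]
    )
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()
```

Two clients that differ only in their ALPN lists, one offering h2,http/1.1 and one offering http/1.1 alone, call this with identical arguments and get identical hashes.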

JA4 changed that. The FoxIO suite, released in 2023 by John Althouse and collaborators, rebuilt the fingerprint with ALPN as a first-class field. JA4’s TLS fingerprint has three parts. The first part, JA4_a, is human-readable and packs in the transport (t for TLS over TCP, q for QUIC), the TLS version, whether SNI is present, a two-digit cipher count, a two-digit extension count, and then the ALPN value. The second and third parts are truncated SHA-256 hashes: JA4_b covers the sorted cipher list, and JA4_c covers the sorted extension list (leaving out SNI and ALPN, which JA4_a already records) together with the signature algorithms. The full thing looks like t13d1516h2_8daaf6152771_e5627efa2ab1, and the h2 sitting at the end of that first block is the ALPN.

The ALPN encoding in JA4 is deliberately compact. JA4 takes the first and last characters of the first ALPN value. For h2 that is h2. For http/1.1 it is h1, first character h and last character 1. If there is no ALPN extension, or it has no values, or the first value is empty, JA4 prints 00. There is a fallback for exotic values: if the first or last byte is not ASCII-alphanumeric, JA4 uses the first and last characters of the value’s hex representation instead. For the protocols anyone actually offers, though, the rule is just first character and last character of the first listed protocol.
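The two-character rule, including the hex fallback, fits in a short function. This is a sketch of the rule as described above, not FoxIO’s reference code; ja4_alpn is a hypothetical name, and the input is assumed to be the parsed ALPN values as a list of bytes (empty when the extension is absent).

```python
def _is_alnum(b: int) -> bool:
    """ASCII 0-9, A-Z, a-z."""
    return 0x30 <= b <= 0x39 or 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A


def ja4_alpn(alpn_values):
    """Two-character ALPN field of JA4_a from the client's offered list."""
    if not alpn_values or not alpn_values[0]:
        return "00"  # no extension, no values, or an empty first value
    first = alpn_values[0]  # only the first (most preferred) value counts
    if _is_alnum(first[0]) and _is_alnum(first[-1]):
        return chr(first[0]) + chr(first[-1])
    hex_repr = first.hex()  # fallback for non-alphanumeric edge bytes
    return hex_repr[0] + hex_repr[-1]
```

A browser offering [b"h2", b"http/1.1"] yields "h2"; a client offering only [b"http/1.1"] yields "h1"; a client with no ALPN extension yields "00".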

Figure: JA4_a, the readable first block of the JA4 TLS fingerprint t13d1516h2_8daaf6152771_e5627efa2ab1. t13d: TLS over TCP, version 1.3, SNI present; 1516: 15 ciphers, 16 extensions; h2: ALPN. First ALPN value "h2" gives first char h and last char 2, so "h2"; first value "http/1.1" gives "h1"; no ALPN extension at all gives "00".
*JA4 squeezes the first offered protocol into two characters and parks it in the human-readable block. Absence of the extension is itself a value: "00".*

This is a real change in what the fingerprint sees. Two clients with identical ciphers and extensions but different ALPN offers now produce different JA4 strings. A headless tool that omits ALPN entirely, or offers only http/1.1 when the browser it claims to be would offer h2,http/1.1, lands on a distinct fingerprint that no real browser produces. JA4 deliberately does not sort the protocol list before reading it, unlike its treatment of ciphers and extensions, which it sorts so that the extension-order randomization Chrome shipped in 2023, a change that broke JA3’s order-sensitive hash, does not alter the fingerprint. The ALPN order is preserved because the order carries the client’s preference, and preference is identity. For the full anatomy of the JA4 suite and what each member captures, see JA4+ in depth; for the broader move from JA3’s byte order to JA4’s sorted hashes, TLS fingerprinting from ClientHello bytes to JA4 covers the transition, and the Chrome extension randomization that broke JA3 explains why the sorting matters.

There is a subtlety worth being honest about. ALPN is a low-entropy field on its own. Almost every modern browser offers exactly h2,http/1.1, so in isolation the ALPN value barely narrows anything down. Its value to a fingerprinter is not as a unique identifier but as a consistency check. ALPN earns its keep when it disagrees with something else: with the User-Agent, with the cipher list, or, most usefully, with the protocol the connection actually speaks after the handshake.

The promise: ALPN must agree with the HTTP that follows

Here is where the negotiation layer stops being a passive label and becomes an active commitment. ALPN does not just describe the client. It selects the protocol the connection will run. When the handshake negotiates h2, RFC 9113 is explicit that the connection is now an HTTP/2 connection, and both ends MUST speak HTTP/2 over it. The client is supposed to follow the TLS handshake immediately with the HTTP/2 connection preface: the 24-octet magic string PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n followed by a SETTINGS frame. A client that negotiated h2 and then sends an HTTP/1.1 request line GET / HTTP/1.1 has broken the contract, and a compliant server treats the malformed preface as a PROTOCOL_ERROR.
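What the server checks first can be sketched directly. The function below is hypothetical and deliberately narrow. It confirms the 24-octet magic, then reads the 9-byte frame header that follows (3-byte length, 1-byte type, 1-byte flags, 4-byte stream identifier) and requires a SETTINGS frame (type 0x4) on stream 0 whose payload length is a multiple of 6, the size of one setting:

```python
import struct

# The HTTP/2 client connection preface magic, 24 octets.
H2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"
SETTINGS_FRAME_TYPE = 0x4


def check_h2_preface(first_bytes):
    """Return None if first_bytes open a valid HTTP/2 client preface,
    otherwise a short reason (a server would answer with PROTOCOL_ERROR)."""
    if not first_bytes.startswith(H2_PREFACE):
        return "missing connection preface magic"
    header = first_bytes[len(H2_PREFACE) : len(H2_PREFACE) + 9]
    if len(header) < 9:
        return "truncated frame header after preface"
    length = int.from_bytes(header[0:3], "big")
    frame_type = header[3]
    stream_id = struct.unpack("!I", header[5:9])[0] & 0x7FFFFFFF  # drop reserved bit
    if frame_type != SETTINGS_FRAME_TYPE:
        return "first frame is not SETTINGS"
    if stream_id != 0 or length % 6 != 0:
        return "malformed SETTINGS frame"
    return None
```

An HTTP/1.1 request line fails the very first test, which is why the mismatch surfaces on the first bytes after the handshake.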

So there is a chain of forced agreement. The ALPN list in the ClientHello says what the client can speak. The server’s ALPN selection says what got chosen. The bytes after the handshake have to be the chosen protocol. Three points, and they all have to line up. A detection system sitting at the edge sees all three on the same connection, in sequence, with no way for the client to show one face to the TLS layer and a different face to the HTTP layer without the edge noticing.
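The three-point chain can be sketched as a single verdict function. Everything here is a hypothetical illustration: the names, the crude sniffing of the first post-handshake bytes, and the assumption that a connection with no ALPN selection defaults to HTTP/1.1.

```python
import re

H2_MAGIC = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"
HTTP1_REQUEST_LINE = re.compile(rb"^[A-Z]+ \S+ HTTP/1\.1\r\n")


def sniff_protocol(first_bytes):
    """Guess which protocol the client is actually speaking."""
    if first_bytes.startswith(H2_MAGIC):
        return "h2"
    if HTTP1_REQUEST_LINE.match(first_bytes):
        return "http/1.1"
    return None


def cross_layer_verdict(offered, selected, first_bytes):
    """Check offer, selection, and post-handshake bytes against each other."""
    if selected is not None and selected not in offered:
        return "server selected a protocol the client never offered"
    expected = selected or "http/1.1"  # assumption: no ALPN means HTTP/1.1
    spoken = sniff_protocol(first_bytes)
    if spoken is None:
        return "unrecognized post-handshake bytes"
    if spoken != expected:
        return f"negotiated {expected} but client speaks {spoken}"
    return "consistent"
```

A browser that offers h2,http/1.1, gets h2, and opens with the preface comes back "consistent"; the careless client that gets h2 and sends GET / HTTP/1.1 comes back "negotiated h2 but client speaks http/1.1".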

This is exactly the cross-layer check that commercial bot detection leans on. The strongest signal in modern TLS-and-HTTP fingerprinting is not any single field but a contradiction between layers. A client that presents a flawless Chrome JA4, including h2 in the ALPN, and then sends an HTTP/2 SETTINGS frame whose parameters are in a non-Chrome order has told two different stories about who it is. Akamai’s edge has been reading HTTP/2 client characteristics since at least 2017, when Akamai’s Black Hat Europe whitepaper, Passive Fingerprinting of HTTP/2 Clients, laid out how the SETTINGS frame, the WINDOW_UPDATE, the stream priority tree, and the pseudo-header ordering all vary by client stack and survive across requests. ALPN is the entry point to that whole second layer. It tells the edge “expect HTTP/2 next,” and then the edge grades the HTTP/2 against the browser the ALPN and TLS fingerprint claimed.
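Those four HTTP/2 characteristics are commonly serialized into one pipe-separated string in the Akamai style: SETTINGS pairs, the WINDOW_UPDATE increment, PRIORITY frames, and the pseudo-header order. The sketch below assumes that layout; the function name and the example values are hypothetical, PRIORITY descriptors are passed in pre-formatted, and placeholder conventions for absent frames vary between implementations (this sketch uses 0).

```python
# Single-letter codes for the request pseudo-headers.
PSEUDO_HEADER_CODES = {":method": "m", ":authority": "a", ":scheme": "s", ":path": "p"}


def h2_fingerprint(settings, window_update, priorities, pseudo_headers):
    """Build an Akamai-style passive HTTP/2 fingerprint string.

    settings:       (identifier, value) pairs in the order the client sent them
    window_update:  connection-level WINDOW_UPDATE increment (0 if none here)
    priorities:     pre-formatted PRIORITY frame descriptors ([] if none)
    pseudo_headers: pseudo-header names in the order they appeared
    """
    s = ";".join(f"{ident}:{value}" for ident, value in settings)
    p = ",".join(priorities) if priorities else "0"
    h = ",".join(PSEUDO_HEADER_CODES[name] for name in pseudo_headers)
    return f"{s}|{window_update}|{p}|{h}"
```

With example values, h2_fingerprint([(1, 65536), (2, 0), (4, 6291456), (6, 262144)], 15663105, [], [":method", ":authority", ":scheme", ":path"]) returns "1:65536;2:0;4:6291456;6:262144|15663105|0|m,a,s,p". The edge’s check is then a lookup: does this string match the one expected for the browser the TLS fingerprint claimed?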

Figure: one connection, three points that must agree. ClientHello ALPN: h2, http/1.1. Server selection: h2. Post-handshake bytes: PRI * HTTP/2.0 ... Consistent case: all three say HTTP/2, nothing to flag. Inconsistent case: ALPN says h2 was negotiated but the client sends GET / HTTP/1.1, a contradiction across layers that yields a PROTOCOL_ERROR and a strong automation signal.
*The ALPN offer, the server's selection, and the bytes after the handshake all have to name the same protocol. A client that negotiates h2 and then speaks HTTP/1.1 has contradicted itself on a single connection.*

Where the mismatch actually comes from

A self-contradicting connection sounds like a thing no real client would ever produce, and on a direct connection it mostly is. The mismatch tends to come from one of two places: a naive HTTP client whose layers were stitched together carelessly, or an intermediary that terminates and re-originates TLS.

The careless-client case is common in scraping stacks. A toolkit’s TLS layer might be configured to offer h2,http/1.1 because someone copied a browser’s ALPN list to look legitimate, while the HTTP layer underneath only knows how to speak HTTP/1.1. The handshake negotiates h2, the server expects an HTTP/2 preface, and the client sends an HTTP/1.1 request because that is all its HTTP code can do. The TLS fingerprint looks like a browser. The first frame after the handshake says otherwise. This shows up in the wild as protocol errors and fallback storms; a Node.js client that offers both protocols and then cannot speak the one it negotiated is a recurring class of bug report. The fingerprint surface of impersonation tools, including the second-order tells that ALPN-versus-HTTP disagreement produces, is covered in more depth in detecting curl-impersonate and uTLS.
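The fix for the careless-client case is to keep the two layers honest with each other. A sketch with Python’s standard ssl module follows. SSLContext.set_alpn_protocols and SSLSocket.selected_alpn_protocol are real APIs; the SUPPORTED_HTTP constant, the helper names, and the HTTP/1.1-only client are hypothetical. The client refuses to offer any protocol its HTTP layer cannot speak, and it checks the server’s selection before sending a byte:

```python
import socket
import ssl

# Hypothetical client whose HTTP layer only speaks HTTP/1.1.
SUPPORTED_HTTP = ("http/1.1",)


def make_context(offer):
    """TLS context whose ALPN offer never exceeds what the HTTP layer speaks."""
    unsupported = [p for p in offer if p not in SUPPORTED_HTTP]
    if unsupported:
        raise ValueError(f"HTTP layer cannot speak offered protocols: {unsupported}")
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(list(offer))
    return ctx


def open_connection(host, offer=("http/1.1",), port=443):
    """Connect, then confirm the negotiated protocol before speaking HTTP."""
    ctx = make_context(offer)
    raw = socket.create_connection((host, port))
    tls = ctx.wrap_socket(raw, server_hostname=host)
    selected = tls.selected_alpn_protocol()  # None if the server ignored ALPN
    if selected is not None and selected not in SUPPORTED_HTTP:
        tls.close()
        raise RuntimeError(f"server selected {selected!r}, which this client cannot speak")
    return tls, selected or "http/1.1"
```

A client that later gains an HTTP/2 implementation would add "h2" to SUPPORTED_HTTP and branch on the returned protocol. Copying a browser’s ALPN list without that branch is exactly the bug described above.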

The intermediary case is more interesting and is why “ALPN mismatch” has become shorthand for proxy detection. When a MITM proxy or a TLS-terminating forward proxy sits between the client and the server, the client negotiates ALPN with the proxy, and the proxy negotiates a separate ALPN with the origin. If the two halves disagree, or if the proxy advertises h2 to the client but can only relay HTTP/1.1 to the origin, the protocol the user’s machine thinks it negotiated and the protocol the origin actually receives diverge. Some proxy stacks get this subtly wrong: a secure web proxy that negotiates h2 with the client breaks an HTTP/1.1 CONNECT tunnel, because the client was waiting for HTTP/1.1 200 Connection established and instead received HTTP/2 SETTINGS frames. The edge that sits at the origin sees the result: a client whose claimed identity (its UA, its TLS fingerprint) does not match the protocol behavior arriving on the wire, which is a textbook proxy or interception signature.

Tools that take fingerprinting seriously close this gap by controlling both layers together. uTLS, the Go library that gives low-level control over the ClientHello for mimicry, lets a caller set the ALPN extension to match a target browser exactly. But the library’s own documentation is blunt that if you include ALPN you have to actually handle the protocol you negotiate, which means the HTTP/2 layer above it must speak h2 with browser-accurate SETTINGS, window sizes, and header ordering. Offering h2 is the easy part. Honoring it convincingly all the way up the stack is the hard part, and it is the part that separates a fingerprint that matches a browser from a connection that behaves like one. How uTLS mimics browser ClientHellos digs into the ClientHello side; the HTTP/2 side is where ALPN’s promise comes due. For how a specific vendor folds the TLS and HTTP/2 layers into one verdict, DataDome’s HTTP/2 and network fingerprinting and Cloudflare’s TLS and HTTP/2 fingerprinting both walk the cross-layer logic.

There is a quieter tell in the same family: ALPN absence. A client that offers no ALPN extension at all but otherwise looks like Chrome is suspicious, because Chrome always offers h2,http/1.1. JA4 captures this directly with its 00 value. An old curl that was built without HTTP/2 support, or a minimal TLS library that never learned ALPN, leaves the field empty, and that empty field is a fingerprint as surely as any populated one. Absence is data. The detection system does not need the client to lie; it just needs the client to differ from the population it claims to belong to.
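Because JA4_a has a fixed ten-character layout, the ALPN field is always its last two characters, so the absence check reduces to a string comparison. A minimal sketch follows; the expectation table is a hypothetical baseline (Chrome’s "h2" follows from its h2,http/1.1 offer), and a real system would derive such baselines from observed traffic.

```python
# Hypothetical baseline: the JA4 ALPN field each claimed browser family should show.
EXPECTED_JA4_ALPN = {"chrome": "h2"}


def ja4_alpn_field(ja4):
    """Last two characters of JA4_a, the block before the first underscore."""
    return ja4.split("_", 1)[0][-2:]


def alpn_consistent_with_claim(ja4, claimed_family):
    """True/False against the baseline, or None when no baseline exists."""
    expected = EXPECTED_JA4_ALPN.get(claimed_family)
    if expected is None:
        return None
    return ja4_alpn_field(ja4) == expected
```

A client claiming Chrome with t13d1516h2_8daaf6152771_e5627efa2ab1 passes; the same claim with a JA4_a ending in 00 fails, without the client having lied about anything except its identity.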

What the negotiation layer can and cannot tell you

ALPN by itself is a weak fingerprint. The protocol list is short, almost everyone offers the same h2,http/1.1, and the entropy of the field rounds to nearly nothing in isolation. If you ranked ClientHello fields by how much they narrow down a client, ALPN would sit near the bottom, well below the cipher list or the extension ordering. A defender who only read the ALPN value and did nothing else with it would catch almost no one.

The field earns its place by what it forces, not by what it contains. ALPN is the hinge between the TLS handshake and the HTTP that follows. It is read in cleartext, it commits the connection to a specific protocol, and it gives the edge a prediction it can check against the next thing the client does. That makes the offered list less a label and more a tripwire. A fingerprint spoofer can copy a browser’s ALPN list in an afternoon. Keeping the promise that list makes, all the way up through HTTP/2’s SETTINGS frame and header ordering and flow-control behavior, on every request, through whatever proxy chain the traffic is riding, is the part that does not yield to a copied byte string. The negotiation layer is cheap to fake and expensive to honor, and the gap between those two costs is precisely the signal.

The history reinforces the point. NPN tried to keep the protocol choice private, padded and encrypted, specifically so a hostile network could not discriminate on it. ALPN threw that away for simplicity, and a decade later the thing NPN was trying to hide is one of the cleanest cross-layer consistency checks a bot detector has. The protocol you say you will speak, and the protocol you actually speak, have to be the same protocol. It is a small constraint. It is also one of the few in the whole fingerprinting stack that a client cannot satisfy by editing a single field, because satisfying it means being the thing you claim to be for the entire length of the connection.

