
Why your Python requests is fingerprintable before it sends a byte of HTTP

Copyright: MIT
[Image: a ClientHello byte layout with the cipher list and extensions highlighted, labelled requests/urllib3/OpenSSL]

You write three lines of Python, hit a TLS site, and get a 403 before your User-Agent header ever mattered. You rotate the header. Still 403. You add a residential proxy. Still 403. The site has not read a single byte of your HTTP request, because the block happened one layer down, during the handshake, on the very first packet your client sent. That packet is the TLS ClientHello, and a default requests install announces itself in it as loudly as if it had stamped “Python” on the envelope.

This is the part of the stack most scraping tutorials skip. The header you control. The handshake you do not, at least not without replacing the machinery underneath. A browser’s ClientHello and an OpenSSL-backed Python client’s ClientHello differ in ways that are stable, easy to hash, and trivially separable, and they differ before any application data exists to inspect. This post is about exactly what those differences are, why they exist, and why an entire category of libraries, from curl-impersonate to curl-cffi to uTLS, exists for the sole purpose of papering over them.

We start with the ClientHello itself, field by field. Then the cipher list and why ordering is a fingerprint, the extension set and the GREASE values Python never sends, and how JA3 turns all of that into a 32-character hash. After that, why Chrome’s 2023 extension shuffle broke JA3 and why JA4 sorts the bytes to survive it. Then the impersonation libraries, what they actually swap out, and where the mimicry stops. The through-line: the TLS layer leaks a client identity that has nothing to do with anything you typed.

The ClientHello is a self-description

When a TLS client opens a connection, before any encryption is in place, it sends a ClientHello. This message is in the clear. Anyone on the path, and certainly the server, can read every field. The ClientHello exists to tell the server what the client can do, so the two can agree on parameters. It carries a protocol version field (TLS 1.3 clients freeze it at TLS 1.2's value, 0x0303, and list the versions they actually support in a supported_versions extension), a list of cipher suites in preference order, a list of extensions, and inside those extensions a pile of further lists: supported elliptic curve groups, point formats, signature algorithms, ALPN protocol identifiers, and more.

Here is the load-bearing observation. The client does not choose these values at request time. They are baked into the TLS library it was compiled against and the configuration that library ships with. A given build of Chrome offers the same ciphers, extensions, and groups on every connection, varying only its GREASE placeholders and, since 2023, its extension order, both of which come up below. A given build of Python against a given OpenSSL emits its own, different, equally stable ClientHello. The contents and ordering are decided by the library, not by the user, which is the exact property that makes the ClientHello a fingerprint. You cannot accidentally type your way out of it, because you never typed your way into it.

[Figure: What the server sees, in order. 1. ClientHello (ciphers, extensions, curves, ALPN): nothing here is decided by you; the TLS library decides it. 2. ServerHello, certificates, keys. 3. The encrypted HTTP request, the only part you control (User-Agent, headers, cookies). A block can land after step 1, before step 3 ever happens.]
*The ClientHello is step one, in the clear, and a server can drop the connection on it. Header rotation only touches step three, which a blocked client never reaches.*

The Crawlex post on the TLS ClientHello, field by field, walks the wire format in more detail. For our purposes, three of those fields do almost all the fingerprinting work: the cipher suite list, the extension list, and the named-group list inside the extensions.
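To make those fields concrete, here is a minimal parser sketch that pulls the legacy version, cipher suites, extension types, supported groups, and point formats out of raw ClientHello bytes. It assumes a single, unfragmented ClientHello, optionally preceded by its 5-byte TLS record header, and it skips the length validation a production parser would need. The names `Reader`, `ClientHello`, and `parse_client_hello` are hypothetical, not from any library.

```python
import struct
from dataclasses import dataclass, field


def u16_list(raw: bytes) -> list[int]:
    """Split a byte string into big-endian two-byte codes."""
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


class Reader:
    """Tiny cursor over a bytes buffer; raises ValueError if data runs out."""

    def __init__(self, data: bytes) -> None:
        self.data, self.pos = data, 0

    def take(self, n: int) -> bytes:
        if self.pos + n > len(self.data):
            raise ValueError("truncated ClientHello")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def u8(self) -> int:
        return self.take(1)[0]

    def u16(self) -> int:
        return struct.unpack("!H", self.take(2))[0]

    def done(self) -> bool:
        return self.pos >= len(self.data)


@dataclass
class ClientHello:
    legacy_version: int
    cipher_suites: list[int]
    extensions: list[int] = field(default_factory=list)
    supported_groups: list[int] = field(default_factory=list)
    point_formats: list[int] = field(default_factory=list)


def parse_client_hello(data: bytes) -> ClientHello:
    r = Reader(data)
    if data[:1] == b"\x16":           # record header: type 22, version, length
        r.take(5)
    if r.u8() != 0x01:                # handshake type 1 = client_hello
        raise ValueError("not a ClientHello")
    r.take(3)                         # 24-bit handshake length (not checked here)
    hello = ClientHello(legacy_version=r.u16(), cipher_suites=[])
    r.take(32)                        # client random
    r.take(r.u8())                    # legacy_session_id
    hello.cipher_suites = u16_list(r.take(r.u16()))
    r.take(r.u8())                    # legacy_compression_methods
    if r.done():                      # a bare TLS 1.2 hello may omit extensions
        return hello
    exts = Reader(r.take(r.u16()))
    while not exts.done():
        ext_type = exts.u16()
        body = Reader(exts.take(exts.u16()))
        hello.extensions.append(ext_type)   # wire order is preserved
        if ext_type == 10:            # supported_groups: u16 length, u16 codes
            hello.supported_groups = u16_list(body.take(body.u16()))
        elif ext_type == 11:          # ec_point_formats: u8 length, u8 codes
            hello.point_formats = list(body.take(body.u8()))
    return hello
```

Fed the first bytes a client writes on a TLS connection (for example, from a packet capture), this yields exactly the raw lists that the hashes discussed below are built from.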

The cipher list, and why the order is the tell

A cipher suite is a bundle of cryptographic choices: a key exchange, a bulk cipher, a MAC. On the wire each suite is a two-byte code. TLS 1.2 suites like TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 have well-known numeric values, and the ClientHello carries them as a length-prefixed array of those two-byte codes.

Two things about that array give a client away. The first is membership: which suites are present at all. Python against a modern OpenSSL offers a security-hardened set. Since Python 3.10, the default ssl context disables suites without forward secrecy or with SHA-1 MACs, favoring AES-GCM and ChaCha20-Poly1305, and runs at OpenSSL security level 2, which prohibits RSA and DH keys under 2048 bits and ECC keys under 224 bits. That is a sensible security posture. It is also a specific, recognizable set that does not match what any shipping browser offers, because browsers carry a longer tail of suites for compatibility and order them by their own preferences.
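You can see that set on your own machine with the standard library. `SSLContext.get_ciphers()` returns the enabled suites in priority order, and OpenSSL's cipher `id` carries the two-byte IANA code in its low 16 bits. The output depends entirely on your Python and OpenSSL build, and the on-the-wire list may also carry signaling values that this API does not report, so treat this as a sketch for inspection rather than an exact ClientHello dump. The helper name is hypothetical.

```python
import ssl


def default_cipher_codes() -> list[tuple[int, str]]:
    """(IANA code, OpenSSL name) for each suite the default context enables.

    get_ciphers() lists suites in priority order. OpenSSL's cipher id packs
    the two-byte IANA code into its low 16 bits (0x0300C02F -> 0xC02F).
    """
    ctx = ssl.create_default_context()
    return [(c["id"] & 0xFFFF, c["name"]) for c in ctx.get_ciphers()]


if __name__ == "__main__":
    for code, name in default_cipher_codes():
        print(f"{code:5d}  0x{code:04x}  {name}")
```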

The second tell is ordering. The ClientHello lists suites in the client’s preference order, most-preferred first, and that order is part of the fingerprint. Two clients can support an identical set of suites and still produce completely different fingerprints if they list them in different sequences. OpenSSL’s default ordering comes from its cipher-string evaluation; a browser’s comes from its own hardcoded list tuned to its own priorities. The sequences do not line up. This is why JA3, which we get to shortly, hashes the cipher codes in the order they appear rather than sorting them first.
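A few lines of Python show why ordering alone is enough. The sketch below hashes the same three TLS 1.3 suites in two different orders: once order-sensitively, the way JA3 treats its cipher field, and once after sorting, so only membership counts. The first order is the one that opens the Python JA3 string later in this post; the second is simply a permutation for contrast, not a claim about any specific browser.

```python
import hashlib


def ordered_digest(codes: list[int]) -> str:
    """Order-sensitive: hash the codes exactly as listed, as JA3 does."""
    return hashlib.md5("-".join(map(str, codes)).encode()).hexdigest()


def membership_digest(codes: list[int]) -> str:
    """Order-insensitive: sort first, so only the set of codes matters."""
    return ordered_digest(sorted(codes))


# The three TLS 1.3 suites in two preference orders (illustrative only).
openssl_order = [4866, 4867, 4865]  # AES-256-GCM, ChaCha20-Poly1305, AES-128-GCM
other_order = [4865, 4866, 4867]    # same set, AES-128-GCM first

print(ordered_digest(openssl_order) == ordered_digest(other_order))        # False
print(membership_digest(openssl_order) == membership_digest(other_order))  # True
```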

Defenders noticed early that ordering was load-bearing, and so did evaders. Randomizing cipher order to dodge a fingerprint became common enough to earn a name, cipher stunting, where a client shuffles its suite order on every connection so no single hash sticks. It works against a naive hasher and creates its own problem, which we will come back to when JA4 enters the picture. The deeper treatment lives in the Crawlex post on cipher suite ordering as a fingerprint.

The extensions, and the values Python never sends

After the cipher list, the ClientHello carries a list of extensions. Each extension is a two-byte type code followed by its own payload. This is where the gap between Python and a browser is widest, because the extension list is long, ordered, and full of values a default Python client simply does not produce.

A real Chrome ClientHello carries extensions that a default requests does not bother with, or carries them in a different arrangement. The set and the order both matter. A browser advertises ALPN to negotiate HTTP/2, carries signed_certificate_timestamp, session ticket support, and a specific arrangement of the rest. Python’s defaults differ on all of these axes at once. The membership differs, the order differs, and one whole category of values is absent: GREASE.

[Figure: extension lists, sketched. Browser: GREASE (0x?a?a), server_name, ALPN (h2), supported_groups, signed_certificate_timestamp, session ticket, ..., GREASE (0x?a?a). Python / OpenSSL default: server_name, supported_groups, ec_point_formats, signature_algorithms, ...; no GREASE, a shorter list, a different order.]
*Schematic, not a byte-exact capture. The point is the shape: a browser brackets its list with GREASE and carries ALPN; an OpenSSL default does neither, and orders what it has differently.*
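Comparing two extension lists by hand gets tedious, so here is a small helper sketch that names each extension type code and reports the membership difference between two lists. The codes in the table are IANA TLS ExtensionType assignments for the extensions mentioned in this post; the function names are hypothetical, and the lists you feed in would come from your own captures rather than anything shown here.

```python
# IANA TLS ExtensionType codes for the extensions discussed in this post.
EXTENSION_NAMES = {
    0: "server_name",
    5: "status_request",
    10: "supported_groups",
    11: "ec_point_formats",
    13: "signature_algorithms",
    16: "application_layer_protocol_negotiation",
    18: "signed_certificate_timestamp",
    21: "padding",
    23: "extended_master_secret",
    35: "session_ticket",
    41: "pre_shared_key",
    43: "supported_versions",
    45: "psk_key_exchange_modes",
    51: "key_share",
    65281: "renegotiation_info",
}


def is_grease(code: int) -> bool:
    """RFC 8701 GREASE codes look like 0x?A?A with both bytes equal."""
    return (code & 0x0F0F) == 0x0A0A and (code >> 8) == (code & 0xFF)


def describe_extensions(codes: list[int]) -> list[str]:
    """Turn an ordered list of extension type codes into readable names."""
    names = []
    for code in codes:
        if is_grease(code):
            names.append(f"GREASE(0x{code:04X})")
        else:
            names.append(EXTENSION_NAMES.get(code, f"unknown({code})"))
    return names


def diff_extensions(a: list[int], b: list[int]) -> dict[str, list[str]]:
    """Membership difference between two lists, ignoring GREASE and order."""
    set_a = {c for c in a if not is_grease(c)}
    set_b = {c for c in b if not is_grease(c)}
    return {
        "only_in_a": describe_extensions(sorted(set_a - set_b)),
        "only_in_b": describe_extensions(sorted(set_b - set_a)),
    }
```

Membership and order are separate signals, which is why the diff ignores order; comparing `describe_extensions` output side by side covers the ordering half.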

GREASE, the values that exist to be ignored

GREASE is worth its own paragraph because it is the single cleanest browser-versus-Python tell. The mechanism comes from RFC 8701, “Applying Generate Random Extensions And Sustain Extensibility (GREASE) to TLS Extensibility,” published in January 2020 and authored by David Benjamin at Google. The problem it solves is ossification: if servers and middleboxes only ever see a fixed set of TLS values, they start hard-coding assumptions, and the protocol can no longer add new values without breaking those boxes. GREASE fixes this by having clients advertise deliberately meaningless values that servers must ignore, keeping the path tolerant of unknown values.

The reserved values are a specific set of sixteen: 0x0A0A, 0x1A1A, 0x2A2A, 0x3A3A, 0x4A4A, 0x5A5A, 0x6A6A, 0x7A7A, 0x8A8A, 0x9A9A, 0xAAAA, 0xBABA, 0xCACA, 0xDADA, 0xEAEA, and 0xFAFA. A client may sprinkle these into its cipher suites, its extensions, its supported groups, its signature algorithms, its supported versions, and its ALPN list. The server’s job is the opposite: servers must not negotiate any GREASE value, and must not treat it differently from any other unknown value.

Chrome, and Chromium-based browsers generally, send GREASE. Every modern browser handshake has these placeholder values seeded through it. Python’s ssl module, built on OpenSSL, does not add GREASE on the client side by default. So the presence of a GREASE cipher in the very first slot, or a GREASE extension bracketing the list, is a binary signal: GREASE present leans browser, GREASE absent leans tooling. It is not conclusive on its own, but it is cheap to check and it is right most of the time. The Crawlex deep-dive on GREASE values in the ClientHello covers the placement rules and the edge cases.
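The sixteen reserved values follow a fixed pattern: both bytes are equal and each ends in the nibble 0xA, so they can be generated and recognized arithmetically. The sketch below does both and wraps the cheap heuristic from the paragraph above. The function names are hypothetical, and the heuristic is only the "leans" signal described in the text, not a classifier.

```python
# RFC 8701 reserves sixteen values of the form 0x?A?A (both bytes equal).
GREASE_VALUES = [(n << 12) | 0x0A00 | (n << 4) | 0x0A for n in range(16)]


def is_grease(code: int) -> bool:
    """True for exactly the sixteen reserved two-byte GREASE codes."""
    return (0 <= code <= 0xFFFF
            and (code & 0x0F0F) == 0x0A0A
            and (code >> 8) == (code & 0xFF))


def grease_signal(ciphers: list[int], extensions: list[int]) -> str:
    """GREASE present leans browser; GREASE absent leans tooling."""
    if any(is_grease(c) for c in ciphers + extensions):
        return "leans browser"
    return "leans tooling"
```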

Why Python can’t just fix this from the application

Here is the structural problem, and it is the reason the impersonation libraries exist at all. Python's ssl module exposes a thin slice of what OpenSSL can do. You can set the minimum and maximum TLS version. You can hand it a cipher string to restrict or reorder the TLS 1.2 suite list, and you can set the ALPN protocol list. What you cannot do, through the standard ssl API, is reorder the extensions, inject GREASE, change the supported-groups list arbitrarily, or otherwise reshape the ClientHello into a browser's exact byte sequence. The configurable surface is roughly cipher suites, TLS version, and ALPN, which means every pure-Python HTTP client built on ssl is exposed to extension-level fingerprinting no matter how carefully you tune the ciphers.
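Here is roughly the whole menu, as a sketch. Every call below is a real `ssl.SSLContext` attribute or method; the cipher string and ALPN list are arbitrary examples, and `tuned_context` is a hypothetical name. Per the Python docs, `set_ciphers()` cannot enable or disable TLS 1.3 suites, so only the TLS 1.2 part of the list responds to it.

```python
import ssl


def tuned_context() -> ssl.SSLContext:
    """Roughly every ClientHello knob the stdlib ssl module exposes."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2   # version floor
    ctx.maximum_version = ssl.TLSVersion.TLSv1_3   # version ceiling
    # Membership and order of TLS 1.2 suites via an OpenSSL cipher string;
    # TLS 1.3 suites are left untouched by set_ciphers().
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+CHACHA20")
    ctx.set_alpn_protocols(["h2", "http/1.1"])     # ALPN identifiers
    # Nothing here controls extension order, GREASE, or arbitrary
    # supported_groups / signature_algorithms lists: OpenSSL fixes those.
    return ctx


if __name__ == "__main__":
    for suite in tuned_context().get_ciphers():
        print(suite["protocol"], suite["name"])
```

Whatever you pass in, the extension layout is still OpenSSL's, which is the gap the rest of this post is about.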

urllib3, which sits under requests, makes this concrete. It does not ship its own browser-mimicking cipher list; it defers to the system OpenSSL defaults on modern builds, sets a TLS 1.2 minimum, disables compression and session tickets for security reasons, and stops there. Nothing in that path adds GREASE or rearranges extensions to look like Chrome, because the ssl API it sits on does not offer the knobs to do so. The hardening is real and sensible. It also produces a ClientHello that is unmistakably not a browser.

JA3: turning the handshake into a hash

If a ClientHello is a self-description, JA3 is the method that compresses that description into something you can store in a blocklist. Salesforce engineers John Althouse, Jeff Atkinson, and Josh Atkins published it in 2017. The idea is simple enough to implement in an afternoon, which is exactly why it spread.

JA3 reads five fields out of the ClientHello and concatenates their decimal values in a fixed order: the TLS version, the offered cipher suites, the list of extensions, the elliptic curves (supported groups), and the elliptic curve point formats. Fields are joined with commas; the values within a field are joined with hyphens. That string is then MD5-hashed to a 32-character hex fingerprint. On common OpenSSL builds, a scraper running Python's requests produces a JA3 string that begins 771,4866-4867-4865-... where 771 is the decimal for 0x0303, the TLS 1.2 version value, and the rest is the cipher list, followed by the extensions, curves, and point formats, all hashing down to a value like 8d9f7747675e24454cd9b7ed35c58707. Every default requests install on a matching OpenSSL produces that same string, and therefore that same hash, every single time.

[Figure: JA3, five fields, comma-joined, then MD5. version 771; ciphers 4866-4867-...; extensions 0-11-10-...; curves 29-23-...; point formats 0-1-2. Joined: 771,4866-4867-...,0-11-10-...,29-23-...,0-1-2. MD5 → 8d9f7747...c58707.]
*The five JA3 fields in order, comma-joined into one string, then MD5'd. A default Python client produces the same string on every run, so the hash is a stable label.*
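The whole construction fits in a few lines. This sketch builds the JA3 string from the five fields in wire order and MD5s it; it drops GREASE values from every field, as the original JA3 implementation does. The inputs in the usage below are illustrative, not a capture, and the function names are hypothetical.

```python
import hashlib


def is_grease(code: int) -> bool:
    """RFC 8701 GREASE codes (0x?A?A, both bytes equal); JA3 skips them."""
    return (code & 0x0F0F) == 0x0A0A and (code >> 8) == (code & 0xFF)


def ja3_string(version: int, ciphers: list[int], extensions: list[int],
               groups: list[int], point_formats: list[int]) -> str:
    """Five decimal fields, comma-joined; values hyphen-joined in wire order."""
    def field(values: list[int]) -> str:
        return "-".join(str(v) for v in values if not is_grease(v))
    return ",".join([str(version), field(ciphers), field(extensions),
                     field(groups), field(point_formats)])


def ja3_hash(version: int, ciphers: list[int], extensions: list[int],
             groups: list[int], point_formats: list[int]) -> str:
    text = ja3_string(version, ciphers, extensions, groups, point_formats)
    return hashlib.md5(text.encode()).hexdigest()


print(ja3_string(771, [0x0A0A, 4866, 4867, 4865], [0x1A1A, 0, 11, 10],
                 [29, 23], [0, 1, 2]))
# 771,4866-4867-4865,0-11-10,29-23,0-1-2
```

Because nothing is sorted, any change in order upstream produces a different string, and therefore a different hash.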

The reason JA3 took over network detection is the same reason it is dangerous for a scraper. The fingerprint is not about you, the request author. It is about the build of the tool you are running. The original Salesforce work used this to spot malware families and pen-testing tools on a network regardless of their destination IP or certificate, because Meterpreter on Linux or a given malware loader emits a consistent ClientHello. The exact same property lets an edge network keep a list of “known automation” JA3 hashes and drop them on sight. Server-side, capturing the ClientHello and computing the hash is cheap; the Crawlex survey of server-side TLS fingerprinting libraries covers how the edge does it at line rate.

How Chrome broke JA3, and why JA4 sorts the bytes

JA3 has a soft spot, and Chrome stepped on it deliberately. JA3 hashes the extension list in the order the extensions appear. If that order is stable, the hash is stable. If the order changes per connection, the hash changes per connection, and a single client smears across an unbounded number of JA3 values.

That is exactly what Chrome started doing. Beginning around January 20, 2023, and shipping broadly with Chrome 110 after trials in 108 and 109, Chromium began permuting the order of its ClientHello extensions on every connection. The stated motivation matches GREASE's: prevent servers from ossifying on a fixed extension order so Chrome stays free to change its TLS later. The TLS 1.3 spec permits extensions in any order with one constraint: pre_shared_key must be last if present, so the permutation is standards-clean. Fastly, watching its own traffic, saw the dominant Chrome JA3 fingerprint cd08e31494f9531f560d64c695473da9 fall off a cliff right after that date as the per-connection shuffle scattered it. JA3 for Chrome stopped being a single value and became a cloud of values.
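A short simulation shows the smearing. The sketch shuffles one fixed extension set per "connection" while honoring the single RFC 8446 constraint (pre_shared_key, code 41, stays last), then counts distinct order-sensitive digests. This is not Chromium's code and makes no claim about how Chromium treats GREASE positions; the extension set is illustrative.

```python
import hashlib
import random

PRE_SHARED_KEY = 41  # RFC 8446: must be the last extension when present


def permute_extensions(extensions: list[int], rng: random.Random) -> list[int]:
    """Shuffle extension order for one connection, keeping pre_shared_key last."""
    movable = [e for e in extensions if e != PRE_SHARED_KEY]
    rng.shuffle(movable)
    if PRE_SHARED_KEY in extensions:
        movable.append(PRE_SHARED_KEY)
    return movable


def order_sensitive_digest(extensions: list[int]) -> str:
    """Hash the extension list in wire order, as JA3 does."""
    return hashlib.md5("-".join(map(str, extensions)).encode()).hexdigest()


if __name__ == "__main__":
    rng = random.Random(2023)
    exts = [0, 5, 10, 11, 13, 16, 18, 23, 35, 43, 45, 51, 41]  # illustrative set
    digests = {order_sensitive_digest(permute_extensions(exts, rng))
               for _ in range(1000)}
    print(len(digests), "distinct order-sensitive digests from one client")
```

With twelve movable extensions there are 12! possible orders, so nearly every connection lands on a fresh digest.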

This is the wrinkle that connects back to cipher stunting. Shuffling order, whether it is an evasive client doing it to its ciphers or Chrome doing it to its extensions, defeats any fingerprint that depends on order. The answer the industry converged on was to stop depending on order where order is not informative. JA4, published by FoxIO (John Althouse again, now outside Salesforce), does precisely that.

JA4 is the TLS member of the JA4+ suite and it is built to survive permutation. Instead of one MD5, JA4 has three parts joined by underscores. The first part, the a section, is human-readable: the transport (t for TLS over TCP, q for QUIC), the TLS version, whether SNI is present (d for a domain, i for none), a two-digit cipher count, a two-digit extension count, and the first and last characters of the first ALPN value (h2 for HTTP/2). The second part is a 12-character truncated SHA256 of the cipher list sorted into hex order. The third part is a 12-character truncated SHA256 of the extension list, also sorted, with SNI and ALPN left out because the a section already records them, followed by the signature algorithms in their original unsorted order. Sorting the ciphers and extensions before hashing is the whole trick: it means cipher stunting and Chrome's extension shuffle both collapse back to the same fingerprint, because reordering a set you are about to sort changes nothing.
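Here is a simplified sketch of that construction, following FoxIO's published description as best I understand it. Assumptions to flag: it takes already-parsed lists, treats the highest non-GREASE entry of `versions` as the TLS version, excludes SNI (0x0000) and ALPN (0x0010) from the extension hash, and skips edge cases the spec handles (non-alphanumeric ALPN values, DTLS). Check it against FoxIO's reference implementation before trusting any output; the function names are hypothetical.

```python
import hashlib

TLS_VERSION_LABELS = {0x0304: "13", 0x0303: "12", 0x0302: "11", 0x0301: "10"}


def is_grease(code: int) -> bool:
    return (code & 0x0F0F) == 0x0A0A and (code >> 8) == (code & 0xFF)


def sha12(text: str) -> str:
    """First 12 hex chars of SHA-256; all zeros for empty input."""
    return hashlib.sha256(text.encode()).hexdigest()[:12] if text else "0" * 12


def hexlist(codes) -> str:
    return ",".join(f"{c:04x}" for c in codes)


def ja4(ciphers: list[int], extensions: list[int], sig_algs: list[int],
        versions: list[int], sni: bool, alpn: str = "", quic: bool = False) -> str:
    # GREASE is dropped everywhere: counts and hashes alike.
    ciphers = [c for c in ciphers if not is_grease(c)]
    extensions = [e for e in extensions if not is_grease(e)]
    sig_algs = [s for s in sig_algs if not is_grease(s)]
    real_versions = [v for v in versions if not is_grease(v)]

    # a: readable prefix such as t13d1516h2
    version = TLS_VERSION_LABELS.get(max(real_versions), "00") if real_versions else "00"
    part_a = (("q" if quic else "t") + version + ("d" if sni else "i")
              + f"{min(len(ciphers), 99):02d}{min(len(extensions), 99):02d}"
              + (alpn[0] + alpn[-1] if alpn else "00"))

    # b: ciphers sorted by hex value, so offer order stops mattering
    part_b = sha12(hexlist(sorted(ciphers)))

    # c: sorted extensions minus SNI and ALPN, then signature algorithms
    #    in their original wire order (these stay order-sensitive)
    ext_text = hexlist(sorted(e for e in extensions if e not in (0x0000, 0x0010)))
    part_c = sha12(f"{ext_text}_{hexlist(sig_algs)}" if sig_algs else ext_text)
    return f"{part_a}_{part_b}_{part_c}"
```

Shuffling the cipher or extension input, or sprinkling GREASE into it, leaves the output unchanged; reordering the signature algorithms does not.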

JA4 also throws GREASE away wherever it appears. GREASE values do not count toward the cipher count, the extension count, or either hash. So a browser’s randomized GREASE seeding does not perturb the JA4 the way it would a naive order-sensitive hash. The full progression from JA3’s MD5 to the JA4+ family is the subject of the Crawlex post on TLS fingerprinting from ClientHello bytes to JA4, and the wider suite gets its own treatment in JA4+ in depth.

The practical consequence for a Python scraper is that switching to JA4-based detection does not help the scraper at all. Python’s handshake does not permute its extensions, so it was never gaining cover from the thing JA3 was weak to. Against JA4 it is, if anything, more legible, because JA4’s readable a section spells out a cipher count and extension count and ALPN value that already do not match any browser. The sorting that rescued Chrome’s fingerprint does nothing for a client whose underlying set is wrong to begin with.
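That legibility is easy to exploit, because the a section can be read without reversing any hash. This hypothetical helper decodes it into named fields, assuming only the prefix layout described above (t or q, two-character version, d or i, two two-digit counts, two ALPN characters). The strings in the checks are made-up examples of the format, not fingerprints of any real client.

```python
import re

JA4_A = re.compile(r"^([tq])(\w{2})([di])(\d{2})(\d{2})(.{2})$")
VERSION_NAMES = {"13": "TLS 1.3", "12": "TLS 1.2", "11": "TLS 1.1", "10": "TLS 1.0"}


def read_ja4_a(fingerprint: str) -> dict:
    """Decode the human-readable first section of a JA4 fingerprint."""
    section = fingerprint.split("_")[0]
    m = JA4_A.match(section)
    if not m:
        raise ValueError(f"not a JA4 a-section: {section!r}")
    transport, version, sni, ciphers, extensions, alpn = m.groups()
    return {
        "transport": "QUIC" if transport == "q" else "TCP",
        "version": VERSION_NAMES.get(version, version),
        "sni": sni == "d",
        "cipher_count": int(ciphers),
        "extension_count": int(extensions),
        "alpn": alpn,
    }
```

Comparing `cipher_count`, `extension_count`, and `alpn` between a client's prefix and a browser's is enough to see a mismatch before looking at either hash.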

The impersonation libraries, and what they actually replace

If the application layer cannot reshape the ClientHello, the only way to make a Python client’s handshake look like a browser’s is to replace the TLS engine underneath it. That is the entire design of the impersonation ecosystem, and it is worth being precise about what these tools swap and what they leave alone. This section is descriptive. The blog does not publish working evasion recipes; the point here is the mechanism.

curl-impersonate, created by lwthiker, is the origin of the modern lineage. It is a patched build of curl that emits a browser's exact handshake. Default curl, like default Python, produces a ClientHello that differs drastically from a browser's, because it too is built on a general-purpose TLS library configured for interoperability rather than mimicry. curl-impersonate's move is to swap the TLS stack entirely: BoringSSL for Chrome and Safari targets, NSS for Firefox targets, the same libraries the real browsers use. With the browser's own TLS library compiled in, it then sets the cipher list, the curve list, the extensions, the GREASE behavior, and the HTTP/2 settings to match a specific browser version. In the v0.6.1 era of early 2024, the released targets covered Chrome 99 through 116, Firefox 91 ESR through Firefox 117, Edge, and several Safari versions.

curl-cffi is the piece that brings this into Python. It is a cffi binding around a curl-impersonate fork, now maintained by lexiforest as the original project’s activity slowed. Because the heavy lifting happens in native curl with a browser TLS library underneath, curl-cffi can present a Chrome or Firefox or Safari TLS and HTTP/2 fingerprint to a server while exposing a requests-like API to the Python programmer. The fingerprint is selected by name, pinned to a browser version. The mechanism is the swap, not a clever reconfiguration of ssl, because ssl cannot be reconfigured far enough.

uTLS comes at the same problem from the Go world and is the most rigorously documented of the three. It is a fork of Go’s standard crypto/tls that exposes low-level control of the ClientHello. The motivation is stated plainly in the project: Go’s standard-library ClientHello has a very distinctive fingerprint, one that “especially sticks out on mobile clients, where Golang is not too popular yet,” which makes Go-based censorship-circumvention tools trivial to block with little collateral damage. uTLS lets a tool emit a parroted browser ClientHello, or a randomized one drawn from fully supported ciphers and extensions in random order, which the docs recommend rotating rather than pinning to a single value.

[Figure: two stacks side by side. requests + ssl: your code (headers, cookies) → ssl module (cipher string, version) → OpenSSL (fixed ClientHello shape, no GREASE, OpenSSL extension order). curl-cffi / uTLS: your code (same API, pick a browser profile by name) → BoringSSL / NSS / patched TLS (browser ciphers, extensions, GREASE). The configurable layer in pure Python stops above the handshake; impersonation moves the swap down to the TLS library itself.]
*Pure Python lets you tune ciphers and version, then OpenSSL fixes the rest. Impersonation libraries replace the bottom box with a browser's own TLS engine.*

Where the mimicry stops

The honest part of this story is that none of these tools claims to be a browser all the way down, and the people who built them said so first. The uTLS documentation is blunt about it: “Parroting could be imperfect, and there is no parroting beyond ClientHello.” The library reproduces the first handshake message. It does not, by itself, reproduce everything that happens after.

That limit has a name in the censorship-circumvention literature, the “parrot is dead” attack: a parrot that copies one observable can be unmasked by a second observable it forgot to copy. The 2019 NDSS paper by Sergey Frolov and Eric Wustrow at Colorado, the same work that produced uTLS, made the case with scale behind it. They analyzed over 11.8 billion TLS connections across nine months and found that circumvention tools which had put real effort into parroting popular TLS stacks were still distinguishable from the traffic they imitated. A ClientHello that matches Chrome’s bytes can still be betrayed by how the connection behaves once data flows, by a response to a record the real browser would handle differently, or by a layer the parrot does not touch at all.

For a scraper in 2026, the layer that does the betraying is usually HTTP/2. A browser-perfect ClientHello connecting to an HTTP/2 origin still has to send an HTTP/2 connection preface, a SETTINGS frame, window updates, and a header block with pseudo-headers in a particular order. Those carry their own fingerprint, formalized in the Akamai HTTP/2 format, and a client whose TLS says "Chrome 124" while its HTTP/2 SETTINGS say "Go default" has just contradicted itself across two layers. The Crawlex posts on HTTP/2 fingerprinting and the Akamai format, and on HTTP/2 pseudo-header ordering, cover that surface, and the broader point that detection is increasingly a cross-layer consistency check rather than any single hash runs through how Cloudflare uses TLS and HTTP/2 fingerprints in bot scoring. Even the ClientHello catalogues drift: browser handshakes change version to version, so a profile pinned to Chrome 116 slowly ages out of the live distribution, as the browser ClientHello catalog tracks.
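To see what that second-layer fingerprint looks like, here is a sketch that assembles an Akamai-style HTTP/2 string from already-parsed frame data: SETTINGS pairs, the WINDOW_UPDATE increment, PRIORITY frames, and pseudo-header order, joined with pipes. The field layout follows the format as it is commonly described; writing "0" for an absent PRIORITY section is an assumption, and the sample values in the checks are illustrative rather than a claim about any current browser.

```python
def akamai_h2_fingerprint(settings: list[tuple[int, int]], window_update: int,
                          priorities: list[tuple[int, int, int, int]],
                          pseudo_headers: list[str]) -> str:
    """Assemble SETTINGS|WINDOW_UPDATE|PRIORITY|pseudo-header order.

    settings: (identifier, value) pairs in the order the client sent them,
              e.g. 1 = HEADER_TABLE_SIZE, 4 = INITIAL_WINDOW_SIZE.
    priorities: (stream_id, exclusive, depends_on, weight) per PRIORITY frame;
                written as "0" when there are none (assumption, see lead-in).
    pseudo_headers: wire-order names such as ":method"; each is abbreviated
                    to the letter after the colon.
    """
    s = ";".join(f"{ident}:{value}" for ident, value in settings)
    p = ",".join(f"{sid}:{excl}:{dep}:{weight}"
                 for sid, excl, dep, weight in priorities) or "0"
    h = ",".join(name[1] for name in pseudo_headers)
    return f"{s}|{window_update}|{p}|{h}"
```

A cross-layer check then amounts to asking whether this string is one a client with that TLS fingerprint would plausibly send.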

What the handshake actually decides

Strip away the tooling and the names and one fact remains. The TLS handshake carries a description of the client’s TLS library, in the clear, on the first packet, and that description has nothing to do with anything a scraper author types. The cipher list, its order, the extension set, the GREASE values, the named groups: all of it is fixed by the build, all of it is hashable, and all of it is available to a server before the HTTP request is decrypted, often before it is even sent. A 403 on a clean residential IP with a perfect User-Agent is usually this and nothing more.

The libraries that exist to close the gap close it by admitting the gap cannot be closed from above. You do not reconfigure requests into a browser; you replace the TLS engine under it with a browser’s. And even then the match is exact only for the one message it was built to copy, which is why a JA3 or JA4 that reads as Chrome is a necessary condition for passing as Chrome and nowhere near a sufficient one. The fingerprint moved the contest from the application layer down to the transport, and then the next move was always going to push it back up, into HTTP/2 and beyond, where a client that lied about its TLS gets caught contradicting itself one layer later. The handshake is where the scraper stops choosing and the library starts speaking for it, and a server that listens carefully can hear which library that is.

