
Server-side vs client-side bot detection: where the decision actually happens

Copyright: MIT
[Figure: two stacked layers labelled network and client, with a verdict arrow in orange between them]

A request arrives. Before it carries a single byte of HTML payload, before any cookie exists, before any script has run, the server already knows several things about the client that the client never chose to reveal. The order of cipher suites in the TLS ClientHello. The exact set of HTTP/2 SETTINGS values sent at connection open. The ASN the packets came from. None of this was picked by the person who wrote the scraper. It was picked by the network stack their code links against. The server can read all of it for free, on the first packet, with no cooperation from the client at all.

And yet that is never enough. The same server, to be confident the client is a real browser driven by a real person, has to push code down to the client and watch it run: canvas rendering that exercises the GPU, a check of whether navigator.webdriver is set, a measurement of how the mouse moved before the click. That code runs on hardware the attacker controls, which means every answer it sends back is suspect. So the question that organizes every commercial anti-bot stack is not “server or client.” It is where each decision can actually be made, what each placement costs, and how to reconcile two views of the same client that can disagree.

This post traces that split. First, what the server can decide on its own from the network, and why those signals are cheap, early, and hard to forge but coarse. Then JA3 and JA4, the TLS fingerprints that made network detection a product, and the moment in 2023 when Chrome broke the older one on purpose. Then the HTTP/2 frame fingerprint and IP reputation, the other two server-side pillars. Then the crossover to the client: what only JavaScript can see, why the attacker’s control of that environment is the whole problem, and how detection of automation frameworks works. Then the reconciliation layer, where the two views meet and a single mismatch sinks the session. The through-line is that neither layer is sufficient and the interesting engineering is in how they are combined.

What the server can decide alone

Start with the boundary. A server-side signal is anything the detector can compute from the bytes the client sends without asking the client to do extra work. The TCP handshake, the TLS handshake, the HTTP request line and headers, the HTTP/2 frames, and the source IP. All of it arrives as a side effect of the client wanting to talk at all. The client cannot opt out of sending it and still complete the connection.

That property is what makes the server-side layer valuable. It is present on the very first request, before any challenge, before any script. It works against clients that run no JavaScript at all, which describes most of the high-volume automated traffic on the internet: curl, Python requests, Go’s net/http, a thousand scraping scripts that never load a DOM. For that traffic, the network fingerprint is often the only signal a detector gets, and it gets it instantly. There is no round trip to wait on and no code to inject.

The cost is resolution. A network fingerprint identifies the stack, not the client. JA3 tells you “this is something that negotiates TLS the way Chrome’s BoringSSL does.” It does not tell you whether a person is driving that Chrome or a Playwright script is. Millions of legitimate users share one fingerprint, so the signal is a population label, not an identity. It narrows the field. It rarely closes a case on its own. The detector’s job at this layer is to spot clients whose network behavior contradicts what they claim to be, then hand the survivors to a layer that can look closer.

[Figure: What each layer can see. Visible to the server on the first packet: TCP options / IP + ASN; TLS ClientHello (JA3 / JA4); HTTP/2 frames + headers. Needs JS to run on the client: canvas / WebGL / audio; navigator.webdriver, CDP; mouse / timing / behavior. Left: free, early, coarse. Right: rich, late, and runs on hardware the attacker owns.]

*The server reads the left column with no cooperation from the client. The right column only exists if the client agrees to run the detector's code, which a bot may decline or fake.*

JA3 and the TLS handshake

The TLS ClientHello is the richest thing a server gets for free. When a client opens a TLS connection it announces, in the clear before encryption begins, which protocol versions it supports, which cipher suites it will accept, which extensions it understands, which elliptic curves and point formats it offers. Browsers, HTTP libraries, and language runtimes each make these choices differently, and they make them in a stable order. That order is a fingerprint of the stack.

JA3 turned this into a product. Invented at Salesforce in 2017 by John Althouse, Jeff Atkinson, and Josh Atkins, it reads five fields from the ClientHello: the TLS version, the accepted ciphers, the list of extensions, the elliptic curves, and the elliptic curve point formats. It concatenates the decimal values in order, joining fields with commas and values with hyphens, then takes an MD5 of the result. The output is a 32-character hash. The authors noted at the time that the design kept fingerprints short enough to fit in a tweet, which tells you the intent: a shareable threat-intelligence token, not a precise device ID. JA3S applies the same idea to the Server Hello, fingerprinting how a server responds to a given client.

A worked example from the original Salesforce write-up shows the shape. The pre-hash string 769,47-53-5-10-49161-49162-49171-49172-50-56-19-4,0-10-11,23-24-25,0 hashes to ada70206e40642a3e4461f35503241d5. The leading 769 is the TLS version. The long hyphenated run is the cipher list in the exact order the client sent it. Then extensions, curves, and point formats. The ordering matters because it is precisely what a hand-rolled HTTP client gets wrong: it negotiates a TLS handshake that no real browser would produce, and the JA3 hash for, say, default Python requests looks nothing like the hash for Chrome.
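To make the string-building step concrete, here is a minimal Python sketch of the JA3 construction described above. It assumes the ClientHello has already been parsed into lists of decimal code points (parsing the raw handshake is out of scope), and it omits the GREASE filtering that production JA3 implementations apply. The function names are hypothetical.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 pre-hash string.

    Fields are joined with commas and values inside a field with hyphens.
    Each list keeps the exact order the client sent it in.
    """
    fields = [str(version)]
    for values in (ciphers, extensions, curves, point_formats):
        fields.append("-".join(str(v) for v in values))
    return ",".join(fields)


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 of the pre-hash string, returned as a 32-character hex digest."""
    pre_hash = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(pre_hash.encode("ascii")).hexdigest()


# The worked example from the text, expressed as parsed field lists.
example = dict(
    version=769,
    ciphers=[47, 53, 5, 10, 49161, 49162, 49171, 49172, 50, 56, 19, 4],
    extensions=[0, 10, 11],
    curves=[23, 24, 25],
    point_formats=[0],
)
print(ja3_string(**example))
print(ja3_hash(**example))
```

The first line printed reproduces the pre-hash string quoted above. The second can be compared against the hash the Salesforce write-up gives for it.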

[Figure: JA3: five ClientHello fields, joined, then MD5. version (769), ciphers (47-53-5-10-49161-...), extensions (0-10-11), curves (23-24-25), formats (0), hashing to ada70206e40642a3e4461f35503241d5.]

*The pre-hash string preserves the order the client sent each field. Order is the signal, which is exactly what made JA3 fragile once browsers started shuffling it.*

The fragility was structural. JA3 depended on extension order, and in January 2023 Chrome started randomizing that order on purpose. The change shipped around Chrome 110, with the behavior appearing as early as versions 108 and 109, and the motivation had nothing to do with anti-bot. The Chromium team wanted to stop servers from hardcoding an expectation of a fixed extension order, the same anti-ossification logic behind TLS GREASE. The side effect was severe. With roughly fifteen permutable extensions, the number of orderings runs to about 15 factorial, on the order of 10^12. Fastly watched the dominant Chrome JA3 hash, cd08e31494f9531f560d64c695473da9, fall off a cliff in their traffic right after the rollout. A fingerprint that changes on every connection is not a fingerprint. For identifying Chrome, JA3 was effectively finished.
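The arithmetic behind "a fingerprint that changes on every connection" is easy to check. This sketch uses a simplified JA3-style pre-hash whose version, cipher, curve, and point-format values are arbitrary placeholders. It hashes every ordering of one three-extension set, then computes 15! for the real-world case.

```python
import hashlib
import itertools
import math


def ja3_like(extensions):
    """JA3-style MD5 over placeholder fields; only the extension order varies.

    Placeholders: TLS 1.2 (771), one cipher (4865), curve 29, point format 0.
    """
    pre_hash = ",".join(["771", "4865", "-".join(map(str, extensions)), "29", "0"])
    return hashlib.md5(pre_hash.encode("ascii")).hexdigest()


same_set = [0, 10, 11]
hashes = {ja3_like(order) for order in itertools.permutations(same_set)}
print(len(hashes))          # 6: one distinct hash per ordering (3!)

# With about fifteen permutable extensions, the ordering space is 15!.
print(math.factorial(15))   # 1307674368000, about 1.3e12
```

Sorting the list before hashing would collapse all six values back to one, which is exactly the change JA4 makes.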

JA4 and the fix

JA4 is the answer to that problem, built by the same person who built JA3. Althouse released the JA4+ suite through FoxIO, with the core JA4 method under a BSD 3-clause license and the broader suite under FoxIO’s own terms. The central design change is one line: sort the cipher suites and the extensions into hex order before hashing, so the order the client happens to send them in does not affect the output. Chrome can shuffle its extensions all it likes; after the sort, the fingerprint is stable again.

The format also carries more, and it is human-readable in a way JA3’s opaque MD5 was not. A JA4 fingerprint has an a_b_c shape. Take the canonical example t13d1516h2_8daaf6152771_e5627efa2ab1. The a section is a readable prefix: t for TLS over TCP (versus q for QUIC), 13 for TLS 1.3, d for an SNI domain being present, 15 for fifteen cipher suites after GREASE values are stripped, 16 for sixteen extensions, and h2 taken from the first and last characters of the first ALPN value. The b section is a twelve-character truncated SHA-256 of the cipher list, sorted in hex order. The c section is a truncated hash of the sorted extensions, with SNI and ALPN removed because they already appear in a, followed by the signature algorithms in their original order. Stripping GREASE and sorting the lists is the entire point: it survives the randomization that killed JA3.
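Here is a minimal Python sketch of the JA4 construction, following the field layout described above. It assumes the ClientHello has already been parsed into integers and that the caller passes the highest offered TLS version. It only emits the TCP (`t`) prefix and skips edge cases in the spec, such as non-alphanumeric ALPN values or an empty cipher list. The function and parameter names are hypothetical, and FoxIO's JA4 specification remains the authority on details.

```python
import hashlib

TLS_VERSIONS = {0x0304: "13", 0x0303: "12", 0x0302: "11", 0x0301: "10"}


def is_grease(value):
    """GREASE values (RFC 8701) are 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _hash12(text):
    """Truncated SHA-256: the first 12 hex characters."""
    return hashlib.sha256(text.encode("ascii")).hexdigest()[:12]


def ja4(version, sni_present, ciphers, extensions, alpn, sig_algs):
    """Sketch of a JA4 fingerprint for TLS over TCP.

    The lists hold integer code points in the order the client sent them.
    """
    ciphers = [c for c in ciphers if not is_grease(c)]
    extensions = [e for e in extensions if not is_grease(e)]
    sig_algs = [s for s in sig_algs if not is_grease(s)]

    # a: readable prefix. Counts are capped at 99 and zero-padded to 2 digits.
    alpn_code = (alpn[0] + alpn[-1]) if alpn else "00"
    a = "t{}{}{:02d}{:02d}{}".format(
        TLS_VERSIONS.get(version, "00"),
        "d" if sni_present else "i",
        min(len(ciphers), 99),
        min(len(extensions), 99),
        alpn_code,
    )

    # b: ciphers as 4-digit hex, sorted, so send order no longer matters.
    b = _hash12(",".join(sorted(f"{c:04x}" for c in ciphers)))

    # c: sorted extensions minus SNI (0x0000) and ALPN (0x0010), which are
    # already reflected in a, then signature algorithms in the order sent.
    exts = sorted(f"{e:04x}" for e in extensions if e not in (0x0000, 0x0010))
    c_input = ",".join(exts)
    if sig_algs:
        c_input += "_" + ",".join(f"{s:04x}" for s in sig_algs)
    c = _hash12(c_input)

    return f"{a}_{b}_{c}"


hello = dict(
    version=0x0304,
    sni_present=True,
    ciphers=[0x1A1A, 0x1301, 0x1302, 0x1303],            # 0x1A1A is GREASE
    extensions=[0x0A0A, 0x0000, 0x0010, 0x002B, 0x000D],  # 0x0A0A is GREASE
    alpn="h2",
    sig_algs=[0x0403, 0x0804],
)
fp = ja4(**hello)
print(fp.split("_")[0])  # t13d0304h2
```

Shuffling `ciphers` or `extensions` leaves `fp` unchanged, because both lists are sorted before hashing. Reordering `sig_algs` does change it, because the spec keeps that list in sent order.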

[Figure: JA4: readable prefix, then two sorted hashes. t13d1516h2 (a: proto, version, SNI, cipher count, ext count, ALPN), 8daaf6152771 (b: SHA-256 of ciphers, sorted in hex), e5627efa2ab1 (c: SHA-256 of sorted exts + sig algorithms). The sort in b and c is the whole fix: extension shuffling no longer changes the hash.]

*JA4 keeps a readable prefix and sorts the cipher and extension lists before hashing, which is what makes it survive Chrome's 2023 randomization.*

JA4 is one of a family. JA4S fingerprints the server’s response, JA4H the HTTP client headers, JA4X an X.509 certificate, JA4T the TCP layer, and JA4L measures client-to-server latency. The full set lets a detector fingerprint several layers of the same connection and check them against each other. On the commercial side, Cloudflare exposes JA3 and JA4 to Enterprise Bot Management customers, and in 2024 added a derived layer it calls JA4 Signals: aggregate ratios like the share of a fingerprint’s traffic that runs HTTP/2 versus HTTP/3, or how often a fingerprint trips heuristics, computed across Cloudflare’s whole network rather than from one request. For the deeper mechanics of how those fingerprints feed a score, the Cloudflare TLS and HTTP/2 fingerprinting and JA3-to-JA4 posts go field by field.
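Cloudflare does not spell out the exact computation behind JA4 Signals, so the following is only an illustration of the general idea: aggregate per-fingerprint statistics across many requests instead of judging each request alone. The log format and function name are hypothetical.

```python
from collections import Counter, defaultdict


def protocol_share(log):
    """Per-fingerprint protocol mix across a request log.

    log: iterable of (ja4, http_version) pairs, e.g. ("t13d1516h2_...", "h2").
    Returns {fingerprint: {protocol: fraction of that fingerprint's requests}}.
    """
    per_fp = defaultdict(Counter)
    for fingerprint, http_version in log:
        per_fp[fingerprint][http_version] += 1
    shares = {}
    for fingerprint, counts in per_fp.items():
        total = sum(counts.values())
        shares[fingerprint] = {proto: n / total for proto, n in counts.items()}
    return shares


log = [("fp_a", "h2")] * 3 + [("fp_a", "h3")] + [("fp_b", "h1")] * 2
print(protocol_share(log))  # {'fp_a': {'h2': 0.75, 'h3': 0.25}, 'fp_b': {'h1': 1.0}}
```

A ratio like this describes a fingerprint's whole population rather than one request. That is what can let a network-wide view flag a fingerprint whose traffic mix looks unlike the browser it claims to be.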

The HTTP/2 frame fingerprint

TLS is not the only thing a stack reveals before the application speaks. HTTP/2 has its own setup handshake, and it leaks just as much. When an HTTP/2 connection opens, the client sends a SETTINGS frame declaring its parameters, a WINDOW_UPDATE for flow control, and, in the original protocol, PRIORITY frames describing how it wants streams weighted. It also sends its pseudo-headers, :method, :path, :authority, :scheme, in a particular order. The application code never picks any of this. The HTTP/2 library does, and different libraries do it differently.

Akamai’s threat-research team turned this into a fingerprint and presented it at Black Hat Europe 2017. The white paper, by Ory Segal, Aharon Fridman, and Elad Shuster, was built from more than ten million HTTP/2 connections, from which they extracted fingerprints for over forty thousand distinct user-agent strings across hundreds of implementations. The format the industry standardized on concatenates four parts: the SETTINGS frame parameters with their values, the WINDOW_UPDATE increment, the PRIORITY frame details, and the pseudo-header order. An example from the paper looks like 1:65536;3:1000;4:6291456|15663105, the SETTINGS pairs on the left of the pipe and the window update on the right. A fuller form adds the priority and header-order sections, often written as S[...]|WU|P[...]|order.
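Here is a minimal sketch that assembles the four-part string from already-decoded HTTP/2 connection-preface frames. Some conventions follow common open-source implementations of the Akamai format and should be treated as assumptions: "00" when no WINDOW_UPDATE is sent, "0" when no PRIORITY frames are sent, and single-letter pseudo-header codes. The function name is hypothetical.

```python
def akamai_h2_fingerprint(settings, window_update, priority_frames, pseudo_headers):
    """Build an S|WU|P|order string from decoded HTTP/2 frames.

    settings:        list of (identifier, value) pairs in the order sent
    window_update:   connection-level WINDOW_UPDATE increment, or None
    priority_frames: list of preformatted priority entries, possibly empty
    pseudo_headers:  pseudo-header names in the order sent, e.g. ":method"
    """
    s = ";".join(f"{ident}:{value}" for ident, value in settings)
    wu = str(window_update) if window_update is not None else "00"
    p = ",".join(priority_frames) if priority_frames else "0"
    # ":method" -> "m", ":authority" -> "a", ":scheme" -> "s", ":path" -> "p"
    order = ",".join(name[1] for name in pseudo_headers)
    return "|".join((s, wu, p, order))


fp = akamai_h2_fingerprint(
    settings=[(1, 65536), (3, 1000), (4, 6291456)],
    window_update=15663105,
    priority_frames=[],
    pseudo_headers=[":method", ":authority", ":scheme", ":path"],
)
print(fp)  # 1:65536;3:1000;4:6291456|15663105|0|m,a,s,p
```

The first two sections reproduce the example from the paper quoted above. The last two show how the priority and header-order sections extend it.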

The signal is strong because a browser’s HTTP/2 fingerprint is consistent and the values are non-obvious. A scraper built on a generic HTTP/2 library sends a different SETTINGS table than Chrome does, in a different order, with a different initial window. There is a catch the implementations document plainly: because HTTP/2 connections are reused across many requests, the PRIORITY portion can shift between requests on the same connection, so detectors lean more on the stable SETTINGS and window values. The deeper point is the same one that motivates the whole server-side layer. A client that perfectly spoofs its JA4 but sends a mismatched HTTP/2 fingerprint has contradicted itself, and the contradiction is the detection. A real Chrome’s TLS fingerprint and HTTP/2 fingerprint always agree, because the same browser produced both. Akamai’s own HTTP/2 fingerprinting work and DataDome’s both treat that cross-layer agreement as a first-class signal.
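Because the PRIORITY section can drift on a reused connection, a cross-check can compare only the steadier sections against what the TLS layer claimed. This sketch assumes the four-part string format from the Akamai example above. It also assumes that pseudo-header order is stable per stack, and it uses a hypothetical reference table. The reference values are placeholders copied from that example string, not a verified browser fingerprint.

```python
def stable_fields(h2_fp):
    """Keep SETTINGS, WINDOW_UPDATE, and pseudo-header order; drop PRIORITY."""
    settings, window_update, _priority, pseudo_order = h2_fp.split("|")
    return settings, window_update, pseudo_order


# Hypothetical reference table keyed by the browser family the TLS layer
# claims. Placeholder values only, not a recorded Chrome fingerprint.
REFERENCE_H2 = {
    "chrome": ("1:65536;3:1000;4:6291456", "15663105", "m,a,s,p"),
}


def h2_agrees_with_tls(tls_family, observed_h2_fp):
    """True only if the HTTP/2 stack matches the family the TLS layer claimed."""
    expected = REFERENCE_H2.get(tls_family)
    return expected is not None and stable_fields(observed_h2_fp) == expected


# Same stack, different PRIORITY section: still agrees.
print(h2_agrees_with_tls("chrome", "1:65536;3:1000;4:6291456|15663105|3:0:0:201|m,a,s,p"))  # True
# Different SETTINGS table and header order: the layers contradict each other.
print(h2_agrees_with_tls("chrome", "2:0;4:4194304|1073741824|0|m,s,a,p"))  # False
```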

IP, ASN, and the reputation layer

The last purely server-side signal is the source address, and it is the one that needs no parsing at all. Every packet carries an IP, every IP belongs to an autonomous system, and the ASN says a great deal about what kind of network the client sits on. A request from a hosting provider’s ASN (AWS, GCP, a bulk VPS shop) is treated as automated until proven otherwise, because almost no real users browse from a datacenter. A request from a residential ISP’s ASN, or better a mobile carrier’s, starts from a position of trust because that is where humans actually are.

Detectors turn this into a score. In practice, consumer ISP ranges sit near the bottom of the suspicion scale while datacenter and hosting ranges start high regardless of how clean the specific IP is, with intermediate scores triggering step-up challenges and the top of the range triggering an outright block. The exact thresholds are vendor-specific and not publicly standardized, but the shape is consistent across the industry: ASN class is often the first thing scored, before any fingerprint is even computed, because it is the cheapest signal of all.
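The shape described above is an ASN-class prior, an optional per-IP adjustment, and tiered actions. It can be sketched in a few lines. Every number here is hypothetical, since real thresholds are vendor-specific and unpublished.

```python
# Hypothetical base suspicion per ASN class (0 = trusted, 100 = automated).
BASE_SCORE = {
    "mobile": 5,
    "residential": 10,
    "business": 30,
    "hosting": 80,
}


def asn_score(asn_class, ip_penalty=0):
    """Start from the ASN-class prior, then add a per-IP reputation penalty.

    Unknown classes get a middling prior; the result is capped at 100.
    """
    return min(100, BASE_SCORE.get(asn_class, 50) + ip_penalty)


def action(score, challenge_at=40, block_at=90):
    """Map a score onto the allow / step-up challenge / block tiers."""
    if score >= block_at:
        return "block"
    if score >= challenge_at:
        return "challenge"
    return "allow"


print(action(asn_score("residential")))             # allow
print(action(asn_score("hosting")))                 # challenge: clean IP, datacenter ASN
print(action(asn_score("hosting", ip_penalty=15)))  # block
```

The hosting prior alone lands in the challenge tier even for a spotless IP, which mirrors the point that datacenter ranges "start high regardless of how clean the specific IP is."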

This is also where the server-side layer hits its ceiling, and where the modern abuse economy lives. Residential proxy networks exist precisely to launder datacenter traffic through real consumer IPs, so the ASN reads as a home connection. Against that, the IP signal alone is nearly useless, and detectors fall back to behavior: a residential proxy pool rotates across unrelated ISPs faster than any human moves, produces impossible geographic jumps between requests, and pairs a trustworthy IP with a browser fingerprint that does not fit. None of those tells comes from the IP itself. They come from correlating the IP against the other layers over time, which is the seam where pure server-side detection stops and the rest of the stack takes over. The residential proxy and ASN detection post covers that fallback in depth.
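The "impossible geographic jump" tell can be sketched as a speed check between consecutive geolocated requests in one session. The record type, the 1000 km/h ceiling, and the geolocation error margin are all assumptions for illustration. This sketch covers only the geographic tell, not the rate of ISP rotation.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class Sighting:
    """One request from a session, geolocated by IP (hypothetical record)."""
    lat: float
    lon: float
    t: float  # seconds


def great_circle_km(a, b):
    """Haversine distance between two sightings, in kilometres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def impossible_jump(a, b, max_kmh=1000.0, geo_error_km=100.0):
    """Flag consecutive sightings that imply faster-than-airliner travel.

    geo_error_km absorbs IP-geolocation imprecision so nearby cities don't trip it.
    """
    distance = great_circle_km(a, b)
    if distance <= geo_error_km:
        return False
    hours = abs(b.t - a.t) / 3600.0
    if hours == 0:
        return True
    return distance / hours > max_kmh


frankfurt_ish = Sighting(50.1, 8.7, t=0)
new_york_ish = Sighting(40.7, -74.0, t=60)
print(impossible_jump(frankfurt_ish, new_york_ish))  # True: thousands of km in one minute
```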

The crossover: what only the client can see

Everything to this point is decidable from bytes the client had to send. The trouble is that all of it describes the connection, and a determined adversary can rebuild the connection to order. Tools like curl-impersonate and uTLS exist to forge a Chrome-shaped ClientHello from a non-browser client. A scraper can match Chrome’s JA4, match its HTTP/2 SETTINGS, and route through a residential IP. At that point the entire server-side layer reports “looks like Chrome,” and it is wrong. The network said everything it could say, and the client lied about all of it.

To go further the detector has to find out what the client is, not just what it claims, and that means running code inside it. This is the client-side layer. It exists because some facts about a browser are only observable from inside the browser. Whether the GPU actually renders a canvas the way a real GPU does. Whether navigator.webdriver is set. Whether the JavaScript engine’s timing, error objects, and global namespace match a genuine Chrome or a patched automation build. A server cannot see any of this. It can only see it if the browser runs the detector’s script and reports back.

The split is clean if you look at the headless-Chrome case. Server-side, a default headless Chrome gives itself away in HTTP: its User-Agent contains the substring HeadlessChrome, its sec-ch-ua client-hint header carries a matching HeadlessChrome brand, and it omits Accept-Language by default where a normal browser sends one. Those are network signals, and they are the first thing a competent operator fixes, because they are trivial to override. So the detector pushes into JavaScript, where the same browser leaks things it cannot rewrite from outside: navigator.webdriver reads true under automation, Playwright injects globals such as window.__playwright__binding__ and window.__pwInitScripts into the page, and the runtime carries side effects from the Chrome DevTools Protocol that every major framework uses to drive the browser.
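The three header-level tells listed above fit in one small function. This is a hypothetical helper, not any vendor's code. As the paragraph says, an operator who sets these headers explicitly passes all three checks.

```python
def headless_header_tells(headers):
    """Return the default-headless-Chrome tells visible in request headers.

    headers: mapping of header name to value; names compared case-insensitively.
    """
    h = {name.lower(): value for name, value in headers.items()}
    tells = []
    if "HeadlessChrome" in h.get("user-agent", ""):
        tells.append("User-Agent contains HeadlessChrome")
    if "HeadlessChrome" in h.get("sec-ch-ua", ""):
        tells.append("sec-ch-ua carries a HeadlessChrome brand")
    if "accept-language" not in h:
        tells.append("Accept-Language missing")
    return tells


default_headless = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) HeadlessChrome/124.0.0.0 Safari/537.36",
    "sec-ch-ua": '"HeadlessChrome";v="124", "Chromium";v="124"',
}
print(len(headless_header_tells(default_headless)))  # 3
```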

[Figure: Detecting one headless browser, two ways. Server-side (HTTP, easy to spoof): UA contains "HeadlessChrome"; sec-ch-ua brand mismatch; Accept-Language missing. An operator clears all three in minutes by setting headers. Client-side (JS, hard to spoof): navigator.webdriver === true; window.__pwInitScripts; CDP Runtime.enable side effect. Each lives inside the runtime and survives header spoofing. Same browser: the network tells are cheap to erase; the in-runtime tells are not.]

*The left column is what the bytes reveal. The right column only exists if the browser runs the detector's JavaScript, which is why anti-bot scripts are large and obfuscated.*

The CDP side effect is the cleanest recent example of a client-only signal, and it is worth being precise about it because the precise mechanism is documented. Puppeteer, Playwright, and Selenium all drive Chrome through the DevTools Protocol, and they all issue the Runtime.enable command to receive events from the runtime domain. Issuing it has an observable consequence: it changes how the runtime serializes certain objects, which a few lines of script on the page can detect. DataDome’s Antoine Vastel made the technique public on June 13, 2024. The detail that matters for this post is where it can be observed. The server never sees the CDP traffic, which flows over a local socket between the automation library and the browser. Only code running inside that browser can notice the side effect. It is a pure client-side signal by construction, and there is no network-layer equivalent. The deeper treatment is in the JavaScript runtime fingerprinting post.

Why the client layer is the hard one

Client-side detection has the opposite tradeoff profile from server-side. The signals are far richer (an anti-bot script can collect a hundred or more device, browser, and behavioral attributes), but every one of them is reported by code running on a machine the attacker controls. That is the central asymmetry of the entire field. The server-side layer reads bytes the client cannot lie about without breaking the connection. The client-side layer reads answers the client can lie about freely, because the client owns the runtime, can patch it, and can hand the script whatever values it likes.

This is why anti-bot JavaScript is large and aggressively obfuscated rather than a clean readable check. Akamai’s sensor script is on the order of half a megabyte. The size is not the detection. The detection is the inventory it builds: canvas and WebGL renders that exercise the real GPU, the audio-context fingerprint, installed fonts, screen metrics, the precise shape of the window.chrome object, and the navigator.webdriver flag, plus behavioral telemetry such as mouse paths and keystroke timing. The obfuscation exists to make the script expensive to reverse and to make spoofing every value consistently harder than spoofing any one of them. An attacker who patches navigator.webdriver to undefined but forgets that their canvas renders like a server with no GPU has produced a contradiction, and the contradiction is the catch. Consistency is what the client layer is really testing, because individual values are all forgeable and the cost is in forging them all at once without slipping.
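Here is a sketch of how a server might scan a decoded client payload for the kind of contradiction described above. It flags direct automation tells, and also a desktop User-Agent paired with a software-rendered WebGL context. The payload schema and function name are hypothetical. The renderer substrings are assumptions based on two well-known software rasterizers (Chrome's SwiftShader and Mesa's llvmpipe).

```python
# Lowercase substrings found in WebGL renderer strings of software rasterizers.
SOFTWARE_RENDERERS = ("swiftshader", "llvmpipe")
PLAYWRIGHT_GLOBALS = {"__playwright__binding__", "__pwInitScripts"}


def payload_contradictions(payload):
    """Check a decoded telemetry payload (hypothetical schema) for tells.

      webdriver       value of navigator.webdriver (absent if undefined)
      globals         window property names the script enumerated
      webgl_renderer  unmasked WebGL renderer string
      user_agent      navigator.userAgent
    """
    found = []
    if payload.get("webdriver") is True:
        found.append("navigator.webdriver is true")
    if PLAYWRIGHT_GLOBALS & set(payload.get("globals", ())):
        found.append("Playwright globals present")
    renderer = payload.get("webgl_renderer", "").lower()
    ua = payload.get("user_agent", "")
    claims_desktop = "Windows NT" in ua or "Macintosh" in ua
    if claims_desktop and any(name in renderer for name in SOFTWARE_RENDERERS):
        # Weak on its own (VMs and blocklisted GPUs fall back to software),
        # so a real system would weight this rather than hard-fail on it.
        found.append("desktop UA but software-rendered WebGL")
    return found


win_ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36")
# webdriver patched away (absent), but the canvas still renders in software.
patched = {"globals": ["document"], "webgl_renderer": "Google SwiftShader", "user_agent": win_ua}
print(payload_contradictions(patched))  # ['desktop UA but software-rendered WebGL']
```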

The collected payload does not get trusted on the client. It gets shipped back. Akamai’s script encrypts its telemetry and POSTs it as the sensor_data payload, and the edge validates it server-side before issuing or refreshing the _abck cookie that later requests must carry. The cookie has a fixed internal layout, and a stale or malformed one is the single most common reason an automated client eats a 403. The architecture is deliberate: collect rich signals on the client because that is the only place they exist, but make the verdict server-side where the attacker cannot reach it. The same pattern drives the Akamai sensor_data payload and the DataDome JS tag. The client gathers; the server decides.
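The "client gathers; server decides" pattern can be sketched with a server-held HMAC key. The verdict on the telemetry is reached server-side, and only then is a signed, time-limited token issued for later requests to carry. This is a generic illustration with hypothetical names and a made-up token layout. It is not the format of Akamai's _abck cookie. It also assumes client IDs never contain the `|` separator.

```python
import hashlib
import hmac
import time

# Hypothetical server-side key; it never leaves the server.
SECRET_KEY = b"replace-with-a-real-secret"


def _sign(client_id, issued_at):
    msg = f"{client_id}|{issued_at}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def issue_token(client_id, payload_passed, now=None):
    """Issue a signed token only after the server has judged the telemetry."""
    if not payload_passed:
        return None
    issued_at = int(time.time() if now is None else now)
    return f"{client_id}|{issued_at}|{_sign(client_id, issued_at)}"


def token_valid(token, client_id, max_age_s=1800, now=None):
    """Reject missing, malformed, forged, foreign, or stale tokens."""
    if not token:
        return False
    try:
        tok_client, issued_s, sig = token.split("|")
        issued_at = int(issued_s)
    except ValueError:
        return False  # malformed: wrong number of parts or non-numeric time
    if not hmac.compare_digest(sig, _sign(tok_client, issued_at)):
        return False  # forged or tampered
    if tok_client != client_id:
        return False  # token lifted from another session
    current = time.time() if now is None else now
    return 0 <= current - issued_at <= max_age_s


tok = issue_token("session-42", payload_passed=True, now=1_000)
print(token_valid(tok, "session-42", now=1_500))   # True
print(token_valid(tok, "session-42", now=10_000))  # False: stale
```

Because the client never holds the key, patching the client-side script can change what telemetry gets sent, but it cannot mint a token the server will accept.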

Where the two views meet

The reconciliation is the part that actually produces a verdict, and it is the reason neither layer ships alone. A modern stack computes a server-side network fingerprint on the first packet, then, for anything that survives, pushes a client-side script and collects a device-and-behavior fingerprint, then checks the two against each other for agreement. A real Chrome is internally consistent across every layer because one browser produced all of it: the JA4 says Chrome, the HTTP/2 SETTINGS say Chrome, the User-Agent says Chrome, the canvas renders like Chrome on real hardware, and navigator.webdriver is unset. A forged client fails to hold the story together across all of them at once.

The failure modes are specific. A scraper that perfectly impersonates Chrome’s TLS via uTLS but speaks HTTP/1.1 instead of HTTP/2 has a TLS fingerprint that claims a version of Chrome which always prefers HTTP/2, so the layers disagree. A headless browser that fixes its HTTP headers but leaves navigator.webdriver set has a clean network layer and a dirty runtime. A residential-proxied request with a real consumer ASN but a Linux server’s font list and a GPU-less canvas has a trustworthy IP wrapped around an untrustworthy device. In every case it is the cross-layer mismatch that lands the detection, not any single signal. The detector is not asking “is this fingerprint on a blocklist.” It is asking “do all the things this client told me about itself fit together,” and bots are caught in the gaps between the answers.
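The three failure modes above can be written as cross-layer rules over a per-session summary. Each rule fires only when two layers disagree. The record type and its field names are hypothetical, and the HTTP/1.1 rule assumes the server offered h2 in ALPN, so a genuine Chrome would have taken it.

```python
from dataclasses import dataclass


@dataclass
class ClientView:
    """Hypothetical per-session summary joining both layers."""
    tls_family: str        # browser family the JA4 maps to, e.g. "chrome"
    http_version: str      # negotiated protocol: "1.1", "2", or "3"
    ua_family: str         # family claimed by the User-Agent header
    asn_class: str         # e.g. "residential", "hosting"
    webdriver: bool        # navigator.webdriver as reported by the script
    software_canvas: bool  # canvas/WebGL rendered without a real GPU


def cross_layer_mismatches(v):
    """List every place where two layers tell different stories."""
    found = []
    # Assumes h2 was offered in ALPN, so a real Chrome would have negotiated it.
    if v.tls_family == "chrome" and v.http_version == "1.1":
        found.append("TLS says Chrome, protocol says HTTP/1.1")
    if v.tls_family != v.ua_family:
        found.append("TLS fingerprint and User-Agent disagree")
    if v.webdriver:
        found.append("network looks like a browser, runtime reports webdriver")
    if v.asn_class == "residential" and v.software_canvas:
        found.append("residential IP wrapped around a GPU-less device")
    return found


real = ClientView("chrome", "2", "chrome", "residential", False, False)
utls_scraper = ClientView("chrome", "1.1", "chrome", "residential", False, False)
print(cross_layer_mismatches(real))          # []
print(cross_layer_mismatches(utls_scraper))  # ['TLS says Chrome, protocol says HTTP/1.1']
```

No single field in `utls_scraper` is suspicious on its own. The verdict comes entirely from the pair that disagrees.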

[Figure: The two views are reconciled, not added. JA4 / HTTP2 / ASN and canvas / webdriver / CDP feed a consistency check, which produces a score. Examples of a mismatch that lands the verdict: Chrome JA4 + HTTP/1.1 (versions disagree); clean headers + webdriver=true (network vs runtime); residential IP + GPU-less canvas (IP vs device).]

*The verdict comes from agreement across layers. A bot that perfects any one layer still has to make all of them tell the same story, which is much harder.*

There is one more reason the two layers coexist that has nothing to do with detection power: not every client can run the client-side layer, and the server-side layer is the only thing that covers them. A plain HTTP scraper that never executes JavaScript will never run the device-fingerprinting script, so for that traffic the network fingerprint and the IP are the entire defense. The client-side layer, conversely, is the only thing that catches a full headless browser that has cleaned up its network signals. The two cover each other’s blind spots. Server-side handles the no-JS firehose cheaply and early; client-side handles the sophisticated browser-driven minority that gets past it. A stack that ran only one would be wide open on the side it skipped.

What the split looks like in 2026

The state today is that the boundary between the layers has stopped moving much, while the contest inside each layer keeps accelerating. The server-side layer settled on JA4 after Chrome’s randomization retired JA3 for browser identification, paired with the HTTP/2 frame fingerprint and ASN reputation, and that trio is stable across the major vendors. The client-side layer settled on a large obfuscated collector that gathers device and behavioral signals, ships them server-side for a verdict, and issues a validated token. What changes month to month is not the architecture. It is the specific signals, and the scramble to neutralize each new one. The CDP Runtime.enable tell that surfaced in 2024 is a good example: a new client-side signal slotting into a fixed structure.

The reason the split is stable is that it follows from physics, not fashion. Some facts about a client are on the wire and cost nothing to read but reveal only the stack. Other facts are inside the runtime, reveal the actual client, and can only be obtained by running code the client is free to corrupt. No amount of engineering collapses that into one layer, because the constraint is informational. You cannot read a canvas fingerprint off a TCP packet, and you cannot trust a canvas fingerprint reported by code the adversary controls without an independent network signal to check it against. The whole design, cheap server-side triage in front, expensive client-side interrogation behind, cross-checked in the middle, is what you get when you take that constraint seriously.

Which leaves the asymmetry that defines the whole field. The defender has to make every layer agree. The attacker only has to find one layer that does not, or one place where forging consistency across all of them is too expensive to bother. That gap is exactly where the engineering lives, and it has stayed open for as long as there have been bots, because closing it on one side just moves it to the other.


Sources & further reading

Further reading