Detecting a proxy by OS mismatch: when the TCP stack disagrees with the user-agent

A request arrives. The User-Agent string says Mozilla/5.0 (Windows NT 10.0; Win64; x64) ... Chrome/121. Confident, specific, Windows. But the first packet of that connection, the SYN that opened the TCP handshake, carries an initial TTL of 64, a window scale of 7, and a TCP options layout that includes a timestamp option. Windows does not do that. Windows starts its TTL at 128, and the desktop Windows stack does not put a timestamp option in its outbound SYN. The packet was built by a Linux kernel. The header said Windows. Somewhere between the browser and the server there is a box that lied about who it is, and the lie lives one layer below the one it was told at.

That contradiction is the whole story. The application layer is trivial to forge, because it is just a string the client chooses. The transport layer is not, because the client does not usually get to write it. The kernel does. When you route a browser through a datacenter proxy, the server stops seeing the browser’s TCP stack and starts seeing the proxy’s, and the proxy is almost always Linux. This post is about how anti-bot systems turn that gap into a detection signal, what exactly they read off the wire, how the modern fingerprint formats encode it, and the several honest reasons the check produces false positives often enough that nobody treats it as a hard block.

We start with why the two layers diverge and what fields carry the OS identity. Then the signature formats: p0f’s classic layout and the newer JA4T. Then the proxy-specific tells, including the MSS clamp that a VPN tunnel cannot hide and the cross-layer RTT misalignment from a 2025 NDSS paper. Then the part most write-ups skip: why this signal is noisy, where it false-positives, and why it lands as one weighted input rather than a verdict.

Why the transport layer tells the truth

The OSI layers were never meant to be a security boundary, but they accidentally became one for this purpose. An HTTP request is text your software writes from scratch on every request. The User-Agent, the Accept headers, the cookie jar: all of it is yours to set. A scraper that wants to look like Chrome on Windows just copies Chrome’s header set verbatim, and at the HTTP layer it is indistinguishable. That is exactly why the Accept-header triad and header order and casing became fingerprints in the first place: when everyone can set the values, the subtle ways libraries differ in how they set them become the tell.

The TCP/IP layer is different in kind. When your application calls connect(), it does not get to fill in the SYN packet. The operating system kernel does that. The kernel picks the initial TTL, the initial receive window, the window scale shift, whether to include a timestamp or selective-ACK option, and the order those options appear in. These are implementation choices the TCP and IP RFCs leave open, and every OS family made them slightly differently and then froze them for backward compatibility. The result is that the SYN packet is a reasonably stable signature of the kernel that emitted it, and an ordinary userland program, a browser included, cannot rewrite those fields without root and a custom raw-socket stack.

So you have a forgeable layer sitting on top of a much-harder-to-forge layer. A consistent client, a real Chrome on a real Windows laptop, agrees with itself across the two: the User-Agent says Windows and the SYN says Windows. A client that is lying at the top disagrees with itself, whether it is a Linux scraper claiming to be Windows or a Windows browser whose packets are being regenerated by a Linux proxy. The disagreement is the signal. As the researcher behind the incolumitas TCP/IP fingerprint API put it, if the operating system inferred from the TCP/IP fingerprint differs from the one the User-Agent claims, there must be something wrong.

*The string at the top is whatever the client types. The bytes at the bottom are whatever the kernel emits. Detection lives in the gap between them.*

The fields that carry OS identity

Strip a SYN packet down and a handful of fields do almost all the discriminating work. Wikipedia’s summary of TCP/IP stack fingerprinting lists the variable pieces: initial packet size, initial TTL, window size, maximum segment size, window scaling value, and a few flag-level quirks like the don’t-fragment bit and the presence of sackOK and nop options. It notes that just inspecting the initial TTL and window size fields is often enough to identify an operating system, and that the combined values form roughly a 67-bit signature.

The single most useful field is the initial TTL. The IP time-to-live counts down by one at every router hop, so by the time a packet reaches the server it has already been decremented. But operating systems start it from a small set of round numbers. Almost everything modern starts at 64, 128, or 255. Linux and the BSDs, including macOS and iOS under the hood, start at 64. Windows starts at 128. Because a packet realistically does not cross more than a few dozen hops, an observed TTL of, say, 52 implies an initial value of 64 (Linux/Unix family) and a path length of 12 hops, while an observed 117 implies an initial 128 (Windows) and 11 hops. p0f assumes a packet will not jump more than 35 hops and checks whether the observed TTL falls in the range between the initial value and the initial value minus 35. So a Windows User-Agent riding on a packet that resolves to an initial TTL of 64 is already suspicious on that one field alone.
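The initial-TTL reasoning above is simple enough to sketch. This minimal version assumes only the three start values named in the text and p0f's 35-hop ceiling; real classifiers carry richer tables, so treat it as an illustration rather than any tool's implementation.

```python
from typing import Optional, Tuple

# Start values named in the text; real fingerprinting tools may know more.
COMMON_INITIAL_TTLS = (64, 128, 255)
# p0f's assumption: a packet does not travel more than 35 hops.
MAX_HOPS = 35


def infer_initial_ttl(observed_ttl: int) -> Optional[Tuple[int, int]]:
    """Return (initial_ttl, hop_count) for the smallest start value the observed
    TTL fits under, or None if it fits none within MAX_HOPS."""
    for initial in COMMON_INITIAL_TTLS:
        if initial - MAX_HOPS <= observed_ttl <= initial:
            return initial, initial - observed_ttl
    return None


def ttl_family(initial_ttl: int) -> str:
    # 64 covers Linux, Android, macOS/iOS and the BSDs; 128 is Windows.
    return {64: "unix-family", 128: "windows"}.get(initial_ttl, "other")


print(infer_initial_ttl(52))   # (64, 12)
print(infer_initial_ttl(117))  # (128, 11)
print(infer_initial_ttl(80))   # None: more than 35 hops below 128, yet above 64
```

A Windows User-Agent arriving on a SYN whose inferred start value maps to "unix-family" is exactly the single-field mismatch the paragraph describes.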

The window size and window scale add more. The TCP receive window in the SYN, and the window-scale shift count carried as a TCP option, are set by the kernel from its buffer-tuning defaults. The incolumitas write-up gives a concrete pair: an Ubuntu 18.04 host showed window size 29200 with window scale 7, while an Android 9 host showed window size 65535 with window scale 9. Same Linux lineage, different defaults, and both distinguishable from a Windows 10 SYN, which historically advertises window size 65535 with scale 8.

The richest field is the TCP options layout, the ordered list of which options appear and in what sequence. The incolumitas API documentation calls TCP options the most important source of entropy to distinguish operating systems, precisely because the data is variable in size and ordering and every stack lays it out its own way. Two facts from the JA4T work make the discrimination vivid. Microsoft Windows does not put a TCP timestamp option (option kind 8) in its SYN, while all the Unix-derived stacks do. And iOS uniquely terminates its option list with an end-of-list option (kind 0) rather than padding with no-ops the way others do. So the option layout alone separates Windows from the Unix family, and within the Unix family it separates iOS from the rest.

*Representative SYN defaults. Exact values shift across kernel versions and tuning, which is why a real classifier scores a weighted match rather than demanding an exact string.*

For a deeper field-by-field treatment, see the TCP/IP stack fingerprinting post and the post on TCP timestamps and window scaling; the point here is only that the SYN carries enough independent OS signal to answer a Windows-vs-Linux question with decent confidence.
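The two option-layout facts described earlier reduce to a few lines. This sketch assumes the option kinds have already been extracted from the SYN in wire order. The iOS-style list in the example is illustrative, and a real classifier would score the whole layout rather than apply two hard rules.

```python
from typing import List

# TCP option kinds: 0 = end of list, 1 = no-op, 2 = MSS, 3 = window scale,
# 4 = SACK permitted, 8 = timestamp.
END_OF_LIST, TIMESTAMP = 0, 8


def classify_option_layout(kinds: List[int]) -> str:
    """Apply the two layout rules from the text to a SYN's ordered option kinds."""
    if TIMESTAMP not in kinds:
        return "windows-like"   # desktop Windows omits the timestamp in its SYN
    if kinds and kinds[-1] == END_OF_LIST:
        return "ios-like"       # iOS terminates its list with end-of-list
    return "unix-family"


print(classify_option_layout([2, 4, 8, 1, 3]))           # unix-family (Linux order)
print(classify_option_layout([2, 1, 3, 1, 1, 4]))        # windows-like
print(classify_option_layout([2, 1, 3, 1, 1, 8, 4, 0]))  # ios-like
```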

Two ways to write the signature down: p0f and JA4T

There are two well-known formats for serializing a TCP fingerprint, separated by about two decades, and both are worth knowing because anti-bot pipelines reference one or the other.

The older one is p0f, written by Michal Zalewski around 2000 and rewritten for version 2 and again for version 3. p0f is passive: it never sends a probe, it only watches the SYN, SYN+ACK, and RST packets that ordinary traffic already produces, which is exactly the property you want at an edge that is serving real users. Its TCP signature format is a colon-delimited string:

sig = ver:ittl:olen:mss:wsize,scale:olayout:quirks:pclass

Reading left to right: ver is 4, 6, or * for the IP version; ittl is the inferred initial TTL; olen is the IP options length; mss is the maximum segment size or * if it should be wildcarded; wsize,scale is the window size (possibly expressed relative to MSS, like mss*20) and the window scale shift; olayout is the ordered option list using shorthand tokens (mss, nop, ws, sok, ts, eol+N); quirks captures header anomalies; and pclass flags whether the payload is zero-length. A Linux 4.x SYN renders as something close to 4:64:0:*:mss*20,7:mss,sok,ts,nop,ws:df,id+:0, and a Windows SYN as something like 4:128:0:*:65535,8:mss,nop,ws,nop,nop,sok:df,id+:0. You can read the divergence straight off the strings: TTL 64 against 128, the presence of ts in the Linux layout and its absence in the Windows one, the different option ordering.

The quirks field is the part that rewards attention. It enumerates oddities like the don't-fragment bit (df), a non-zero IP ID on a DF packet (id+), ECN support (ecn), a non-zero ACK number in a SYN, a non-zero urgent pointer, and several timestamp anomalies. These are the fiddly, version-specific behaviors that an evasion tool tends to forget, which is why Zalewski's own documentation notes that the fingerprinting can be gamed with some minor effort, but that the combination of option ordering, window-size relationships, and timing is hard to forge systematically across many connections. One field is easy to spoof. The whole vector, consistently, packet after packet, is not.
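To make the p0f layout concrete, here is a minimal sketch that splits a signature into its named fields and reports which ones differ, using the two example strings from the text. It is not p0f's own matcher, which also handles wildcards (*) and relative window sizes such as mss*20; this one compares fields as literal strings.

```python
# Field names follow the p0f v3 layout ver:ittl:olen:mss:wsize,scale:olayout:quirks:pclass.
P0F_FIELDS = ("ver", "ittl", "olen", "mss", "wsize,scale", "olayout", "quirks", "pclass")

LINUX_SIG = "4:64:0:*:mss*20,7:mss,sok,ts,nop,ws:df,id+:0"
WINDOWS_SIG = "4:128:0:*:65535,8:mss,nop,ws,nop,nop,sok:df,id+:0"


def parse_p0f_sig(sig: str) -> dict:
    parts = sig.split(":")
    if len(parts) != len(P0F_FIELDS):
        raise ValueError(f"expected {len(P0F_FIELDS)} fields, got {len(parts)}")
    fields = dict(zip(P0F_FIELDS, parts))
    # Option layout and quirks are ordered, comma-separated lists.
    fields["olayout"] = fields["olayout"].split(",")
    fields["quirks"] = fields["quirks"].split(",") if fields["quirks"] else []
    return fields


def diff_sigs(a: str, b: str) -> dict:
    """Map each field that differs to its (a, b) pair of values."""
    pa, pb = parse_p0f_sig(a), parse_p0f_sig(b)
    return {name: (pa[name], pb[name]) for name in P0F_FIELDS if pa[name] != pb[name]}


for name, (linux, windows) in diff_sigs(LINUX_SIG, WINDOWS_SIG).items():
    print(f"{name:12} linux={linux}  windows={windows}")
```

Run against the two strings, it flags exactly the three differences the text calls out: the TTL, the window size and scale, and the option layout (including the presence of ts).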

The newer format is JA4T, published by John Althouse and FoxIO in April 2024 as part of the JA4+ suite (covered in full in the JA4+ suite post). JA4T deliberately throws away the OS-labeling database and just emits the raw discriminating fields in a fixed order:

window-size_TCP-options_MSS_window-scale

An example from the FoxIO write-up is 29200_2-4-8-1-3_1424_7: window size 29200, the option-kind list 2-4-8-1-3 (MSS, SACK-permitted, timestamp, no-op, window-scale), MSS 1424, window scale 7. Because option kind 8 is the timestamp, a JA4T whose option list contains an 8 is Unix-family and one without it leans Windows: the same Windows-has-no-timestamp fact, encoded as a bare integer list. JA4T's design choice is that it does not try to name the OS for you. It gives you a stable string, and you decide what a given string means in your environment. FoxIO's framing is that JA4T can run on any device that sees the SYN (a firewall, a WAF, a load balancer, a proxy), and that beyond OS identification it helps flag intermediary proxies, VPNs, and tunnels. The companion JA4TScan tool does the active server-side variant.

*Two encodings of the same underlying SYN. The accented bits, TTL 64 and the timestamp option, are the Windows-vs-Unix discriminators that drive most OS-mismatch calls.*
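The JA4T layout can be sketched in a few lines. This minimal version assumes every field is present in the SYN; it does not handle a SYN with no options or no window-scale option, which a real implementation must. The class and function names are hypothetical, not FoxIO's code, and the Windows-style fingerprint in the test is illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SynFields:
    window_size: int
    option_kinds: List[int]   # TCP option kinds in wire order
    mss: int
    window_scale: int


def ja4t(syn: SynFields) -> str:
    """Join the four fields with underscores, option kinds with dashes."""
    kinds = "-".join(str(k) for k in syn.option_kinds)
    return f"{syn.window_size}_{kinds}_{syn.mss}_{syn.window_scale}"


def parse_ja4t(fp: str) -> SynFields:
    window, kinds, mss, scale = fp.split("_")
    return SynFields(int(window), [int(k) for k in kinds.split("-")], int(mss), int(scale))


def has_timestamp(fp: str) -> bool:
    # Test the parsed option list, not the raw string: an MSS like 1380 contains an 8.
    return 8 in parse_ja4t(fp).option_kinds


print(ja4t(SynFields(29200, [2, 4, 8, 1, 3], 1424, 7)))  # 29200_2-4-8-1-3_1424_7
print(has_timestamp("65170_2-4-8-1-3_1330_10"))          # True
```

The substring pitfall is the reason for parsing first: a naive `"8" in fp` would wrongly mark any fingerprint with an MSS of 1380 or a window scale of 8 as carrying a timestamp.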

How the mismatch call is actually scored

No serious classifier demands a byte-exact match against a stored signature, because real stacks drift with kernel versions, MTU, and middlebox meddling. They score. The open-source zardaxt tool, the successor to the incolumitas API, takes the incoming SYN, pulls the IP and TCP fields, and compares them against a labeled database, accumulating points per field. The weighting is deliberately lopsided toward the high-entropy fields: in the documented scheme, a TCP-options-layout match is worth the most (on the order of three points), MSS and window size each contribute around 1.5, and the remaining fields contribute up to a point each. The OS with the highest total wins, and the tool reports a probability distribution across OS families rather than a single hard label. The public BrowserLeaks TCP page, which runs zardaxt under the hood, shows exactly that shape of output: an iOS/macOS reading at 59 percent, Android at 21, Linux at 15, Windows at 7, alongside the raw TTL, window size, MSS, MTU, option list, hop distance, and an estimated RTT.
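The scoring step can be sketched as a weighted field match followed by normalization. The weights echo the approximate figures quoted from the zardaxt documentation, but the three-entry reference database is a hypothetical toy, its MSS values are assumed, and zardaxt's real field set and normalization may differ.

```python
from typing import Dict

# Rough weights from the text: option layout about 3, MSS and window size
# about 1.5, remaining fields about 1.
WEIGHTS = {"olayout": 3.0, "mss": 1.5, "wsize": 1.5, "ittl": 1.0, "wscale": 1.0}

# Hypothetical one-entry-per-family reference database (MSS values assumed).
DATABASE = {
    "Linux":   {"ittl": 64,  "olayout": "2-4-8-1-3",   "mss": 1460, "wsize": 29200, "wscale": 7},
    "Android": {"ittl": 64,  "olayout": "2-4-8-1-3",   "mss": 1460, "wsize": 65535, "wscale": 9},
    "Windows": {"ittl": 128, "olayout": "2-1-3-1-1-4", "mss": 1460, "wsize": 65535, "wscale": 8},
}


def score(syn: Dict[str, object]) -> Dict[str, float]:
    """Sum the weights of matching fields per OS, then normalize to a distribution."""
    totals = {
        os_name: sum(w for field, w in WEIGHTS.items() if syn.get(field) == ref[field])
        for os_name, ref in DATABASE.items()
    }
    grand_total = sum(totals.values()) or 1.0
    return {os_name: total / grand_total for os_name, total in totals.items()}


ubuntu_syn = {"ittl": 64, "olayout": "2-4-8-1-3", "mss": 1460, "wsize": 29200, "wscale": 7}
for os_name, p in sorted(score(ubuntu_syn).items(), key=lambda kv: -kv[1]):
    print(f"{os_name:8} {p:.0%}")
```

For the Ubuntu-style SYN, Linux comes out on top and Android second, because Android shares the option layout and initial TTL; that partial overlap is why the output is a distribution rather than a single label.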

The mismatch check then sits on top. The system has two independent OS estimates: one from the SYN score, and one parsed from the User-Agent string (these days cross-checked against Client Hints such as the Sec-CH-UA-Platform header). If the SYN says Unix-family with high confidence and the User-Agent says Windows, that is the os_mismatch flag zardaxt exposes as a boolean. The same logic generalizes past the OS name. The incolumitas BotOrNot demo frames it cleanly: when you configure a browser fingerprint to look like an iPhone but the SYN signature looks like Linux, there are only two plausible explanations, either a VPN or proxy in the path or a client lying about its configuration.
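A minimal sketch of that comparison follows. The function names are hypothetical (os_mismatch only borrows the name of zardaxt's flag), the User-Agent token matching is deliberately crude, and the SYN side is assumed to arrive already reduced to a coarse "windows" or "unix-family" label. This sketch simply prefers the Client Hint when present; a real pipeline would also flag a hint that disagrees with the User-Agent.

```python
from typing import Optional

# Claimed platforms that fall into the SYN's coarse "unix-family" bucket here.
UNIX_FAMILY = {"linux", "android", "macos", "ios", "chrome os"}

# Order matters: Android UAs also contain "Linux", iPhone UAs contain "Mac OS X".
UA_TOKENS = (("windows nt", "windows"), ("android", "android"),
             ("iphone", "ios"), ("mac os x", "macos"), ("linux", "linux"))


def claimed_os(user_agent: str, ch_platform: Optional[str] = None) -> Optional[str]:
    """Prefer the Sec-CH-UA-Platform value (sent quoted, e.g. '"Windows"'), else parse the UA."""
    if ch_platform:
        return ch_platform.strip('"').lower()
    ua = user_agent.lower()
    for token, family in UA_TOKENS:
        if token in ua:
            return family
    return None


def os_mismatch(syn_family: str, user_agent: str, ch_platform: Optional[str] = None) -> bool:
    """True when the SYN family ('windows' or 'unix-family') contradicts the claim."""
    claim = claimed_os(user_agent, ch_platform)
    if claim is None or syn_family not in ("windows", "unix-family"):
        return False   # nothing reliable to compare, so do not flag
    claim_family = "unix-family" if claim in UNIX_FAMILY else claim
    return claim_family != syn_family


UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36")
print(os_mismatch("unix-family", UA, '"Windows"'))  # True: Linux-looking SYN, Windows claim
print(os_mismatch("windows", UA, '"Windows"'))      # False: the layers agree
```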

That two-explanations point is the honest core of the whole technique and we will come back to it, because a VPN user is a paying customer and a scraper is not, and the SYN alone cannot tell them apart.

The same cross-layer idea extends one rung up the stack into TLS. The OS implied by the TLS ClientHello and its JA3/JA4 fingerprint can be checked against the User-Agent the same way, and many advanced bots run on Linux while claiming macOS or Windows, so a TLS-induced OS that disagrees with the advertised OS is the identical contradiction one layer higher. A consistency engine that wants to be hard to beat checks all three at once: the SYN-derived OS, the TLS-derived OS, and the header-derived OS must agree. Forging one is cheap. Forging all three coherently, on the same connection, is the hard problem that breaks most stealth setups, which is the same lesson the post on why stealth plugins lose draws from the browser side.
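Extending the check to the three layers named above is mostly bookkeeping. In this sketch each layer's classifier is assumed to exist already and to emit a coarse OS-family label, or None when it could not decide; the layer names are illustrative.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def layer_disagreements(labels: Dict[str, Optional[str]]) -> List[Tuple[str, str]]:
    """Return every pair of layers whose OS-family labels differ.
    Layers that produced no label (None) are skipped rather than counted."""
    known = {layer: family for layer, family in labels.items() if family is not None}
    return [(a, b) for a, b in combinations(sorted(known), 2) if known[a] != known[b]]


# A Linux bot claiming Windows that has fixed its headers but not its kernel or TLS.
print(layer_disagreements({"tcp": "unix-family", "tls": "unix-family", "http": "windows"}))
# [('http', 'tcp'), ('http', 'tls')]
```

Reporting the disagreeing pairs, rather than a single boolean, shows which layer is the outlier: here the HTTP claim is the one that contradicts two lower layers.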

What a proxy specifically does to the packet

A forward proxy or VPN does not relay your SYN. It terminates your TCP connection on its own kernel and opens a fresh TCP connection to the origin, and that fresh connection’s SYN is built entirely by the proxy’s operating system. So the origin server sees the proxy’s TTL, the proxy’s window defaults, the proxy’s option layout. As one summary of the 2025 detection state put it bluntly, most proxies fail at the TCP/IP layer because the proxy server hardware runs Linux and the server’s kernel generates the packets, stamping them with Linux characteristics. Your beautiful Windows-Chrome HTTP layer rides on a Linux SYN, and the mismatch is born.

There is a second, subtler tell that survives even when the proxy operator tries to spoof OS fields: the MSS clamp. Maximum segment size is derived from the link MTU. A normal Ethernet path yields an MSS around 1460. The moment traffic is encapsulated in a tunnel, the encapsulation overhead shrinks the usable payload, and a correctly configured tunnel advertises a smaller MSS to compensate. FoxIO's investigation of consumer VPNs makes this concrete. Surfshark's TCP proxy presented a JA4T of 65170_2-4-8-1-3_1330_10, an MSS of 1330, correctly clamped down from 1460 to account for tunnel overhead. That clamped MSS is itself a signature of encapsulation: a direct connection from a desktop does not usually advertise 1330. The failure mode is equally telling. NordVPN's proxy advertised 65535_2-4-8-1-3_1460_9, an MSS of 1460 that did not match the 1380 its own clients accepted, and the resulting fragmentation made one test page load take 962 packets through the proxy versus 381 direct, roughly 2.5 times as many packets. The point for detection is that the tunnel tends to leave a mark either way: a correct clamp puts an unusual MSS on the wire, and an incorrect one shows up as fragmentation and excess packets. MSS is also one of the highest-weighted fields in the scorer.

*The origin's view stops at the proxy. Everything below the application payload is the proxy kernel's, including the clamped MSS that betrays the tunnel.*
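The MSS arithmetic behind the clamp is short enough to write out. This sketch assumes IPv4 and TCP headers without options (20 bytes each); the 1500-byte Ethernet baseline, the 1330 Surfshark value, and the 962-vs-381 packet counts come from the text, and the function names are illustrative.

```python
IPV4_HEADER = 20   # bytes, no IP options
TCP_HEADER = 20    # bytes, no TCP options


def mss_for(link_mtu: int, encapsulation_overhead: int = 0) -> int:
    """Largest TCP payload per segment once headers (and any tunnel) are paid for."""
    return link_mtu - encapsulation_overhead - IPV4_HEADER - TCP_HEADER


def headroom_below_ethernet(advertised_mss: int) -> int:
    """Bytes the advertised MSS gives up relative to a plain 1500-byte Ethernet path."""
    return mss_for(1500) - advertised_mss


print(mss_for(1500))                  # 1460: plain Ethernet
print(mss_for(1492))                  # 1452: PPPoE, whose link MTU is 1492
print(headroom_below_ethernet(1330))  # 130: the Surfshark JA4T value from the text
print(round(962 / 381, 2))            # 2.52: NordVPN packet-count ratio
```

The PPPoE line is a reminder that a reduced MSS is not exclusive to VPNs, which is one of the false-positive sources discussed later.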

This is also why the proxy type matters, and why the residential vs datacenter vs mobile distinction shows up at this layer too. A datacenter Linux proxy in front of a claimed-Windows browser is the clean mismatch. A residential or mobile proxy that runs on, or terminates very close to, a real device can present a SYN that is more plausible for the claimed OS, which is part of what residential pools sell. The OS-mismatch check is one of the cheaper reasons datacenter IPs get caught even before the ASN reputation lookup runs.

The cross-layer RTT angle, and why it is harder to dodge

The OS-mismatch check has a known weakness: spoof the OS fields well enough and the SYN can be made to look consistent with the User-Agent. A 2025 NDSS paper from a University of Michigan group (Diwen Xue, Robert Stanley, Piyush Kumar, and Roya Ensafi) attacks the proxy from an angle that does not depend on the OS labels at all. Their observation is that a proxy terminates the transport-layer session locally but relays the application-layer session end to end. So the round-trip time measured at the TCP layer reflects the distance to the proxy, while the round-trip time measured at the application layer reflects the distance to the true endpoint behind it. Through a direct connection those two RTTs align. Through a proxy they do not. The paper calls this the misalignment of transport- and application-layer sessions, and the gap between the two RTTs is the fingerprint.

What makes the result notable is its independence from content. The fingerprint is protocol-agnostic and survives padding and traffic shaping, because it is a structural property of how proxying splits a session, not a property of any byte pattern the proxy could rewrite. Tested against obfuscated proxy protocols, including Shadowsocks and VMess, on both controlled testbeds and real ISP traffic, the similarity-based classifier reported around 95 percent accuracy using on the order of 20 to 40 probes. That is a meaningfully different threat than the static SYN check. You can edit your TTL; you cannot edit the speed of light, and the extra hop a proxy inserts shows up as latency the application layer cannot account for. The single-RTT version of this idea, comparing observed latency against where the IP geolocates, is the subject of the geolocation-vs-latency post; the NDSS work is the same instinct made rigorous and cross-layer.
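The core intuition can be illustrated with a deliberately simplified gap check. This is not the NDSS paper's similarity-based classifier: it just compares the median transport-layer RTT (for example, from the TCP handshake) with the median application-layer RTT (request/response round trips that must reach the real endpoint), and the 20 ms threshold is a hypothetical value.

```python
from statistics import median
from typing import Sequence


def rtt_gap_ms(transport_rtts: Sequence[float], app_rtts: Sequence[float]) -> float:
    """Median application-layer RTT minus median transport-layer RTT."""
    return median(app_rtts) - median(transport_rtts)


def looks_proxied(transport_rtts: Sequence[float], app_rtts: Sequence[float],
                  threshold_ms: float = 20.0) -> bool:
    # Direct path: both layers measure the same distance, so the gap stays small.
    # Proxied path: the transport layer stops at the proxy, the app layer does not.
    return rtt_gap_ms(transport_rtts, app_rtts) > threshold_ms


print(looks_proxied([30, 31, 29], [33, 35, 32]))  # False: layers align
print(looks_proxied([8, 9, 10], [85, 90, 88]))    # True: app layer sees a longer path
```

Application-layer round trips include processing time at the far end, so even a direct connection shows a small positive gap; that is why the sketch uses a threshold, and why the paper's multi-probe statistical approach is far more robust than a single cutoff.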

Where the check breaks, and why it is never a hard block

Now the part the marketing pages skip. OS-mismatch detection is noisy, and a vendor that hard-blocks on it ships a product that locks out its customer’s real users. Several entirely legitimate situations produce the exact same mismatch a proxy does.

The first is the corporate or carrier middlebox. NATs, transparent proxies, and load balancers rewrite TCP characteristics in the normal course of business. MSS clamping in particular is standard practice on PPPoE and tunnel links to avoid fragmentation, so a clamped MSS by itself proves nothing about intent. A real Windows laptop behind an enterprise firewall that re-originates connections can present a SYN that no longer matches Windows, with no proxy and no bot anywhere in the picture. This is the same false-positive surface the MTU and path-MTU post explores from the packet-size side.

The second is the legitimate VPN, which is the explanation the BotOrNot demo lists first. Tens of millions of people run a commercial VPN for ordinary privacy reasons, and their traffic exhibits precisely the clamped-MSS, Linux-SYN, OS-mismatch profile that a scraper does. Treating the mismatch as guilt would flag every privacy-conscious user on the internet. Vendors know this, which is why the signal feeds a score rather than a gate.

The third is the genuine difficulty of OS inference. Defaults shift across kernel versions, mobile carriers do their own MSS and window manipulation, and IPv6 strips some of the fields p0f relies on. The zardaxt documentation itself states plainly that TCP/IP fingerprinting is no exact science, and that VPN traffic often cannot be cleanly identified because the VPN server does not establish a dedicated TCP/IP connection per client in the way the model assumes. The probabilistic output, iOS 59 percent and not iOS 100 percent, is an admission that the best the SYN can do is a confidence, not a fact. A mismatch between a 59-percent guess and a User-Agent is weak evidence on its own.

So in a real pipeline the OS-mismatch signal is a weighted feature, not a verdict. It joins TLS and HTTP/2 fingerprints, IP reputation, behavioral telemetry, and challenge outcomes in a scoring model. A datacenter ASN plus a Linux SYN plus a Windows User-Agent plus a Python-shaped TLS hello is a stack of cheap independent signals that together justify a challenge. Any one of them alone, the OS mismatch included, justifies almost nothing, because each has a benign explanation. The mismatch earns its keep not by being decisive but by being nearly free to compute on the very first packet, before the connection has done anything else, and by being one of the few signals a client cannot fix from userland.
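A minimal sketch of OS mismatch as one weighted feature among several, using a logistic combination. The feature names are hypothetical stand-ins for the signals in the paragraph, every weight, the bias, and the challenge threshold are invented for illustration, and logistic scoring is one common choice rather than what any particular vendor uses.

```python
import math
from typing import Dict

# Hypothetical weights and bias; a real vendor would fit these to labeled traffic.
WEIGHTS = {"datacenter_asn": 1.2, "os_mismatch": 0.8, "scripted_tls_hello": 1.5}
BIAS = -2.5
CHALLENGE_AT = 0.6


def bot_probability(signals: Dict[str, bool]) -> float:
    """Logistic combination of whichever boolean signals fired."""
    z = BIAS + sum(w for name, w in WEIGHTS.items() if signals.get(name))
    return 1.0 / (1.0 + math.exp(-z))


def decide(signals: Dict[str, bool]) -> str:
    return "challenge" if bot_probability(signals) >= CHALLENGE_AT else "allow"


print(decide({"os_mismatch": True}))  # allow: a VPN user looks like this too
print(decide({"datacenter_asn": True, "os_mismatch": True, "scripted_tls_hello": True}))  # challenge
```

With these toy numbers no single signal clears the threshold on its own, while the stacked combination from the paragraph does, which is the behavior the text argues for.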

Closing: the contradiction you cannot apologize your way out of

The durable thing here is the asymmetry. Everything a client says about itself at the application layer is a claim it controls, and a sufficiently careful operator can make every claim consistent and correct. The kernel that writes the SYN is not part of that negotiation. It tells the truth about itself because it does not know it is being interrogated, and a userland process cannot override it without rebuilding the network stack from scratch, which is precisely what a few specialized impersonation libraries now do, at the cost of running as root. For the ordinary case, a browser pushed through a rented Linux proxy, the contradiction is structural and the operator never gets to see the packet that betrayed them.

What has changed by 2026 is not the principle but the encoding and the reach. p0f’s twenty-year-old signature grammar said the same thing JA4T says today, that the SYN carries OS identity. The newer work has made the signal harder to dodge by moving off the spoofable header fields entirely and onto timing, where a proxy’s extra hop shows up as a cross-layer RTT gap that padding cannot hide. The OS-mismatch check remains a probability, not a proof, and the honest vendors score it as one. But it is a probability computed from the first packet of the connection, off a field the client did not get to write, and that is a rare enough position in this business to be worth the noise.

Sources & further reading

Nikolai Tschacher / incolumitas (2021), TCP/IP Fingerprinting for VPN and Proxy Detection — the foundational write-up on comparing SYN-inferred OS against the User-Agent, with concrete Ubuntu/Android field values and the scoring weights.
Nikolai Tschacher / incolumitas (2022), TCP/IP Fingerprint API — documents the IP/TCP fields scored and names TCP options as the most important entropy source.
NikolaiT (2022 onward), zardaxt: Passive TCP/IP Fingerprinting Tool — the open-source classifier behind the API, exposing the os_mismatch boolean and the per-field point weights.
Michal Zalewski / lcamtuf (p0f v3), p0f README and signature spec — the canonical ver:ittl:olen:mss:wsize,scale:olayout:quirks:pclass format, the quirks list, and the masquerade-resistance discussion.
John Althouse / FoxIO (2024), JA4T: TCP Fingerprinting — the modern window_size_options_MSS_scale format, the Windows-no-timestamp and iOS-EOL facts, and proxy/VPN identification via MSS.
John Althouse / FoxIO (2024), Investigating Surfshark and NordVPN with JA4T — measured JA4T values showing correct vs misconfigured MSS clamping and the 962-vs-381-packet cost.
Diwen Xue, Robert Stanley, Piyush Kumar, Roya Ensafi / University of Michigan (NDSS 2025), The Discriminative Power of Cross-layer RTTs in Fingerprinting Proxy Traffic — the transport-vs-application RTT misalignment fingerprint, protocol-agnostic and resistant to padding.
Gilberto Bertin / Cloudflare (2016), Introducing the p0f BPF compiler — how an edge provider compiles p0f signatures to BPF to rate-limit traffic by inferred OS during attacks.
Wikipedia, TCP/IP stack fingerprinting — the variable header fields, the ~67-bit signature, and the active/passive tool taxonomy (Nmap, p0f, Xprobe, Satori).
Wikipedia, p0f — history, authorship, and the SYN/SYN+ACK/RST packet modes p0f inspects.
BrowserLeaks, TCP/IP Passive Fingerprinting — a live demonstration showing the probabilistic OS distribution, raw TTL/window/MSS/MTU, hop distance, and estimated RTT for the current connection.