
TCP/IP stack fingerprinting: TTL, window size, and MSS as OS identity

Copyright: MIT

Before a browser sends a single byte of HTTP, before the TLS ClientHello, before anything you can edit in code, the operating system has already announced itself. The first packet of any TCP connection is a SYN, and that SYN carries a handful of fields the application never chose. The initial time-to-live. The advertised window. The maximum segment size. A list of TCP options in an order the kernel fixed years ago and never bothered to expose as a setting. None of these say “I am Windows” in plain language, but together they say it clearly enough that a passive observer can read the operating system off the wire with no cooperation from the host and no extra packets sent.

That is the uncomfortable part for anyone trying to disguise a client. You can set the User-Agent to whatever string you like. You can spoof the TLS fingerprint with a library that mimics Chrome down to the cipher order. But the SYN was built by the kernel of the machine you are actually running on, and the kernel does not consult your User-Agent before it fills in the TTL. If the header says Windows and the SYN says Linux, the two halves of the same connection disagree, and that disagreement is one of the oldest and most stubborn signals in bot detection. This post is about the fields that carry the signal, where their default values come from, how the classic and modern tools turn them into a label, and why the transport layer is so much harder to forge than the layers above it.

The sections below run roughly bottom to top. First the IP header field that does the most work, the initial TTL, and the arithmetic of hop distance. Then the TCP window size and the scaling option that complicates it. Then the MSS, and why a value a few bytes below 1460 betrays a tunnel. Then the single most distinctive field of all, the ordering of TCP options, and how p0f and JA4T encode the whole SYN into a signature. A closing section covers the cross-layer mismatch that catches proxies, and what the transport layer can and cannot tell a defender in 2026.

The field that survives the trip: initial TTL

The time-to-live is an eight-bit field in the IPv4 header, hop limit in IPv6, and its job is mundane. It stops packets from circling a routing loop forever. Every router that forwards the packet decrements it by one, and when it hits zero the packet is dropped and an ICMP time-exceeded message goes back to the sender. That is the whole mechanism, and traceroute is built entirely out of abusing it.

What makes TTL useful for fingerprinting is that the initial value is not specified by the protocol. RFC 1700 recommended 64 as the default, and that recommendation was widely ignored. Operating systems picked their own starting points and have mostly kept them for decades. Linux and the BSDs and macOS start at 64. Windows starts at 128. Network gear like Cisco IOS routers starts at 255. Those three values, 64, 128, and 255, cover almost everything you will see on the open internet, which is convenient because they are spaced far enough apart that a few router hops cannot blur one into another.

The blurring is the catch. By the time a SYN reaches a server, its TTL has already been decremented once per hop along the path. A packet that left a Windows box at 128 might arrive at 117 after eleven hops. So the observer does not read the OS off the raw TTL. It reads the nearest initial value above the observed one. An arriving TTL of 117 was almost certainly 128 to start with, eleven hops back. An arriving 54 was almost certainly 64, ten hops back. This is the reasoning p0f encodes when it writes an initial TTL as 54+10, meaning observed 54, distance 10, inferred initial 64. When p0f cannot be sure of the distance it writes nnn+? and tells you to confirm with traceroute.

[Figure: a SYN loses one TTL per hop, and the observer reads the nearest start above it. A Windows host sends TTL=128; after 11 hops, each subtracting 1, the server sees TTL=117. Since 117 < 128, the distance is 11 and the inferred initial TTL is 128 (Windows). p0f writes this as ittl = 117+11. Key: 64 → Linux/BSD/macOS · 128 → Windows · 255 → routers, some Unix.]

*How an observed TTL maps back to an initial value: read the nearest standard start above the arriving number, and the gap is the hop distance.*
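The nearest-start-above rule is small enough to sketch in Python. This is a minimal illustration, not p0f's actual code: the function names are made up for the example, and it assumes the three common starting values named above are the only candidates.

```python
from typing import Tuple

# The three common initial TTLs named in the text. Treating them as the
# only candidates is a simplifying assumption of this sketch.
COMMON_INITIAL_TTLS = (64, 128, 255)


def infer_initial_ttl(observed: int) -> Tuple[int, int]:
    """Return (inferred_initial_ttl, hop_distance) for an arriving TTL."""
    if not 1 <= observed <= 255:
        raise ValueError("observed TTL must be in 1..255")
    # Nearest standard starting value at or above the observed TTL.
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed)
    return initial, initial - observed


def p0f_style_ittl(observed: int) -> str:
    """Format the result the way the text describes: observed+distance."""
    _, distance = infer_initial_ttl(observed)
    return f"{observed}+{distance}"
```

With the examples from the text, `infer_initial_ttl(117)` gives `(128, 11)` and `p0f_style_ittl(54)` gives `"54+10"`. A stack configured with a non-standard initial TTL would be misread by this rule, which is one reason TTL is only ever a first cut.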

TTL alone is weak. It splits the world into three buckets and nothing finer, and a router or NAT box in the path can rewrite it. But it is cheap, it survives the entire trip to the server unencrypted, and it is the first thing every passive fingerprinter looks at. The classic Netresec write-up on passive OS fingerprinting reduces the whole technique, in its simplest form, to two fields read off the first packet: the initial TTL and the TCP window size. Get both and you have already narrowed the OS family considerably.

The window size and the scaling trick

The TCP window is the receiver’s flow-control promise: how many bytes it is willing to accept before the sender must stop and wait for an acknowledgment. The field in the TCP header is sixteen bits, so the raw value tops out at 65,535 bytes. On the SYN packet that opens a connection, this is the initial receive window the host advertises, and like the TTL it is a value the OS picks rather than the application.

The defaults vary enough to be useful. Netresec’s table lists Linux 2.4 and 2.6 kernels advertising 5840, Google’s customized Linux a slightly different 5720, FreeBSD and Windows XP both at 65535, Windows 7 and Vista and Server 2008 at 8192, and a Cisco IOS 12.4 router at 4128. Those are old systems, but the principle holds on current ones: each stack has a characteristic starting window, often computed as a small multiple of the MSS. The Pydoll network-fingerprinting notes describe modern Linux kernels advertising an initial window of 29200, which is exactly twenty times an MSS of 1460, with newer 5.x and 6.x kernels sometimes using 64240, which is exactly forty-four times 1460. Windows 10 and 11 sit up around 65535 with auto-tuning. macOS defaults to 65535 as well.

Sixteen bits was never going to be enough for a fast modern link, where you want megabytes in flight, not kilobytes. RFC 1323, later obsoleted by RFC 7323, added the window scale option to fix this. It is a three-byte TCP option sent only on the SYN, and it carries a single shift count from 0 to 14. The real window is the sixteen-bit field shifted left by that many bits, so a scale of 7 multiplies the advertised window by 128, and a scale of 14 by 16,384. Because the option appears only in the SYN, the scale factor for each direction is locked in at connection setup and never renegotiated.

That scale factor is itself a fingerprint. It is derived from the receive-buffer configuration, so it tracks the OS and its tuning. Linux commonly advertises a scale of 7. Windows often uses 8. The incolumitas VPN-detection research caught a clean example of this: an Ubuntu 18.04 host advertising window scale 7 against an Android 9 device advertising scale 9, the two stacks distinguishable on that field alone even when their MSS matched. The p0f signature format treats an excessively large scale, anything above 14, as a quirk worth recording, because it is out of spec and only a handful of stacks emit it.

[Figure: the window scale option (SYN only) shifts the 16-bit field left. An advertised window of 65535 shifted left by scale 7 gives an effective window of 8,388,480 bytes. The scale is fixed at SYN time, locked per direction, and never renegotiated. Linux ≈ scale 7, Windows ≈ scale 8, Android (observed) ≈ scale 9. A scale > 14 is out of spec and p0f flags it as the quirk "exws".]

*The window scale option turns a 16-bit field into a multi-megabyte window, and the shift count it carries is itself OS-characteristic.*
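The shift arithmetic is easy to check by hand, and just as easy to sketch. The helper names below are illustrative, and clamping an oversized shift to 14 is this sketch's choice for keeping the result bounded, not a claim about what any particular fingerprinter does.

```python
MAX_WINDOW_SCALE = 14  # largest shift count the window scale option allows


def effective_window(raw_window: int, scale: int) -> int:
    """Effective receive window in bytes: the 16-bit field shifted left."""
    if not 0 <= raw_window <= 0xFFFF:
        raise ValueError("raw window must fit in 16 bits")
    if scale < 0:
        raise ValueError("scale must be non-negative")
    # Clamp out-of-spec shifts to the maximum (an assumption of this sketch).
    return raw_window << min(scale, MAX_WINDOW_SCALE)


def is_excessive_scale(scale: int) -> bool:
    """True when the shift is above 14, the case p0f records as 'exws'."""
    return scale > MAX_WINDOW_SCALE
```

`effective_window(65535, 7)` returns 8388480, matching the figure above, and the same raw window at scale 14 comes out just under one gibibyte.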

The window deserves its reputation. Nmap’s OS-detection chapter notes that the sixteen-bit window value alone is effective because more than eighty distinct values are known to be emitted by at least one operating system. Combine the raw window, the scale factor, and the fact that the window is often a clean multiple of the MSS, and you have a field that carries far more identifying information than its sixteen bits suggest. The companion piece on TCP timestamp and window-scaling fingerprints goes deeper on the scaling and timestamp angle.

MSS, the MTU, and the tunnel tell

The maximum segment size is a TCP option, present on the SYN, that tells the other side the largest TCP payload it is willing to receive in a single segment. It exists to keep segments from being fragmented at the IP layer. A host computes its MSS from the MTU of the interface the connection will use, subtracting the IP and TCP header overhead. On standard Ethernet the MTU is 1500 bytes, and after taking off 20 bytes of IPv4 header and 20 bytes of TCP header you are left with 1460. So 1460 is the overwhelmingly common MSS on the open internet, and the FoxIO write-up on JA4T calls it exactly that, the most common MSS, based on an Ethernet MTU of 1500.
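The subtraction above, as a short Python sketch. The function name is hypothetical, the header sizes assume no IP or TCP options, and the IPv6 branch (a fixed 40-byte header) is an addition the text does not discuss.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
IPV6_HEADER = 40  # bytes, fixed IPv6 header (not covered in the text)
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu: int, ipv6: bool = False) -> int:
    """Largest TCP payload that fits in one packet on a link with this MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER
```

`mss_for_mtu(1500)` gives the familiar 1460; over IPv6 the same link yields 1440.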

The interesting MSS values are the ones that are not 1460. When the path includes a tunnel, the tunnel’s own headers eat into the usable payload, and the effective MTU drops. A VPN, a PPPoE link, a mobile carrier’s infrastructure, all of these shrink the MTU and therefore the MSS. The result is a value a few bytes or a few dozen bytes below 1460. FoxIO points out that an MSS slightly under 1460, such as 1436, suggests a network element sitting in-line before the host. The incolumitas measurements show MSS values like 1412 and 1424 in the wild, both below the Ethernet baseline, both consistent with a tunnel or a carrier in the path. Older guidance on proxy detection notes that PPTP, L2TP, and IPsec IKE all lower the MTU, and that comparing the segment sizes in an intercepted connection against the standard MTU and MSS can reveal a tunnel.

A more aggressive version of this is MSS clamping, where a middlebox actively rewrites the MSS. A router terminating a tunnel will catch passing SYN packets and lower their MSS option to whatever fits inside the tunnel, so that no segment ever exceeds the path MTU and triggers fragmentation. This is common on consumer routers and VPN gateways. The visible effect at the server is the same: an MSS that is suspiciously below 1460, on a connection whose User-Agent claims a plain desktop on Ethernet. The exact clamp value depends on the tunnel; WireGuard, OpenVPN, and IPsec each carry different overhead, and a specific clamp number is a property of the deployment rather than a fixed constant, so a server reads the shape of the anomaly more than any single magic number.
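Reading "the shape of the anomaly" can be as simple as measuring how far an IPv4 MSS sits below the Ethernet baseline. The helpers below are a hypothetical sketch: they assume IPv4 with no header options, and they deliberately do not map a shortfall to a named tunnel type, since that depends on the deployment.

```python
ETHERNET_MSS = 1460      # 1500-byte MTU minus 40 bytes of IPv4 + TCP headers
IPV4_TCP_OVERHEAD = 40   # bytes, assuming no IP or TCP options


def mss_shortfall(mss: int) -> int:
    """Bytes missing relative to the plain-Ethernet MSS."""
    return ETHERNET_MSS - mss


def implied_mtu(mss: int) -> int:
    """The MTU a host or middlebox must have assumed to produce this MSS."""
    return mss + IPV4_TCP_OVERHEAD


def below_ethernet_baseline(mss: int) -> bool:
    """True when the MSS hints at a tunnel, a clamp, or a carrier in the path."""
    return mss_shortfall(mss) > 0
```

For the values mentioned above, 1436 is 24 bytes short (implied MTU 1476), 1424 is 36 short, and 1412 is 48 short. A flag like this is a hint to correlate with other fields, not a verdict on its own.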

This is also where p0f’s handling of the MSS gets clever. The window size in a signature is often expressed relative to the MSS, as a multiple like mss*20, precisely because so many stacks size their initial window as a function of the MSS. If a stack derives the MSS from the interface MTU, and the window from the MSS, then a single odd MTU ripples through both fields in a correlated way that is hard to fake piecemeal. The companion post on MTU and path MTU discovery follows that thread further.
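A small sketch of that relative notation: if the window divides evenly by the MSS, write it as a multiple, otherwise keep the literal. The function is illustrative only; real p0f signatures also allow mtu multiples and modulus forms, which this ignores.

```python
def window_expression(window: int, mss: int) -> str:
    """Express a window as 'mss*N' when it is a clean multiple of the MSS."""
    if mss > 0 and window > 0 and window % mss == 0:
        return f"mss*{window // mss}"
    return str(window)
```

`window_expression(29200, 1460)` gives `"mss*20"` and `window_expression(64240, 1460)` gives `"mss*44"`, while a window of 65535 is not a multiple of 1460 and stays a literal.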

The most distinctive field: TCP option order

Everything so far narrows the OS to a family. The field that often pins it down is the order of the TCP options on the SYN. RFC 793 lists the options a TCP implementation may include but does not mandate any particular ordering, and Nmap’s OS-detection chapter makes the consequence explicit: because no ordering is required, implementations come up with their own, and those orderings are stable per stack. The kernel emits its options in a fixed sequence, version after version, and almost never exposes that sequence as something an application can change.

The common options on a SYN are MSS (option kind 2), window scale (kind 3), SACK-permitted (kind 4), timestamp (kind 8), and the structural fillers NOP (kind 1) and end-of-options (kind 0). NOP exists mostly to pad options so they align on four-byte boundaries, and where a stack inserts its NOPs is itself part of the signature. The orderings differ cleanly across the major systems. The Pydoll notes give three sequences that match what p0f’s database has encoded for years:

Linux: MSS, SACK_PERM, TIMESTAMP, NOP, WSCALE
Windows: MSS, NOP, WSCALE, NOP, NOP, SACK_PERM
macOS: MSS, NOP, WSCALE, NOP, NOP, TIMESTAMP, SACK_PERM

Two things jump out. First, Windows in its default configuration does not send the timestamp option at all, while every Unix-derived stack does. FoxIO states this plainly: Microsoft Windows does not use TCP option 8, whereas all Unix-based operating systems do. That single presence-or-absence bit separates the Windows family from everything else before you even look at ordering. Second, the placement of the NOP padding and the relative position of SACK-permitted and the timestamp differ between Linux and macOS even though both are Unix, which is what lets a fingerprinter tell a Mac from a Linux box when the TTL and window alone would not.

iOS adds its own tell. FoxIO notes that iOS ends its option list with a TCP option 0, the explicit end-of-options-list marker, where other systems simply stop. Small structural quirks like a trailing EOL, an unusual NOP count, or a SACK-permitted that sits before rather than after the timestamp are exactly the kind of detail that no application-level disguise touches, because the application never built the options block in the first place.

[Figure: same options, different order, fixed by the kernel. Linux: MSS, SACK, TS, NOP, WS. Windows: MSS, NOP, WS, NOP, NOP, SACK. macOS: MSS, NOP, WS, NOP, NOP, TS, SACK. Windows omits the timestamp (TS) option entirely in its default config. Linux and macOS both send TS but place SACK and the NOP padding differently. No application controls this block; the kernel builds it.]

*The TCP option ordering on the SYN, three stacks side by side. The presence of the timestamp and the placement of NOP padding and SACK-permitted distinguish the families.*
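Because the order is fixed per stack, matching it is a plain lookup. The sketch below encodes only the three sequences quoted above, by option kind number; the table and function names are illustrative, and a real database such as p0f's carries many more layouts and variants.

```python
from typing import Sequence

# Option kind numbers as listed in the text.
EOL, NOP, MSS, WSCALE, SACK_PERM, TIMESTAMP = 0, 1, 2, 3, 4, 8

# Only the three orderings quoted in the text; not an exhaustive table.
KNOWN_LAYOUTS = {
    (MSS, SACK_PERM, TIMESTAMP, NOP, WSCALE): "Linux",
    (MSS, NOP, WSCALE, NOP, NOP, SACK_PERM): "Windows",
    (MSS, NOP, WSCALE, NOP, NOP, TIMESTAMP, SACK_PERM): "macOS",
}


def classify_option_order(kinds: Sequence[int]) -> str:
    """Return the stack whose SYN option order matches exactly, if any."""
    return KNOWN_LAYOUTS.get(tuple(kinds), "unknown")


def sends_timestamp(kinds: Sequence[int]) -> bool:
    """The presence-or-absence bit that separates Windows from Unix stacks."""
    return TIMESTAMP in kinds
```

`classify_option_order([2, 4, 8, 1, 3])` returns `"Linux"`, and `sends_timestamp([2, 1, 3, 1, 1, 4])` is false, which is the Windows default described above.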

Folding the SYN into a signature: p0f and JA4T

Two tools formalize all of this into a string you can index. The older is p0f, Michal Zalewski’s passive fingerprinter, whose third major version ships a signature database keyed by SYN. A p0f TCP signature is eight colon-separated fields:

ver : ittl : olen : mss : wsize,scale : olayout : quirks : pclass

ver is the IP version. ittl is the inferred initial TTL, written with the distance offset, so 64+0 or 54+10. olen is the length of any IP options, normally zero. mss is the maximum segment size, which can be a literal or a wildcard * when it varies by link. wsize,scale is the advertised window and the scale factor, where the window can be a literal, a multiple like mss*20 or mtu*4, or a modulus like %8192. olayout is the comma-delimited option order using shorthand tokens: mss, ws, sok for SACK-permitted, sack, ts, nop, and eol+n for an end-of-list with n padding bytes. quirks is a list of header peculiarities. pclass is the payload class, zero or non-zero.
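Splitting a signature into those eight fields takes a few lines of Python. This is a sketch of the layout described above, not p0f's own parser: the class and function names are invented for the example, every field is kept as a string, and the signature in the usage note is constructed for illustration rather than copied from p0f's database.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class P0fTcpSignature:
    ver: str
    ittl: str
    olen: str
    mss: str
    wsize: str
    scale: str
    olayout: List[str]
    quirks: List[str]
    pclass: str


def parse_p0f_signature(sig: str) -> P0fTcpSignature:
    """Split 'ver:ittl:olen:mss:wsize,scale:olayout:quirks:pclass'."""
    parts = sig.split(":")
    if len(parts) != 8:
        raise ValueError(f"expected 8 colon-separated fields, got {len(parts)}")
    ver, ittl, olen, mss, window, olayout, quirks, pclass = parts
    # The fifth field packs the window and the scale, separated by a comma.
    wsize, _, scale = window.partition(",")
    return P0fTcpSignature(
        ver=ver, ittl=ittl, olen=olen, mss=mss, wsize=wsize, scale=scale,
        olayout=olayout.split(",") if olayout else [],
        quirks=quirks.split(",") if quirks else [],
        pclass=pclass,
    )
```

Parsing the illustrative string `4:54+10:0:1460:mss*20,7:mss,sok,ts,nop,ws:df,id+:0` yields a window of `mss*20`, a scale of `7`, the Linux-style option layout, and the quirks `df` and `id+`.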

The quirks field is where p0f records the odd structural details that the simpler fields miss. The documented list includes df for the don’t-fragment bit set, id+ for DF set but a non-zero IP ID, id- for DF clear but a zero IP ID, ecn for explicit congestion notification support, seq- for a zero sequence number, ack+ and ack- for ACK-number-versus-flag contradictions, ts1- for a zero own-timestamp, ts2+ for a non-zero peer timestamp on the initial SYN, exws for an excessive window scale above 14, opt+ for trailing non-zero data in the options block, and bad for malformed options. Each quirk is a yes-or-no bit that a clean stack either has or does not, and a forged SYN that gets one of them wrong is easier to spot than one that merely gets the window slightly off.
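Several of those quirks are simple predicates over header fields that have already been parsed. The sketch below derives a subset of them (df, id+, id-, seq-, ts1-, ts2+, exws) from plain values. The function and its parameters are hypothetical, and it leaves out ecn, ack+, ack-, opt+ and bad, which need more of the packet than this takes as input.

```python
from typing import List, Optional


def syn_quirks(df: bool, ip_id: int, seq: int,
               own_ts: Optional[int] = None,
               peer_ts: Optional[int] = None,
               wscale: Optional[int] = None) -> List[str]:
    """Derive a subset of p0f-style quirk flags from parsed SYN fields."""
    quirks: List[str] = []
    if df:
        quirks.append("df")        # don't-fragment bit set
        if ip_id != 0:
            quirks.append("id+")   # DF set but IP ID non-zero
    elif ip_id == 0:
        quirks.append("id-")       # DF clear but IP ID zero
    if seq == 0:
        quirks.append("seq-")      # zero sequence number
    if own_ts is not None and own_ts == 0:
        quirks.append("ts1-")      # timestamp option present, own value zero
    if peer_ts is not None and peer_ts != 0:
        quirks.append("ts2+")      # non-zero peer timestamp on an initial SYN
    if wscale is not None and wscale > 14:
        quirks.append("exws")      # window scale above the allowed maximum
    return quirks
```

A SYN with DF set, a non-zero IP ID, and otherwise ordinary values produces `["df", "id+"]`. Passing `None` for a timestamp or scale means the option was absent, so no quirk is recorded for it.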

The modern formalization is JA4T, published by John Althouse at FoxIO in April 2024 as the TCP member of the JA4+ suite. JA4T is deliberately compact, four fields joined by underscores: the window size, then the TCP options as a hyphenated list of their kind numbers, then the MSS value, then the window scale. The blog’s worked example is 29200_2-4-8-1-3_1424_7, which reads as a window of 29200, options in the order MSS-SACKperm-timestamp-NOP-windowscale by their kind numbers, an MSS of 1424, and a scale of 7. The whole SYN collapses into one short, indexable token. (Some third-party explainers reorder the fields or hash the option list; the FoxIO post is the canonical definition, and it keeps the option kinds in the clear rather than hashing them.)

[Figure: a JA4T fingerprint, the whole SYN in one indexable token. 29200_2-4-8-1-3_1424_7 reads as window size (29200), option kinds in order (2-4-8-1-3), MSS (1424), scale (7). Option kind key: 0 = end of option list, 1 = NOP (padding), 2 = MSS, 3 = window scale, 4 = SACK permitted, 8 = timestamp. FoxIO reports JA4T blocks over 80% of internet scan traffic on its own.]

*JA4T splits a SYN into window size, the ordered list of option kinds, the MSS, and the window scale, joined with underscores.*
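The four-field layout is simple enough to build and parse in a few lines. This sketch follows the field order described above and reproduces the worked example; the function names are illustrative, and it does not attempt to handle SYNs that lack an MSS or window scale option, for which the FoxIO specification is the reference.

```python
from typing import List, Sequence, Tuple


def build_ja4t(window: int, option_kinds: Sequence[int],
               mss: int, scale: int) -> str:
    """Join window, hyphenated option kinds, MSS, and scale with underscores."""
    options = "-".join(str(kind) for kind in option_kinds)
    return f"{window}_{options}_{mss}_{scale}"


def parse_ja4t(token: str) -> Tuple[int, List[int], int, int]:
    """Split a JA4T token back into (window, option_kinds, mss, scale)."""
    window, options, mss, scale = token.split("_")
    kinds = [int(kind) for kind in options.split("-")] if options else []
    return int(window), kinds, int(mss), int(scale)
```

`build_ja4t(29200, [2, 4, 8, 1, 3], 1424, 7)` returns `"29200_2-4-8-1-3_1424_7"`, and `parse_ja4t` reverses it. Because the option kinds stay in the clear, two tokens can be compared field by field rather than only for equality.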

FoxIO’s pitch for JA4T is operational. Because scanners and crawlers tend to use a small number of distinct TCP stacks, often the same library or the same handful of tuned Linux boxes, their SYNs collapse onto a small set of JA4T values. The FoxIO writeup claims JA4T alone can block over 80 percent of internet scan traffic, cross-referenced against threat-intelligence feeds like GreyNoise for connections probing ports such as SSH. There is a server-side counterpart, JA4TS, that fingerprints the SYN-ACK a server sends back, and an active scanner, JA4TScan, that probes a server and reads the response. The whole JA4+ family, JA4 for TLS, JA4H for HTTP, JA4T for TCP and the rest, is meant to be read together; the JA4+ suite overview covers the others.

The cross-layer mismatch: when the stack disagrees with the User-Agent

Here is the reason transport fingerprinting refuses to die. The fields it reads are set by the kernel of the machine actually emitting the packet, and they travel in the clear, outside anything TLS encrypts and anything the browser controls. The User-Agent is a string the application writes. The TLS ClientHello is bytes the TLS library assembles. The SYN is built by the operating system before any of that runs. So a detection system gets two independent claims about the client’s OS: one from the headers, which the client fully controls, and one from the transport, which the client mostly does not. When they disagree, the disagreement is the signal.

The textbook case is a scraper. A Python client running on a Linux server sends a SYN with TTL 64, a Linux window, the timestamp option present, and the Linux option order. Its HTTP request carries a User-Agent claiming Chrome on Windows 10. The application layer says Windows; the transport layer says Linux. No amount of header spoofing changes the SYN, because the SYN left the kernel before the application’s headers existed. The Pydoll documentation puts it bluntly: none of these values are controlled by the browser, they come from the kernel, and a Python scraper on Linux claiming Windows produces a mismatch. This is the exact check that detecting a proxy by OS mismatch is built on.
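A deliberately coarse sketch of that check: reduce each layer to an OS family and flag the connection when both are known and they disagree. Everything here is an illustrative assumption, including the function names, the substring tests on the User-Agent, and the use of only TTL and timestamp presence on the transport side. A production system would weigh many more fields.

```python
def family_from_user_agent(user_agent: str) -> str:
    """What the application layer claims (crude substring matching)."""
    if "Windows" in user_agent:
        return "windows"
    unix_tokens = ("Linux", "Android", "Mac OS X", "iPhone", "iPad", "X11")
    if any(token in user_agent for token in unix_tokens):
        return "unix-like"
    return "unknown"


def family_from_syn(observed_ttl: int, has_timestamp: bool) -> str:
    """What the transport layer suggests, from TTL bucket and option 8."""
    if observed_ttl <= 64 and has_timestamp:
        return "unix-like"   # started at 64 and sends timestamps
    if 64 < observed_ttl <= 128 and not has_timestamp:
        return "windows"     # started at 128 and omits timestamps
    return "unknown"         # ambiguous: do not guess


def cross_layer_mismatch(user_agent: str, observed_ttl: int,
                         has_timestamp: bool) -> bool:
    """True only when both layers give an answer and the answers differ."""
    claimed = family_from_user_agent(user_agent)
    observed = family_from_syn(observed_ttl, has_timestamp)
    return "unknown" not in (claimed, observed) and claimed != observed
```

A request whose User-Agent contains `Windows NT 10.0` but whose SYN arrived with TTL 54 and a timestamp option is flagged. The same User-Agent over a SYN with TTL 117 and no timestamp is not. Returning "unknown" on ambiguous input keeps the check from firing on stacks with non-default settings.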

Proxies make it worse, not better, in a specific way. When a connection passes through a proxy that terminates TCP, the server’s TCP connection is with the proxy’s kernel, not the origin client’s. The SYN the server sees was minted by the proxy. So the server reads the proxy’s OS, the proxy’s TTL, the proxy’s option order, regardless of what the real client is. If you route a spoofed-Windows browser through a Linux VPS proxy, the server fingerprints the Linux VPS. The TTL even resets, because the proxy generates a fresh connection with its own initial TTL. The classic p0f signal of a proxy or NAT was a TTL that did not match the rest of the fingerprint, and that intuition still holds; the transport fingerprint follows the last kernel to touch the connection, which on a proxied path is the proxy, not you.

The defenses against this are imperfect and worth naming honestly. You can run a userspace TCP stack that builds its own SYNs, choosing the TTL and window and option order to match the OS you are impersonating, which moves the disguise down to the transport layer where it belongs. Tools that spoof TCP/IP characteristics at the kernel or netfilter level exist for exactly this reason. But matching one stack perfectly is hard. The quirks p0f records, the exact NOP padding, the window-as-MSS-multiple correlation, the timestamp behavior over the life of the connection, are many small constraints that all have to agree at once, and a tunnel or middlebox in the path can reintroduce an MSS anomaly you did not intend. This is the same lesson the p0f passive OS fingerprinting writeup draws: the transport layer is not unforgeable, but it is forgeable only by people who understand it well enough to rebuild a stack faithfully, and that is a much smaller population than the people who can change a User-Agent string.

What the transport layer can and cannot say in 2026

The honest scope of TCP/IP fingerprinting is narrow and deep. It does not identify a person, a browser build, or a session. At best it identifies an operating system family, sometimes a specific OS and a rough sense of its network tuning, from fields that the application never chose. The initial TTL splits the world into three buckets. The window and its scale add resolution. The MSS exposes tunnels. The option order, more than anything else, separates the families and often the specific systems within them. Read together and matched against a database like p0f’s or encoded into a token like JA4T, they place a connection in a small bucket of likely stacks before the first HTTP byte arrives.

What makes the signal durable is not its richness but its position. It sits below everything the application controls and outside everything TLS encrypts, on a packet the kernel built without consulting the code running on top of it. Encrypted Client Hello can hide the TLS server name. HTTP/2 changed what the request layer reveals. None of that touches the SYN. The transport fingerprint is the one part of the connection that an application-level disguise cannot reach, which is precisely why a Linux box pretending to be Windows gives itself away in the first packet it sends, long before it has a chance to lie about anything else.

The practical takeaway for anyone building or defeating this is the same observation from both sides of the table. The cheap disguises operate at the layers a developer can see and edit, the headers and the TLS library, and those are exactly the layers a cross-layer check distrusts. The expensive, durable disguises operate at the layer the developer usually cannot see, the kernel’s TCP stack, and getting that layer to agree with all the layers above it, on every field and every quirk at once, is the actual work. A fingerprint that costs three bytes to read and a rebuilt network stack to forge is a good trade for the defender, and that asymmetry has held for more than twenty years.

