p0f and passive OS fingerprinting: how the network layer gives you away

Open a TCP connection to a server and you have already told it which operating system you are running, before you send a request, before TLS, before a single byte of application data. The SYN packet that starts the handshake is built by your kernel, not your application, and kernels disagree with each other in small, durable ways. The initial TTL. The advertised window. Which TCP options appear, and in what order. None of it is secret, none of it is encrypted, and none of it is something a normal application can change. A passive observer reads it off the wire and files you under Linux, or Windows, or macOS, with a fair guess at the version.

The tool that made this a discipline is p0f. It does not send a probe. It does not touch your machine. It sits on a link, watches packets that were going to flow anyway, and matches them against a database of stack quirks. This piece is about p0f specifically: where it came from, what it reads out of a SYN, how its signature grammar encodes a TCP/IP stack, and why the technique still works in 2026 even though the tool’s last real release predates most of the traffic it now sees.

The road map: first the distinction between passive and active fingerprinting, and why passivity is the whole point. Then the history, from a 2000 Bugtraq post to the v3 rewrite. Then the mechanics, field by field, of what a SYN reveals and how p0f’s signature format captures it. Then the parts beyond the SYN: TCP timestamps for uptime, the HTTP module, NAT and proxy detection. And finally the honest assessment of where the technique stands today, what erodes it, and what does not.

Passive versus active, and why it matters

There are two ways to learn what operating system a remote host runs from its network behavior. You can poke it, or you can listen.

Active fingerprinting pokes. Nmap is the canonical example: it crafts a battery of unusual packets, sends them at open and closed ports, and reads the responses. A SYN to a closed port, a packet with strange flag combinations, a probe with a deliberately odd window. Stacks respond to malformed or edge-case input differently, and those differences are diagnostic. Active probing is precise, because the prober controls exactly what gets sent and can replay the experiment until the answer is unambiguous. The cgsecurity write-up on the subject notes that active methods give “more precision” (“plus de précision” in the original French) when passive analysis is inconclusive, and that is the trade. The cost is that you generate traffic. Unusual traffic. Traffic that an IDS logs, that a firewall may drop, and that tells the target you are interested in it.

Passive fingerprinting listens. It never sends anything. It analyzes packets the target was going to transmit regardless: the SYN that opens a connection it wanted to make, the SYN+ACK with which it answers a connection you made. Because it adds nothing to the wire, it cannot be detected by the host being fingerprinted, and it works straight through packet-filtering firewalls and NAT. The original p0f announcement made exactly this claim: “packet filtering firewalls, network address translation and so on are transparent to this technique, so you’re able to obtain information about systems behind the firewall.” You learn about a host by reading mail it already sent.

That property is what makes passive fingerprinting interesting to defenders and to anyone running a service at scale. The host cannot tell it is being profiled, so it cannot adapt. A web server can fingerprint every client that connects to it, continuously, for free, as a byproduct of traffic it was already handling. Cloudflare does precisely this to triage SYN floods, which we will come back to. The reader who cares about being fingerprinted should understand that the network layer gives an answer the application layer never gets a vote on.

A 2000 Bugtraq post and the long road to v3

p0f is one of the older tools still in working use. Michal Zalewski, who signs his work lcamtuf, posted version 1.0 to the Bugtraq mailing list on 10 June 2000. The announcement described “p0f - passive OS fingerprinting utility” and laid out the core idea in a sentence that has aged well: the combination of “initial TTL, window size, maximum segment size, don’t fragment flag, sackOK option … nop option and window scaling option combined together gives unique, 63-bit signature for every system.” All the tool needed was “at least one SYN packet initializing TCP connection to your machine or network.” Twenty-six years later the field names in the modern signature database are recognizably the same.

The tool went through a long version 2 era. By July 2004 the current release was p0f 2.0.4, which added fingerprinting for more packet types beyond the plain SYN, including SYN+ACK and RST+ACK, plus heuristics for masquerading and IP sharing. The LWN write-up from that period credits Zalewski along with William Stearns and other contributors, and notes the LGPL 2.1 license that the project still carries. v2 was the workhorse for the better part of a decade, and a lot of the fingerprint databases floating around the internet, including ones shipped by CERT’s NetSA suite, descend from that lineage.

Version 3 is a clean rewrite, copyright 2012. It is the version in use today, distributed through the unofficial GitHub mirror and packaged by every major Linux distribution. v3 is more than a port. The README describes a tool that reasons about “IPv4 and IPv6 headers, TCP headers, the dynamics of the TCP handshake, and the contents of application-level payloads.” That last clause is the big addition. v3 grew an HTTP module, so it can fingerprint not just the kernel underneath a connection but the browser or server software riding on top of it. The signature grammar was redesigned around named fields and explicit quirks, which is the format we will dissect below.

*The three eras of p0f: the 2000 proof of concept, the long v2 workhorse period, and the 2012 v3 rewrite that added application-layer reasoning. The current release dates to 2012.*

What a SYN packet actually carries

To understand the fingerprint you have to understand the packet. A TCP SYN is the first message of the three-way handshake. It carries no application data, but it carries a surprising amount of the sender’s intent about how the connection will behave, and most of those parameters are set by the kernel’s network stack from compiled-in or sysctl-tuned defaults.

Start with the IP header. The Time To Live field is an eight-bit counter that each router decrements by one as the packet passes. Its job is loop prevention, but its initial value is a stack default, and stacks pick from a tiny set. Almost every operating system starts TTL at 64, 128, or 255. Linux and the BSDs (including macOS) use 64. Windows uses 128. Some network gear uses 255. Because you observe the packet after it has crossed some number of router hops, you see a decremented value, but you can recover the initial: round the observed TTL up to the next member of {64, 128, 255}. The cgsecurity article gives the textbook example: an observed TTL of 62 implies an initial of 64, with two hops in between. The difference between observed and initial is also a free estimate of network distance.
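The rounding rule is small enough to write down. A minimal sketch in Python, assuming only the three initial values named above; the function name is illustrative and this is not p0f's own code:

```python
# Candidate initial TTLs named in the text. Real stacks occasionally use
# other values, so treat this table as an assumption.
INITIAL_TTLS = (64, 128, 255)


def infer_initial_ttl(observed: int) -> tuple[int, int]:
    """Return (initial_ttl, hops) for an observed TTL between 1 and 255."""
    if not 1 <= observed <= 255:
        raise ValueError("observed TTL must be between 1 and 255")
    # Round up to the next candidate: routers only ever decrement TTL.
    initial = next(t for t in INITIAL_TTLS if t >= observed)
    return initial, initial - observed


print(infer_initial_ttl(62))   # (64, 2): the cgsecurity example
print(infer_initial_ttl(117))  # (128, 11)
```

The guess has a blind spot. A packet that started at 128 and crossed more than 64 hops would be rounded to 64, which is why the result is an estimate and not a measurement.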

The Don’t Fragment bit lives in the same IP header. Most modern stacks set it, because they rely on path MTU discovery rather than fragmentation, but whether it is set is still a recorded signature bit, and combinations of DF with IP identification field behavior split stacks further.

Then the TCP header, where the real discrimination lives. Three things matter most.

The advertised window size. This is how many bytes the sender is willing to accept from its peer before the peer must stop and wait for an acknowledgment, and it is deeply stack-dependent. Crucially, many stacks do not advertise a flat constant. They advertise the window as a multiple of the maximum segment size, or of the MTU. The relationship is the fingerprint, not just the number. A Linux kernel that advertises ten or twenty times its MSS produces a window that varies with the path’s MSS while the multiplier stays constant, and the multiplier is what identifies it.

The maximum segment size, carried as a TCP option, tells the peer the largest payload the sender wants in one segment. It is derived from the link MTU, so on a normal 1500-byte Ethernet path it lands at 1460, but it shifts on tunnels, PPPoE links, and anything that lowers the MTU. p0f records MSS partly to evaluate the window relationship and partly because the MSS itself, combined with the MTU it implies, points at the link type.
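The arithmetic behind the window-to-MSS relationship can be sketched in a few lines of Python. The helper names are hypothetical, and the even-division test is a simplification: a real matcher compares against known signatures and does not guess the multiplier from one packet.

```python
def mss_for_mtu(mtu: int, ipv6: bool = False) -> int:
    """MSS is the MTU minus the fixed IP and TCP headers (no options)."""
    ip_header = 40 if ipv6 else 20
    tcp_header = 20
    return mtu - ip_header - tcp_header


def window_expression(window: int, mss: int) -> str:
    """Write the window as a multiple of MSS when it divides evenly."""
    if mss > 0 and window % mss == 0:
        return f"mss*{window // mss}"
    return str(window)  # otherwise treat it as a flat constant


print(mss_for_mtu(1500))               # 1460, plain Ethernet
print(mss_for_mtu(1492))               # 1452, a PPPoE-sized MTU
print(window_expression(29200, 1460))  # mss*20
print(window_expression(29040, 1452))  # mss*20, same multiplier on a smaller MTU
print(window_expression(8192, 1460))   # 8192, a flat window
```

The two `mss*20` lines are the point: the byte count changes with the path, the multiplier does not.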

And the TCP options, both which ones appear and the order in which the kernel lays them out. A stack might emit MSS, then a SACK-permitted option, then a timestamp, then a NOP for alignment, then window scale. Another emits MSS, NOP, window scale, NOP, NOP, SACK-permitted, and no timestamp at all. The set and the ordering are not configurable through any normal interface; they are baked into the kernel’s TCP output path. That makes option layout one of the strongest discriminators in the whole packet.
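To make the idea of an option layout concrete, here is a hedged Python sketch that walks the raw option bytes of a TCP header and names each option in wire order, using the short tokens p0f uses (mss, sok, ts, nop, ws, eol+n). The option kind numbers come from the TCP specifications; the function itself is illustrative and the sample byte strings are hand-built, not captured traffic.

```python
# TCP option kinds mapped to p0f-style tokens.
TOKENS = {2: "mss", 3: "ws", 4: "sok", 5: "sack", 8: "ts"}


def option_layout(options: bytes) -> str:
    """Return the comma-separated option layout for raw TCP option bytes."""
    tokens = []
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:  # end of option list; whatever follows is padding
            tokens.append(f"eol+{len(options) - i - 1}")
            break
        if kind == 1:  # NOP is a single byte with no length field
            tokens.append("nop")
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("truncated option")
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            raise ValueError("bad option length")
        tokens.append(TOKENS.get(kind, f"?{kind}"))
        i += length
    return ",".join(tokens)


first = bytes.fromhex("02 04 05 b4 04 02 08 0a 00 00 00 01 00 00 00 00 01 03 03 07")
second = bytes.fromhex("02 04 05 b4 01 01 04 02")
print(option_layout(first))   # mss,sok,ts,nop,ws
print(option_layout(second))  # mss,nop,nop,sok
```

Only the kinds and their order are kept. The values inside each option (the MSS number, the scale factor) are read separately.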

*The signal surface of one SYN packet. p0f reads the IP-layer TTL, DF, and ID fields plus the TCP window, MSS, and option layout. The window-to-MSS relationship and the option ordering carry the most discriminating power.*

The signature grammar

p0f v3 encodes a TCP stack as a colon-separated string. Reading it is the fastest way to understand what the tool actually keys on. The format for a SYN is:

ver:ittl:olen:mss:wsize,scale:olayout:quirks:pclass

Each field maps to something on the wire. ver is the IP version: 4, 6, or * for both. ittl is the inferred initial TTL, recovered by rounding the observed value, written as the canonical 64, 128, or 255. olen is the length of IPv4 options or IPv6 extension headers, usually zero. mss is the maximum segment size from the TCP option, or * when it varies. wsize,scale is the advertised window and the window scaling factor, and the window is frequently written relative to MSS: mss*20 means twenty times the MSS, not a literal byte count. olayout is the comma-delimited list of TCP options in the exact order the stack emits them, using short tokens: mss, sok for SACK-permitted, ts for timestamp, nop for the one-byte padding, ws for window scale, eol+n for end-of-options followed by n padding bytes. quirks is a comma-separated list of stack oddities. pclass classifies the payload as 0 for empty, + for non-empty, or * for any; a normal SYN has no payload.

A label binds a signature to a human-readable identity. From the shipped database:

label = s:unix:Linux:3.11 and newer
sig   = *:64:0:*:mss*20,10:mss,sok,ts,nop,ws:df,id+:0

Read that left to right. Either IP version. Initial TTL 64, so a Unix-family stack. No IP options. Any MSS. Window is twenty times MSS with a scale factor of 10. Options in the order MSS, SACK-permitted, timestamp, NOP, window scale. Quirks: DF set, and the IP ID field is non-zero despite DF being set. Empty payload. That string is a Linux 3.11+ kernel’s TCP personality written down.
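The same left-to-right reading can be done mechanically. A minimal Python sketch, assuming only the eight colon-separated fields described above; the `TcpSig` class and `parse_sig` function are hypothetical names, and fields are kept as strings because the real grammar allows more forms than the ones shown here:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TcpSig:
    ver: str
    ittl: str
    olen: str
    mss: str
    wsize: str
    scale: str
    olayout: tuple
    quirks: frozenset
    pclass: str


def parse_sig(text: str) -> TcpSig:
    """Split a p0f v3 TCP signature string into named fields."""
    fields = text.strip().split(":")
    if len(fields) != 8:
        raise ValueError("expected 8 colon-separated fields")
    ver, ittl, olen, mss, win, olayout, quirks, pclass = fields
    wsize, _, scale = win.partition(",")
    return TcpSig(
        ver, ittl, olen, mss, wsize, scale,
        tuple(olayout.split(",")) if olayout else (),
        frozenset(quirks.split(",")) if quirks else frozenset(),
        pclass,
    )


sig = parse_sig("*:64:0:*:mss*20,10:mss,sok,ts,nop,ws:df,id+:0")
print(sig.ittl, sig.wsize, sig.scale)  # 64 mss*20 10
print(sig.olayout)                     # ('mss', 'sok', 'ts', 'nop', 'ws')
print(sorted(sig.quirks))              # ['df', 'id+']
```

The option layout is kept as an ordered tuple and the quirks as a set, which mirrors how the two fields are used: order matters for options, membership matters for quirks.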

Compare Windows:

label = s:win:Windows:7 or 8
sig   = *:128:0:*:8192,0:mss,nop,nop,sok:df,id+:0

Initial TTL 128, the Windows tell. A flat window of 8192 with no scaling. Options in the order MSS, NOP, NOP, SACK-permitted, and notably no timestamp. And macOS:

label = s:unix:Mac OS X:10.x
sig   = *:64:0:*:65535,1:mss,nop,ws,nop,nop,ts,sok,eol+1:df,id+:0

TTL 64 like its BSD ancestry, a flat 65535 window with scale factor 1, and a long, distinctive option string ending in an explicit end-of-options marker with one byte of padding. Three operating systems, three signatures, all read from a packet that carries no application data and no encryption.

The OS label itself has structure. A v3 label carries a type, a class, a name, and a flavor. The class is the broad family: unix, win, cisco. The name is the specific OS, Linux or Windows. The flavor is the qualifier, the version range. This is why p0f can answer at different resolutions, “some Windows” when only the class matches, “Windows 7 or 8” when the full signature lines up.

The quirks field

The quirks list is where p0f captures the small illegal-ish behaviors that stacks exhibit, the things that are not parameters so much as tells. The README enumerates them. On the IP side: df for the don’t-fragment flag, id+ for DF set while the IP ID is still non-zero, id- for DF clear while the ID is zero, ecn for explicit congestion notification support, 0+ for a non-zero value in a field that the spec says must be zero, and flow for a non-zero IPv6 flow label. On the TCP side: seq- for a zero sequence number, ack+ for a non-zero acknowledgment number when the ACK flag is not set, ack- for the inverse, uptr+ for a non-zero urgent pointer with no URG flag, plus push and urgent flag oddities. On timestamps: ts1- for an own-timestamp of zero, ts2+ for a non-zero peer timestamp on a SYN, which should not happen. And the catch-alls: opt+ for trailing non-zero data after the options, exws for an excessive window-scale value above 14, and bad for malformed options the parser could not make sense of.
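Several of those quirks are simple predicates over header fields that a capture library has already decoded. The Python sketch below covers a subset of the list; the function name, its keyword arguments, and the sample values are all illustrative assumptions, not p0f's internals.

```python
def syn_quirks(*, df, ip_id, seq, ack_flag, ack_num, urg_flag, urg_ptr,
               ts_val=None, ts_ecr=None, wscale=None):
    """Return p0f-style quirk tokens for one decoded SYN (subset only)."""
    quirks = []
    if df:
        quirks.append("df")
        if ip_id != 0:
            quirks.append("id+")   # DF set, yet the IP ID is non-zero
    elif ip_id == 0:
        quirks.append("id-")       # DF clear, yet the IP ID is zero
    if seq == 0:
        quirks.append("seq-")
    if not ack_flag and ack_num != 0:
        quirks.append("ack+")      # ACK number set without the ACK flag
    if ack_flag and ack_num == 0:
        quirks.append("ack-")
    if not urg_flag and urg_ptr != 0:
        quirks.append("uptr+")
    if ts_val is not None and ts_val == 0:
        quirks.append("ts1-")      # own timestamp is zero
    if ts_ecr is not None and ts_ecr != 0:
        quirks.append("ts2+")      # peer timestamp non-zero on a SYN
    if wscale is not None and wscale > 14:
        quirks.append("exws")
    return quirks


ordinary = syn_quirks(df=True, ip_id=0x3A1C, seq=123456, ack_flag=False,
                      ack_num=0, urg_flag=False, urg_ptr=0,
                      ts_val=1000, ts_ecr=0, wscale=7)
synthetic = syn_quirks(df=False, ip_id=0x1234, seq=1, ack_flag=False,
                       ack_num=99, urg_flag=False, urg_ptr=5)
print(ordinary)   # ['df', 'id+']
print(synthetic)  # ['ack+', 'uptr+']
```

The second call shows what a hand-crafted packet tends to look like: fields that a real kernel would zero are left holding leftovers.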

Most of these quirks describe behavior no application can produce and no normal user would ever notice. They exist because TCP stacks were written by different people at different times against a spec with corners, and the corners got handled differently. That is precisely what makes them durable identifiers. A field that “must be zero” but is not tells you something specific about the code path that emitted the packet.

Beyond the SYN: timestamps, uptime, and the HTTP module

p0f reads more than the first packet. Two of its more interesting tricks come from looking at the connection over time and at the layers above TCP.

TCP timestamps, when present, let p0f estimate the remote host’s uptime. The timestamp option carries a value driven by a clock that ticks at a stack-specific frequency. By watching the timestamp advance across packets and knowing the tick rate, p0f can extrapolate backward to when the counter was zero, which is roughly when the stack started. The README notes the tool needs to observe “at least about 25 milliseconds worth of qualifying traffic” before it can lock onto the progression. The result is a free uptime readout for any host whose stack enables timestamps, which is most Linux and BSD systems by default. It is a striking demonstration of how much a passive observer can infer from a parameter that exists for an entirely unrelated reason, round-trip-time measurement and PAWS protection against wrapped sequence numbers.
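The extrapolation is simple arithmetic. A rough Python sketch, assuming two observations of the timestamp value taken a known wall-clock interval apart and assuming the clock ticks at one of a few common rates; the list of rates and the snapping step are assumptions of this sketch, not p0f's exact heuristics:

```python
# Assumed candidate tick rates, in ticks per second.
COMMON_HZ = (100, 250, 1000)
MIN_INTERVAL = 0.025  # seconds; the roughly 25 ms the README asks for


def estimate_uptime(t1: float, ts1: int, t2: float, ts2: int):
    """Return (tick_rate_hz, uptime_seconds) from two timestamp samples.

    t1, t2 are observer wall-clock times in seconds; ts1, ts2 are the
    32-bit timestamp values carried in the packets.
    """
    elapsed = t2 - t1
    if elapsed < MIN_INTERVAL:
        raise ValueError("need more elapsed time between samples")
    ticks = (ts2 - ts1) % 2**32           # tolerate counter wraparound
    measured_hz = ticks / elapsed
    # Snap the noisy measurement to the nearest assumed tick rate.
    hz = min(COMMON_HZ, key=lambda rate: abs(rate - measured_hz))
    return hz, ts2 / hz                   # time since the counter was zero


hz, uptime = estimate_uptime(0.0, 86_400_000, 0.5, 86_400_500)
print(hz)      # 1000
print(uptime)  # 86400.5 seconds, about one day
```

The estimate is only as good as the assumption that the counter started at zero when the stack did.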

The HTTP module is v3’s headline addition. It applies the same philosophy a layer up. Rather than parse what a request says, it looks at how the request is structured, on the theory that structure is harder to fake than content. An HTTP signature has the form:

ver:horder:habsent:expsw

ver is the HTTP version, 0 for 1.0 or 1 for 1.1. horder is the ordered list of headers, with optional name=[value] substring matching on specific header values. habsent lists headers that must not appear. expsw is an expected substring in the User-Agent or Server header, used to catch software that lies about itself. The insight is that a browser sends its headers in a characteristic order and includes or omits a characteristic set, and that ordering is a property of the HTTP client implementation, not of the page being fetched. A client claiming to be one browser while ordering its headers like another has given itself away. This is the same logic that drives modern header order and casing fingerprints and the Accept-header triad signature, and p0f was doing it at the HTTP layer in 2012.
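A structural check of this kind can be sketched in Python. Everything named here is hypothetical: the function names, the ExampleBrowser profile, and the sample requests are made up for illustration, and the real grammar has features (optional headers, value matching) that this sketch leaves out. It returns two answers separately, whether the structure matches the profile and whether the User-Agent contains the expected substring, because the second is a consistency check and not part of the match.

```python
def parse_headers(raw_request: str) -> list[tuple[str, str]]:
    """Return (name, value) pairs in wire order, skipping the request line."""
    head = raw_request.split("\r\n\r\n", 1)[0]
    pairs = []
    for line in head.split("\r\n")[1:]:
        name, _, value = line.partition(":")
        pairs.append((name.strip(), value.strip()))
    return pairs


def check_http_sig(raw_request, horder, habsent, expsw):
    """Return (structure_matches, user_agent_is_consistent)."""
    pairs = parse_headers(raw_request)
    names = [name.lower() for name, _ in pairs]
    # Subsequence test: each expected header must follow the previous one.
    remaining = iter(names)
    in_order = all(h.lower() in remaining for h in horder)
    none_forbidden = all(h.lower() not in names for h in habsent)
    user_agent = next((v for n, v in pairs if n.lower() == "user-agent"), "")
    return in_order and none_forbidden, expsw in user_agent


# Hypothetical profile for a made-up client.
profile = dict(horder=["Host", "User-Agent", "Accept", "Connection"],
               habsent=["Via"], expsw="ExampleBrowser")

liar = ("GET / HTTP/1.1\r\nHost: example.com\r\n"
        "User-Agent: OtherBrowser/2.0\r\nAccept: */*\r\n"
        "Connection: keep-alive\r\n\r\n")
print(check_http_sig(liar, **profile))  # (True, False)
```

The `(True, False)` result is the interesting case: the request is shaped like one client while naming another.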

Catching a proxy: the OS-mismatch signal

The single most useful thing passive fingerprinting does in an anti-abuse context is catch a network whose layers disagree with each other. p0f has explicit machinery for this.

When p0f sees a host’s signature change in a way that looks systematic rather than random, it flags it. The README lists the reason codes it attaches: os_sig when the OS signature itself changes, sig_diff for protocol-level changes, tstamp for inconsistent timestamps, ttl for a TTL change, port for a source-port decrease that should not happen, and mtu for an MTU shift. A run of these from one apparent host is the fingerprint of NAT, a proxy, or address sharing: multiple real machines hiding behind one IP, each with its own stack personality.
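The comparison behind those reason codes can be illustrated with a small Python sketch that diffs two observations attributed to the same source IP. The dictionary keys, the function name, and the plain inequality tests are assumptions made for illustration; the real tool scores and thresholds these signals in ways this sketch does not attempt to reproduce.

```python
def change_reasons(prev: dict, cur: dict) -> list[str]:
    """Compare two observations from one IP and name what changed."""
    reasons = []
    if prev["os"] != cur["os"]:
        reasons.append("os_sig")   # a different OS signature matched
    if prev["distance"] != cur["distance"]:
        reasons.append("ttl")      # hop distance moved
    if cur["src_port"] < prev["src_port"]:
        reasons.append("port")     # source port went backwards
    if prev["mss"] != cur["mss"]:
        reasons.append("mtu")      # implied link MTU shifted
    if (prev["tsval"] is not None and cur["tsval"] is not None
            and cur["tsval"] < prev["tsval"]):
        reasons.append("tstamp")   # timestamp clock jumped backwards
    return reasons


earlier = {"os": "Linux", "distance": 7, "src_port": 51000,
           "mss": 1460, "tsval": 900_000}
later = {"os": "Windows", "distance": 9, "src_port": 49200,
         "mss": 1460, "tsval": None}
print(change_reasons(earlier, later))  # ['os_sig', 'ttl', 'port']
```

One change on its own proves little. Several at once, from what claims to be a single host, is the pattern worth flagging.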

The sharper version of this is the cross-layer mismatch, and it is the reason network fingerprinting still matters for bot detection in 2026. Consider a request that flows through a proxy. The proxy’s kernel terminates your TCP connection and opens a fresh one to the target. The target therefore sees the proxy’s SYN, with the proxy’s TTL, window, and option layout, not yours. So the OS that the TCP stack implies is the proxy’s OS. Now suppose the HTTP request riding inside claims, via its User-Agent, to be a Windows browser, while the proxy that emitted the SYN runs Linux. The TCP fingerprint says TTL 64, Linux. The User-Agent says Windows. Those cannot both be true for one machine. The contradiction is the detection. As the pydoll write-up on network fingerprinting puts it plainly, “the User-Agent says Windows (TTL 128) but the TCP fingerprint shows Linux (TTL 64)” is the tell that exposes a proxy or a spoofed agent string.

*The cross-layer mismatch. The proxy terminates TCP and opens its own connection, so the target reads the proxy's stack at the network layer while the User-Agent still claims the client's OS. When the two disagree, the network layer wins the argument.*
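The contradiction check reduces to comparing two coarse guesses. A deliberately crude Python sketch, assuming the TTL families described earlier and a naive substring test on the User-Agent; the function names and the sample agent string are invented, and a production system would use a full TCP signature match and a proper User-Agent parser:

```python
def family_from_ittl(ittl: int) -> str:
    """Map an inferred initial TTL to a coarse OS family."""
    return {64: "unix", 128: "win"}.get(ittl, "unknown")


def family_from_user_agent(user_agent: str) -> str:
    """Guess the claimed OS family from a User-Agent string (naive)."""
    ua = user_agent.lower()
    if "windows" in ua:
        return "win"
    if any(marker in ua for marker in ("linux", "mac os x", "android")):
        return "unix"
    return "unknown"


def cross_layer_mismatch(ittl: int, user_agent: str) -> bool:
    """True when both layers give an answer and the answers disagree."""
    network = family_from_ittl(ittl)
    claimed = family_from_user_agent(user_agent)
    return "unknown" not in (network, claimed) and network != claimed


ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0"
print(cross_layer_mismatch(64, ua))   # True: SYN looks Unix, agent says Windows
print(cross_layer_mismatch(128, ua))  # False: both layers agree
```

An unknown on either side returns False on purpose. Absence of evidence is not a mismatch.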

This is why TCP/IP fingerprinting sits underneath the modern anti-bot stack rather than being replaced by it. A scraper can spoof its User-Agent perfectly. It can mimic a browser’s TLS ClientHello with uTLS. It can fake header order. But the SYN that opened the connection came out of whatever kernel actually sent it, and faking that requires control of the network stack itself, not just the application. The mismatch detection generalizes far past p0f’s own database, and it connects directly to the broader problem of detecting a proxy by OS mismatch.

How a fingerprint match actually happens

The mechanics of matching are simpler than the database makes them look. p0f extracts the fields from an observed packet, builds the signature string, and looks for the best match against its loaded fingerprints. A typical install loads on the order of 320 signatures from the p0f.fp file, a count that spans all of the file’s sections and not the SYN signatures alone. Matching is not pure equality; wildcards in the database (* for IP version, MSS, payload class) mean a single signature can cover a range of real packets, and the window can be expressed relative to MSS so it matches across paths with different MTUs.
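Wildcard and relative-window matching can be shown in a short Python sketch. It handles only the signature forms quoted in this article (literal numbers, *, and mss*N windows); the function name and the packet field names are hypothetical, and the packet values are made up to fit the Linux signature.

```python
def matches(sig: str, *, ver, ittl, olen, mss, window, scale,
            olayout, quirks, payload_len) -> bool:
    """Test one observed SYN against one p0f-style TCP signature."""
    (s_ver, s_ittl, s_olen, s_mss,
     s_win, s_layout, s_quirks, s_pclass) = sig.split(":")
    s_wsize, _, s_scale = s_win.partition(",")

    def literal(pattern: str, value: int) -> bool:
        return pattern == "*" or int(pattern) == value

    if s_wsize.startswith("mss*"):
        # Window is relative: compare against this packet's own MSS.
        window_ok = window == mss * int(s_wsize[4:])
    else:
        window_ok = literal(s_wsize, window)

    # pclass: "0" wants an empty payload, "+" a non-empty one, "*" either.
    pclass_ok = s_pclass == "*" or (s_pclass == "0") == (payload_len == 0)

    return (literal(s_ver, ver) and literal(s_ittl, ittl)
            and literal(s_olen, olen) and literal(s_mss, mss)
            and window_ok and literal(s_scale, scale)
            and s_layout == olayout
            and set(filter(None, s_quirks.split(","))) == set(quirks)
            and pclass_ok)


LINUX = "*:64:0:*:mss*20,10:mss,sok,ts,nop,ws:df,id+:0"
WINDOWS = "*:128:0:*:8192,0:mss,nop,nop,sok:df,id+:0"

packet = dict(ver=4, ittl=64, olen=0, mss=1460, window=29200, scale=10,
              olayout="mss,sok,ts,nop,ws", quirks={"df", "id+"}, payload_len=0)
print(matches(LINUX, **packet))    # True
print(matches(WINDOWS, **packet))  # False

# Same stack seen over a smaller MTU: MSS and window both shrink.
tunnelled = dict(packet, mss=1452, window=29040)
print(matches(LINUX, **tunnelled))  # True
```

The third call is why the relative form exists: one signature line covers every path MTU the stack might see.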

When several signatures could match, p0f resolves at the most specific level it can be confident about and falls back to a coarser one only when it must. A packet whose TTL is 64 and whose option layout matches no exact entry might still resolve to “generic Linux” on the strength of the TTL and the broad option shape. The label structure, with its class/name/flavor fields, exists exactly so the tool can give a useful partial answer instead of a useless null. This is also why the database ages gracefully in one direction and badly in another: an unknown new Linux still looks like Linux at the class level even when no flavor matches, but a genuinely novel stack with an unusual option layout can fall through to “unknown” entirely.

The database is the soft underbelly. p0f’s v3 fingerprint set is the 2012-era one, give or take community patches. Operating systems shipped since then have stacks that the original database never saw. The fields p0f reads have not changed, TTL is still TTL, window is still window, but the specific value combinations that modern kernels emit may not have a labeled entry. In practice this means a fresh p0f against 2026 traffic correctly identifies the OS family far more often than it nails the exact version, because the family-level tells (TTL 64 versus 128, timestamp present versus absent, the gross option ordering) are stable across many kernel releases while the fine details drift.

Cloudflare’s BPF compiler, or fingerprinting at line rate

A good illustration of p0f’s signatures outliving the p0f program is what Cloudflare did with them. In an August 2016 engineering post, Cloudflare described compiling p0f’s signature format into Berkeley Packet Filter bytecode. The motivation was SYN-flood defense: during an attack they want to “rate limit attack packets, and in effect prioritize processing of other, hopefully legitimate, ones.” Real operating systems produce SYNs that match known p0f signatures; many flood tools produce SYNs that do not, or that match the signature of a specific attack tool.

Rather than run the p0f daemon in the packet path, they took the human-readable signature grammar and built a compiler that emits BPF, so the classification runs in the kernel’s packet filter at line rate, inside iptables. The blog gives worked examples in p0f’s own format, a Linux SYN as 4:64:0:*:mss*10,6:mss,sok,ts,nop,ws:df,id+:0, a Windows 7 SYN as 4:128:0:*:8192,8:mss,nop,ws,nop,nop,sok:df,id+:0, and a hping3 attack packet that betrays its synthetic origin with a sparse signature and an ack+ quirk. The signature language outlived the tool that defined it, which is a fair measure of how good the abstraction was. The same field set that Cloudflare compiles to BPF is what feeds the network-layer side of vendor systems like DataDome’s HTTP/2 and network fingerprinting.
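To give a feel for what such a translation involves, the Python sketch below turns three signature fields into a pcap-filter expression of the kind tcpdump accepts. This is not Cloudflare's compiler and it emits filter text, not bytecode; the function name is hypothetical, it covers IPv4 only, and it ignores option layout, which is the hard part. The byte offsets are the standard ones: TTL at byte 8 and the DF bit in byte 6 of the IPv4 header, the window at bytes 14 and 15 of the TCP header.

```python
def syn_filter(ittl: int, window: int, df: bool) -> str:
    """Build a pcap-filter expression for IPv4 SYNs with these properties."""
    # An observed TTL maps to `ittl` when it lies above the previous
    # candidate initial value and at or below `ittl` itself.
    lower = {64: 0, 128: 64, 255: 128}[ittl]
    parts = [
        "ip",
        "tcp[tcpflags] = tcp-syn",                # SYN set, no other flags
        f"ip[8] > {lower} and ip[8] <= {ittl}",   # TTL range
        f"tcp[14:2] = {window}",                  # advertised window
        "ip[6] & 0x40 != 0" if df else "ip[6] & 0x40 = 0",  # DF bit
    ]
    return " and ".join(f"({part})" for part in parts)


print(syn_filter(128, 8192, True))
```

The output is a single expression that could be handed to a capture tool to isolate SYNs shaped like the Windows signature shown earlier, at least for the three fields it checks.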

Where the technique stands in 2026

Passive OS fingerprinting from the SYN is older than most production TCP stacks running today, and it is not going away, but it has limits worth stating plainly.

What still works: the family-level signal. TTL of 64 versus 128 still cleanly separates the Unix-descended world from Windows, and no normal application can change it. Option ordering and the presence or absence of the timestamp option still split the major stacks. The window-to-MSS relationship still holds. Most importantly, the cross-layer mismatch check, network fingerprint disagreeing with the User-Agent, is more useful now than it was in 2012, because the modern internet is saturated with proxies, VPNs, and CGNAT, and detecting that someone is behind one is valuable in itself.

What erodes it: middleboxes and normalization. NATs, VPN concentrators, load balancers, and traffic normalizers rewrite TTL or rebuild TCP options, which either changes the observed signature or smears many real hosts into one. A host behind a corporate VPN may fingerprint as the VPN appliance. The rise of cloud egress means a huge share of traffic now originates from a handful of stack types on Linux hypervisors, which compresses the diversity the technique feeds on. And the v3 database’s age means version-level precision has decayed even where family-level identification holds.

What it never reached: the payload. The whole point of passivity is that p0f reads what is already on the wire, and increasingly what is on the wire is encrypted. p0f’s HTTP module sees nothing inside a TLS connection. That is why the center of gravity for application fingerprinting moved to the one plaintext handshake that remains, the TLS ClientHello, and to the structure of HTTP/2 framing, both of which leak in the clear even when the body does not. The network layer that p0f reads is one floor below all of that, and it stays readable precisely because TTL and window size and option order travel in the clear by necessity, not by choice.

The durable lesson of p0f is architectural, not about any one tool. Identity leaks at the layer that the application does not control. A program can lie about everything it writes, its User-Agent, its headers, its claimed OS. It cannot easily lie about the SYN, because the SYN belongs to the kernel. Twenty-six years after a Bugtraq post claimed a 63-bit signature for every system, the cheapest, most undetectable signal on the network is still the one the application never gets to touch.

Sources & further reading

Zalewski, M. (2000), p0f - passive os fingerprinting tool (Bugtraq announcement) — the original v1.0 release post laying out the 63-bit SYN signature idea and the firewall-transparency claim.
Zalewski, M. (2012), p0f v3 project page — the author’s own description of the v3 rewrite, its scope, and what it reads from IP/TCP/HTTP.
p0f project (2012), p0f v3 README — the authoritative reference for the signature grammar, the full quirks list, uptime estimation, and NAT reason codes.
p0f project, p0f.fp fingerprint database — the shipped signature file with the section structure and real labeled signatures for Linux, Windows, and Mac OS X.
p0f project, p0f unofficial git repository — the maintained source mirror, LGPL 2.1, including docs and the BPF helper material.
LWN.net (2004), p0f, the Passive OS Fingerprinter — contemporaneous coverage of the v2.0.4 era, authorship, and the passive-versus-active distinction.
Cloudflare (2016), Introducing the p0f BPF compiler — how Cloudflare compiles p0f signatures to BPF for SYN-flood triage, with worked example signatures.
CGSecurity, OS fingerprinting — background on passive versus active methods and how initial TTL is recovered from the observed value by rounding to 64/128/255.
Pydoll docs (2025), Network fingerprinting deep dive — modern Linux/Windows/macOS TCP signature values and the proxy OS-mismatch detection logic.
CERT NetSA, p0f fingerprint database snapshots — archived community fingerprint files showing how the database is versioned and distributed.