
MTU, path MTU discovery, and the fingerprint hiding in packet sizes

Copyright: MIT

A SYN packet carries a number that the client never meant as identification. It is the maximum segment size, four bytes of TCP option that say how large a payload this host is willing to receive. The host picked it for a dull, mechanical reason: subtract the IP and TCP headers from the MTU of the interface the packet left through, and announce what is left. On a plain Ethernet machine that arithmetic produces 1460. On a machine sitting behind a tunnel, it produces something else. The something-else is the tell.

That is the whole idea of packet-size fingerprinting, and it is older than most of the bot-detection industry that now uses it. The MSS in a SYN, the window size beside it, the path-MTU discovery behavior that follows, all of it is set by the network stack and the links underneath it, not by the application. A browser cannot rewrite its own MSS without rewriting the kernel’s idea of the interface MTU. A proxy that forwards your traffic stamps its own MTU on the segment, not yours. So when an MSS arrives that no ordinary Ethernet host would send, the value alone narrows down what kind of link the packet crossed, and sometimes which VPN software shaped it.

This post follows the number from the bottom up. First, where MTU comes from and how MSS is derived from it. Then the encapsulation overhead that makes tunneled MSS values land on distinctive plateaus, with the WireGuard and OpenVPN cases worked out in bytes. Then path-MTU discovery itself, the ICMP mechanism and its black-hole failure mode, and why QUIC had to reinvent the whole thing. Finally, how passive fingerprinters read all of this, what they can and cannot conclude, and where the signal goes quiet.

Where the MTU number comes from

The maximum transmission unit is the largest payload a link will carry in one frame, headers of that link excluded. Classic Ethernet fixes it at 1500 bytes. The figure is not a law of physics, it is a number the 1980s Ethernet committee chose, and almost the entire Internet inherited it as the default. Most paths you will ever touch are 1500-byte paths end to end, which is exactly why anything that is not 1500 stands out.

TCP does not advertise the MTU directly. It advertises the maximum segment size, which is the MTU minus the IP header and the TCP header. For IPv4 those are 20 bytes each in their minimal form, so the standard derivation is MSS equals MTU minus 40. A 1500-byte Ethernet MTU yields an MSS of 1460, and that single value, 1460, is the most common MSS on the Internet by a wide margin. The MSS option travels only in SYN and SYN-ACK packets. Each side states the largest segment it is willing to receive, the peer caps what it sends in that direction at the stated value, and the option is never renegotiated for the life of the connection. It is a one-time declaration made at handshake, which is precisely what makes it convenient to read.

Figure: From link MTU to advertised MSS, the standard IPv4 Ethernet case. MTU 1500 bytes (Ethernet link payload) − IP header 20 − TCP header 20 = MSS 1460 bytes, the single most common MSS on the Internet.

*The derivation is fixed arithmetic. Anything that changes the interface MTU changes the MSS by the same amount, which is why the value carries information about the link.*
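The derivation is small enough to write down directly. The sketch below assumes minimal headers with no IP or TCP options; the function name `mss_from_mtu` and the constants are illustrative, not taken from any particular stack.

```python
IPV4_HEADER = 20  # minimal IPv4 header, no options
IPV6_HEADER = 40  # fixed IPv6 header
TCP_HEADER = 20   # minimal TCP header, no options


def mss_from_mtu(mtu: int, ipv6: bool = False) -> int:
    """Return the MSS a host would advertise for a given interface MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


# Classic Ethernet, and an arbitrary smaller link for comparison.
for mtu in (1500, 1400):
    print(f"MTU {mtu} -> MSS {mss_from_mtu(mtu)}")
```

Lowering the MTU by any amount lowers the result by the same amount, which is the property the rest of this post relies on.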

A second group of values rides in the same SYN, and it is worth naming because fingerprinters read them together with the MSS. The receive window, and the window scale option that multiplies it, are set by the OS and vary across operating systems in ways that are stable per stack. MSS tells you about the link. The window and its scale tell you more about the host. The TCP/IP-stack fingerprinting that p0f and its descendants perform leans on all of these at once, and that story is mostly about how those host values map to operating systems. Packet size is the part of that story owned by the link rather than the kernel, and it is the part a tunnel cannot hide.

The overhead that bends the number

A tunnel wraps your packet inside another packet. That outer wrapper costs bytes, and those bytes have to come from somewhere. Either the path fragments the oversized result, which is slow and often broken, or the tunnel software shrinks the inner packet so the wrapped whole still fits in 1500. The second choice is the sane one, and it is what every competent VPN does. Shrinking the inner packet means lowering the MTU the inner stack believes it has, which means lowering the MSS that inner stack advertises. The overhead lands directly in the SYN.

The exact amount depends on the protocol. WireGuard is the clean case because its wire format is fixed and documented. A WireGuard data message adds a 4-byte type-and-reserved field, a 4-byte receiver index, an 8-byte counter, and a 16-byte Poly1305 authentication tag, which is 32 bytes of WireGuard overhead. On top of that the outer UDP header is 8 bytes and the outer IP header is 20 bytes for IPv4 or 40 for IPv6. WireGuard’s default interface MTU is 1420, and the reasoning is explicit in the project’s own discussion: take 1500 and subtract the IPv6 worst case of 40 plus 8 plus the 4-plus-4-plus-8-plus-16 of WireGuard header bytes, and 1420 is what remains. A host that knows it is IPv4-only can run 1440 instead, because the outer IP header drops from 40 to 20.

Figure: WireGuard default MTU, the IPv6 worst case worked out in bytes

outer IPv6 header      40
outer UDP header        8
WG type + reserved      4
WG receiver index       4
WG counter / nonce      8
Poly1305 auth tag      16
total overhead         80

1500 − 80 = 1420 interface MTU → MSS 1380

*WireGuard's 1420 default is the IPv6 case. An interface MTU of 1420 advertises an MSS of 1380, and a server seeing 1380 in a SYN is looking at a number no bare-metal Ethernet host produces.*

So a WireGuard client with the default 1420 MTU advertises an MSS of 1380, since 1420 minus 40 is 1380. That value is the fingerprint. No ordinary 1500-byte Ethernet host emits 1380, and the offset from 1460 is exactly the tunnel overhead. The same logic gives a small family of related values. An IPv4-only WireGuard at 1440 advertises 1400. WireGuard carrying IPv6 inner traffic shifts again, because the inner IP header grows from 20 bytes to 40: 1420 minus 60 is 1360. Each is a plateau a few dozen bytes below 1460, and each is uncommon enough in the wild that its mere presence is suggestive.
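The WireGuard arithmetic above can be checked mechanically. This is a minimal sketch using only the byte counts given in the text; the names `wireguard_mtu` and `inner_mss` are hypothetical helpers, and a 1500-byte underlying link is assumed.

```python
# Per-packet WireGuard data-message fields, in bytes (from the text above).
WG_FIELDS = {
    "type_and_reserved": 4,
    "receiver_index": 4,
    "counter": 8,
    "poly1305_tag": 16,
}
UDP_HEADER = 8
TCP_HEADER = 20
IP_HEADER = {"ipv4": 20, "ipv6": 40}


def wireguard_mtu(link_mtu: int = 1500, outer: str = "ipv6") -> int:
    """Tunnel interface MTU that keeps the wrapped packet within link_mtu."""
    overhead = IP_HEADER[outer] + UDP_HEADER + sum(WG_FIELDS.values())
    return link_mtu - overhead


def inner_mss(tunnel_mtu: int, inner: str = "ipv4") -> int:
    """MSS the inner TCP stack advertises over the tunnel interface."""
    return tunnel_mtu - IP_HEADER[inner] - TCP_HEADER


for outer in ("ipv6", "ipv4"):
    mtu = wireguard_mtu(outer=outer)
    print(f"outer {outer}: tunnel MTU {mtu}, inner IPv4 MSS {inner_mss(mtu)}")
```

Passing `inner="ipv6"` to `inner_mss` gives the inner-IPv6 variant of each plateau.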

OpenVPN is messier and, for the fingerprinter, more rewarding. Rather than lower the interface MTU, OpenVPN’s mssfix feature rewrites the MSS inside the encapsulated TCP handshake to whatever the computed overhead allows. The overhead it computes depends on the cipher, the HMAC, the transport, and whether compression is on, so the resulting MSS encodes the configuration. ValdikSS’s measurements, the work behind the modified p0f variant that records MTU, list concrete values. A UDP tunnel with a 64-bit cipher and SHA1 and no compression lands on an MSS of 1369. The TCP version of the same lands on 1367. Move to a 128-bit cipher with SHA256 and the UDP case drops to 1341, the TCP case to 1339. These are not round numbers. They are arithmetic residues of a specific cryptographic setup, and that is what makes them precise rather than merely unusual. A server that observes 1369 is not guessing at “some VPN,” it is looking at a value that a particular OpenVPN profile produces.

The catch, and it is the honest part, is that these exact figures depend on OpenVPN versions and option defaults that have shifted over the years. The principle holds: the MSS is a function of the encapsulation, so it leaks the encapsulation. The specific table of numbers is version-bound, and anyone matching against it has to keep it current. The exact byte layout of every OpenVPN cipher-and-HMAC combination is not something a server can derive from first principles on the fly. It is inferred from observed traffic and from OpenVPN’s own overhead documentation, which is why detection tools ship a lookup table rather than a formula.

Reading the value at the server

A passive fingerprinter does not probe. It reads the first SYN of an incoming connection and pulls the IP and TCP header fields straight out of it. p0f, the canonical tool, has done this since the early 2000s, and the modern reimplementations follow the same recipe with newer signature databases. The fields that matter for packet size are the MSS option, the window size, the window scale, and the order in which the TCP options appear. That option order, MSS then SACK-permitted then timestamp then NOP then window-scale, is itself OS-specific, which is why a fingerprinter reads the whole SYN as a unit rather than the MSS alone.

Figure: What a passive fingerprinter reads from one incoming SYN. TCP options in order: MSS (the link signal), SACK-permitted, timestamp, NOP, window scale. The option order itself is OS-specific. Read together with: window size (OS default, varies per stack), window scale (shift count, OS default), and IP TTL (initial 64 / 128 / 255, decremented per hop).

*MSS is the link-owned field; window size, scale, and TTL are host-owned. A fingerprinter scores them jointly because any single field is weak and the combination is strong.*
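To make the reading step concrete, here is a minimal sketch of walking the TCP options area of a SYN and recording both the option order and the MSS and window-scale values. The option kind numbers are the standard ones; the function name `parse_tcp_options`, the short labels, and the hand-built sample bytes are illustrative assumptions, and this is not how p0f itself is implemented.

```python
import struct

# Standard TCP option kinds; the short labels are arbitrary.
OPTION_NAMES = {0: "eol", 1: "nop", 2: "mss", 3: "ws", 4: "sok", 8: "ts"}


def parse_tcp_options(options: bytes):
    """Return (option order, extracted values) from a TCP options area."""
    layout, values = [], {}
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:            # end of option list
            layout.append("eol")
            break
        if kind == 1:            # NOP is a single byte with no length field
            layout.append("nop")
            i += 1
            continue
        if i + 1 >= len(options):
            break                # truncated option, stop parsing
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break                # malformed length, stop parsing
        body = options[i + 2:i + length]
        layout.append(OPTION_NAMES.get(kind, f"?{kind}"))
        if kind == 2 and len(body) == 2:
            values["mss"] = struct.unpack("!H", body)[0]
        elif kind == 3 and len(body) == 1:
            values["wscale"] = body[0]
        i += length
    return layout, values


# Hand-built sample: MSS 1380, SACK-permitted, timestamp, NOP, window scale 7.
sample = (
    struct.pack("!BBH", 2, 4, 1380)
    + bytes([4, 2])
    + struct.pack("!BBII", 8, 10, 123456, 0)
    + bytes([1])
    + bytes([3, 3, 7])
)
layout, values = parse_tcp_options(sample)
print(layout, values)
```

Keeping the order as a list alongside the values reflects the point above: the sequence of options is part of the signature, not just the numbers inside them.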

The MTU-aware p0f variant adds one trick worth describing because it shows how the matching is done. Instead of storing a raw MSS, its signature language lets a rule express the window size relative to the MSS or MTU, using forms like mss*4 for a window that is four times the segment size, or mtu*N for a window tied to the link MTU. The point is that many stacks set their initial window as a clean multiple of the MSS, so encoding the relationship rather than the absolute number makes a signature survive across links of different sizes. The same variant keeps a separate table that maps observed MSS values to link types, Ethernet at one plateau, PPPoE at another, the various tunnels at theirs, so that an MSS the SYN does not directly explain can still be classified by lookup.
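A toy matcher shows why the relative form is useful. The rule strings below borrow the `mss*N` and `mtu*N` notation named above, but the function `window_matches` is a hypothetical sketch, not p0f's parser, and it assumes the MTU is the MSS plus 40 bytes of minimal IPv4 and TCP headers.

```python
def window_matches(rule: str, window: int, mss: int) -> bool:
    """Check a window size against a literal, '*', 'mss*N', or 'mtu*N' rule."""
    if rule == "*":
        return True                      # wildcard: any window size
    if rule.startswith("mss*"):
        return window == mss * int(rule[4:])
    if rule.startswith("mtu*"):
        mtu = mss + 40                   # assumption: minimal IPv4 + TCP headers
        return window == mtu * int(rule[4:])
    return window == int(rule)           # plain absolute value


# The same relative rule matches on an Ethernet link and on a tunnel,
# where an absolute rule written for Ethernet would miss the tunnel case.
print(window_matches("mss*4", 5840, 1460))   # Ethernet: 4 * 1460
print(window_matches("mss*4", 5520, 1380))   # tunnel:   4 * 1380
print(window_matches("5840", 5520, 1380))    # absolute rule fails here
```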

PPPoE is the everyday example of a non-tunnel link that still shifts the number. PPPoE adds 8 bytes of its own header on top of Ethernet, which caps the negotiated MTU at 1492 rather than 1500, and yields an MSS of 1452. Millions of DSL subscribers sit behind exactly this, so 1452 is a common and entirely innocent value. RFC 4638, from 2007, lets PPPoE links negotiate a full 1500-byte MTU using so-called baby jumbo frames of 1508 bytes on the wire, but plenty of deployments still run the classic 1492. The lesson is that a below-1460 MSS is not proof of a VPN. It is proof of overhead somewhere on the first hop, and the fingerprinter’s job is to attribute that overhead to the right cause.
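The lookup-table approach can be sketched with nothing more than a dictionary. The entries below are only the values mentioned in this post, the labels are deliberately phrased as "consistent with" rather than as verdicts, and the names `MSS_HINTS` and `classify_mss` are hypothetical. A real table would be larger and, for the OpenVPN rows, version-bound.

```python
# MSS value -> what it is consistent with (values taken from this post).
MSS_HINTS = {
    1460: "plain Ethernet, MTU 1500",
    1452: "PPPoE, MTU 1492",
    1400: "WireGuard with IPv4-only outer header, MTU 1440",
    1380: "WireGuard default, MTU 1420",
    1369: "OpenVPN over UDP, 64-bit cipher, SHA1, no compression",
    1367: "OpenVPN over TCP, 64-bit cipher, SHA1, no compression",
    1341: "OpenVPN over UDP, 128-bit cipher, SHA256",
    1339: "OpenVPN over TCP, 128-bit cipher, SHA256",
}


def classify_mss(mss: int) -> str:
    """Return a hedged attribution for an observed MSS."""
    if mss in MSS_HINTS:
        return f"consistent with {MSS_HINTS[mss]}"
    if mss < 1460:
        return "unrecognized value below 1460: overhead of unknown origin"
    return "unrecognized value"


for observed in (1460, 1452, 1380, 1369, 1412):
    print(observed, "->", classify_mss(observed))
```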

That attribution is where packet size stops being a standalone signal and becomes one input among several. The strongest use of MSS in proxy and bot detection is not “this MSS means VPN.” It is the contradiction check. A request arrives claiming, in its User-Agent, to be an iPhone on Safari. The TCP SYN underneath it advertises a window size, an option order, and a TTL that match a Linux server, and an MSS that matches a tunnel. iPhones do not have a Linux stack, and Safari does not run on a 1380-MSS tunnel interface by default. The packet contradicts the header. That mismatch is the same idea behind detecting a proxy by OS mismatch, and packet size is one of the cleaner contributing fields because the application layer has no easy way to reach down and rewrite the kernel’s MSS to match its lie. The broader passive-fingerprinting tradition that p0f established is built on exactly these network-layer fields giving the host away before a single byte of application data is sent.
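The contradiction check can be sketched as a comparison between what the User-Agent claims and what a stack fingerprint suggests. Everything here is an illustrative assumption: the `SynObservation` fields stand in for the output of a passive fingerprinter, the substring matching on the User-Agent is crude, and the compatibility table (for example, treating Android as a Linux stack) is a simplification.

```python
from dataclasses import dataclass


@dataclass
class SynObservation:
    os_guess: str  # assumed output of a stack fingerprinter, e.g. "linux"
    mss: int       # MSS option read from the SYN


# Which stack guesses are compatible with which claimed platform (assumed).
COMPATIBLE_STACKS = {
    "ios": {"ios", "macos"},
    "macos": {"macos", "ios"},
    "android": {"android", "linux"},
    "linux": {"linux"},
    "windows": {"windows"},
}
TUNNEL_LIKE_MSS = {1380, 1400}  # WireGuard plateaus mentioned in this post


def claimed_os(user_agent: str) -> str:
    """Very rough platform guess from a User-Agent string."""
    ua = user_agent.lower()
    # Order matters: iPhone strings contain "Mac OS X", Android ones "Linux".
    for needle, name in (("iphone", "ios"), ("ipad", "ios"),
                         ("android", "android"), ("windows", "windows"),
                         ("mac os x", "macos"), ("linux", "linux")):
        if needle in ua:
            return name
    return "unknown"


def consistency_findings(user_agent: str, syn: SynObservation) -> list:
    findings = []
    claim = claimed_os(user_agent)
    allowed = COMPATIBLE_STACKS.get(claim)
    if allowed is not None and syn.os_guess not in allowed:
        findings.append(f"User-Agent claims {claim}, SYN looks like {syn.os_guess}")
    if syn.mss in TUNNEL_LIKE_MSS:
        findings.append(f"MSS {syn.mss} is consistent with a tunnel interface")
    return findings


ua = "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Safari/604.1"
print(consistency_findings(ua, SynObservation(os_guess="linux", mss=1380)))
```

The findings are returned as a list rather than a verdict because each one is a hint to be weighed, not proof on its own.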

Path-MTU discovery, and how it fails

The MSS handshake fixes the segment size for one direction at connection time. It does not account for a link somewhere in the middle of the path with a smaller MTU than either endpoint’s first hop. A packet that fit when it left can hit a 1400-byte link three hops away and be too big to forward. Path-MTU discovery is the mechanism that finds that smaller link and adapts to it, and its behavior is itself observable.

The classic method, RFC 1191 from 1990, is elegant on paper. Every outgoing packet sets the Don’t Fragment bit in its IP header. When a router needs to forward a packet larger than its next link’s MTU and cannot fragment it because DF is set, it drops the packet and sends back an ICMP message, type 3 code 4, “fragmentation needed and DF set.” That message carries the next-hop MTU in a field that the original ICMP format had left unused, so the sender learns the exact size it should drop to and retries. RFC 1191 even defined a table of common MTU plateaus, 65535 down through 1492, 1006, 508, 296, and finally 68, so a sender that got the message from an older router that did not report a next-hop MTU could still step down to the next sensible plateau. The sender ratchets its estimate down until packets get through.

Figure: Classic PMTUD, working and black-holed.
Working: the sender transmits a 1500-byte packet with DF=1 toward a router whose next link MTU is 1400; the router returns ICMP 3/4 with next-hop MTU 1400; the sender retries at 1400 with DF=1 and the packet is delivered.
Black-holed: the sender transmits a 1500-byte packet with DF=1; the ICMP reply is dropped at a firewall; the sender never learns and the connection hangs. The black hole is silent: small packets pass, large ones vanish with no error.

*The mechanism depends entirely on the ICMP message getting home. Filter that one message and the path turns into a black hole where small packets pass and large ones disappear without a trace.*

The elegance has a single fragile dependency. The whole scheme breaks if that ICMP message never reaches the sender, and ICMP gets filtered constantly. Administrators block it out of a vague sense that ICMP is dangerous, or a stateful firewall drops the message because it cannot match it to a known flow. The result is the PMTUD black hole. Small packets like the handshake itself get through, so the connection opens and looks healthy. Then the first full-size data segment hits the small link, gets dropped, and the ICMP that would explain the drop is swallowed by a firewall. The sender retransmits the same oversized packet, which is dropped again. The connection stalls with no error, the classic symptom of a page that loads its headers and then hangs forever. Anyone who has debugged a site that works from one network and silently fails from another has probably met this.
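Both outcomes, the working ratchet and the black hole, fit in a toy simulation. This sketch models a path as a list of link MTUs and the ICMP report as a simple assignment; no real packets are sent, and the name `classic_pmtud` and the retry limit are illustrative assumptions.

```python
def classic_pmtud(path_mtus, start_size=1500, icmp_filtered=False, max_tries=5):
    """Return the packet size that gets through, or None if the flow stalls."""
    size = start_size
    for _ in range(max_tries):
        # First link on the path too small for the current packet, if any.
        bottleneck = next((mtu for mtu in path_mtus if mtu < size), None)
        if bottleneck is None:
            return size              # packet fits every link: delivered
        if icmp_filtered:
            continue                 # ICMP 3/4 never arrives; resend same size
        size = bottleneck            # ICMP 3/4 reported the next-hop MTU
    return None                      # retries exhausted: PMTUD black hole


path = [1500, 1400, 1500]            # a 1400-byte link in the middle
print(classic_pmtud(path))                       # sender adapts
print(classic_pmtud(path, icmp_filtered=True))   # sender never learns
print(classic_pmtud(path, start_size=576, icmp_filtered=True))  # small packets pass
```

The third call is the reason black holes are hard to spot: a handshake-sized packet succeeds on the same path where a full-size one stalls.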

The black hole is itself a fingerprint of sorts, because the workarounds for it are observable. The common fix is MSS clamping. A router or firewall on the path rewrites the MSS in passing SYN packets down to a value that fits its known tunnel MTU, so the endpoints never advertise a segment larger than the path can carry and PMTUD is never needed. Linux exposes this through the iptables TCPMSS target and its --clamp-mss-to-pmtu option, and consumer routers behind PPPoE do it by default. Clamping is the reason a server sometimes sees an MSS that matches neither endpoint’s real interface, because a middlebox rewrote it. It is also why MSS, read in isolation, can mislead: the value you see may belong to the last clamping device on the path, not to the originating host. A clamped 1452 from a PPPoE router and a native 1452 from a host configured that way are indistinguishable in the SYN.
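What a clamping middlebox does to the SYN can be sketched as a rewrite of the MSS option in place. This is an illustration under stated assumptions, not the Linux implementation: the function name `clamp_syn_options` is hypothetical, headers are assumed to be the minimal 40 bytes, and a real device would also have to update the TCP checksum, which is omitted here.

```python
import struct


def clamp_syn_options(options: bytes, path_mtu: int) -> bytes:
    """Lower the MSS option, if present, so segments fit within path_mtu."""
    ceiling = path_mtu - 40          # assumption: minimal IPv4 + TCP headers
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == 0:                # end of option list
            break
        if kind == 1:                # NOP, single byte
            i += 1
            continue
        if i + 1 >= len(out):
            break                    # truncated option
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break                    # malformed length
        if kind == 2 and length == 4:
            mss = struct.unpack_from("!H", out, i + 2)[0]
            if mss > ceiling:        # clamp only downward, never raise
                struct.pack_into("!H", out, i + 2, ceiling)
        i += length
    return bytes(out)


# MSS 1460, NOP, window scale 7, crossing a PPPoE link with MTU 1492.
syn_opts = struct.pack("!BBH", 2, 4, 1460) + bytes([1, 3, 3, 7])
clamped = clamp_syn_options(syn_opts, path_mtu=1492)
print(struct.unpack_from("!H", clamped, 2)[0])
```

Only the two MSS bytes change; the option order and every other field survive, which is why a clamped SYN still carries the originating host's stack fingerprint alongside someone else's MSS.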

Because RFC 1191 is so easy to break, the IETF replaced its dependency on ICMP. RFC 4821, from 2007, defined packetization-layer PMTUD, which lets TCP itself find the path MTU by probing. It sends progressively larger packets and watches which ones are acknowledged, running a search that raises a lower bound on success and lowers an upper bound on failure until it converges on the real limit. No ICMP required. The transport layer’s own loss signal does the work. This hardens a host against black holes, and it also changes what a passive observer sees, because a stack doing PLPMTUD emits a recognizable pattern of growing probe sizes rather than a single fixed segment.
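The bound-narrowing search can be sketched as a binary search over probe sizes. This is one possible search strategy under simplifying assumptions, not the procedure any particular stack uses: `probe_delivered` is a hypothetical callback standing in for "send a probe of this size and see whether it is acknowledged", and loss from congestion is ignored.

```python
def plpmtud_search(probe_delivered, low: int, high: int):
    """Find the largest size in [low, high] that gets through.

    low  -- a size already known to work
    high -- the largest size worth trying (e.g. the local interface MTU)
    Returns (path MTU estimate, list of probe sizes tried).
    """
    probes = []
    while low < high:
        probe = (low + high + 1) // 2    # round up so the loop always progresses
        probes.append(probe)
        if probe_delivered(probe):
            low = probe                  # acknowledged: raise the lower bound
        else:
            high = probe - 1             # lost: lower the upper bound
    return low, probes


# Simulated path whose smallest link carries 1400 bytes.
estimate, probes = plpmtud_search(lambda size: size <= 1400, low=1024, high=1500)
print(estimate, probes)
```

The list of probe sizes is the part a passive observer can see: a run of packets at unusual, converging sizes rather than one fixed segment size.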

QUIC, and why it had to start over

QUIC made the whole question sharper, because it runs over UDP and UDP has no segmentation of its own. A QUIC sender chooses its datagram sizes directly, and it cannot lean on TCP’s MSS handshake because there is no TCP. So RFC 9000, the QUIC standard from 2021, builds the floor into the protocol. A client’s Initial packets must travel in UDP datagrams of at least 1200 bytes of payload, padding up to that size if the handshake itself is smaller. The 1200 floor does double duty. It guarantees the path can carry a packet large enough for the handshake, and it limits amplification: a server must not send more than three times the bytes it has received from an unvalidated client, so forcing the client’s first datagram to 1200 bytes caps how much a spoofed address can make the server emit.
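The two jobs of the 1200-byte floor reduce to a few lines of arithmetic. This sketch is a simplification: a real QUIC stack pads with PADDING frames inside the protected packet rather than appending bytes to the datagram, and the function names `pad_initial_datagram` and `server_send_budget` are hypothetical.

```python
MIN_INITIAL_DATAGRAM = 1200   # minimum UDP payload for a client Initial
AMPLIFICATION_FACTOR = 3      # limit toward an unvalidated address


def pad_initial_datagram(payload: bytes) -> bytes:
    """Pad a client's first datagram up to the 1200-byte floor."""
    shortfall = MIN_INITIAL_DATAGRAM - len(payload)
    if shortfall <= 0:
        return payload
    return payload + bytes(shortfall)     # zero bytes stand in for padding


def server_send_budget(bytes_received: int, bytes_sent: int) -> int:
    """Bytes a server may still send before the client address is validated."""
    return max(0, AMPLIFICATION_FACTOR * bytes_received - bytes_sent)


first = pad_initial_datagram(b"\x00" * 300)   # a handshake smaller than the floor
print(len(first))
print(server_send_budget(bytes_received=len(first), bytes_sent=0))
```

Because the budget scales with what the client sent, the padding serves both ends: the client proves the path carries 1200 bytes, and the server gets enough allowance to answer with its own handshake flight.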

That 1200-byte minimum is itself a packet-size signal. Every QUIC connection on the Internet opens with a datagram padded to at least 1200 bytes, which is a distinctive shape on the wire, and the padding pattern and exact initial size feed into QUIC initial-packet fingerprinting. Above the floor, QUIC discovers larger MTUs using DPLPMTUD, the datagram version of packetization-layer PMTUD, standardized as RFC 8899 in 2020. The same probe-and-search idea, now applied to UDP datagrams: send a larger probe, see if it is acknowledged, ratchet up. The recommended base size that a sender assumes will work on almost any path is 1200 bytes for IPv4, and IPv6 paths must carry at least 1280 by the rules of IPv6 itself. A QUIC stack that probes aggressively for a larger MTU and one that sits at 1200 are distinguishable by their datagram sizes alone, which adds yet another implementation-specific tell to a protocol that already leaks plenty through its TLS ClientHello and transport parameters.

The through-line from 1990 to 2021 is that packet size kept being a problem the network could not solve cleanly, so each generation pushed the discovery up a layer. ICMP-based discovery trusted the network to report back, and the network filtered the report. Packetization-layer discovery moved the job into TCP so it stopped trusting ICMP. QUIC moved it into the application’s own transport because there was no TCP to lean on. At every step the size of the packets a host sends became more a property of that host’s specific stack and configuration, and therefore more readable.

What the number can and cannot tell you

Packet size is a strong hint and a weak proof, and the difference matters. An MSS of 1380 is consistent with default WireGuard, but it is also consistent with a few other tunnels and with some deliberately lowered configurations, and a clamping middlebox could have produced it for a host that never ran a VPN at all. An MSS of 1452 is consistent with PPPoE, which is millions of ordinary broadband users. The value narrows the possibilities. It rarely closes them. The honest read of a non-1460 MSS is “this packet crossed something with overhead,” and the interesting work is figuring out what.

Where the signal earns its keep is in combination and in contradiction. On its own, an MSS is one weak field. Read alongside the window size, the window scale, the TCP option order, and the TTL, it becomes part of a stack fingerprint that is hard to forge wholesale, because an application that wants to lie about all of them has to reach below itself and rewrite kernel behavior. Read against the User-Agent, it becomes a consistency check: the packet says one thing about the link and the OS, the header claims another, and the gap is the finding. A scraper that runs through a datacenter VPN while presenting a mobile browser string is exactly the case these checks were built to catch, and the MSS is one of the fields that does the catching. The same logic generalizes to the wider TCP/IP-stack-versus-User-Agent mismatch that gives proxies away.

There is a quieter ceiling on all of this, and it is worth ending on. Most of the Internet now sits behind a clamping router, a CGNAT box, or a load balancer that terminates the client’s TCP connection and opens a fresh one of its own. When a connection is terminated and re-originated, the MSS a server reads belongs to the terminator, not the user. A CDN edge, a corporate proxy, a cloud load balancer all flatten the packet-size signal to their own configuration, and a large and growing share of traffic reaches origin servers through exactly such a hop. The MSS that arrives at a busy site is, more and more often, the MSS of the last machine in the chain rather than the first. The tell is still there. It is just increasingly telling you about the infrastructure between you and the client rather than the client, and reading it well means knowing which of those two you are actually looking at.

