
The history of Tor: the anonymity-vs-fingerprinting arms race

Copyright: MIT

A government built Tor. That sentence still surprises people who think of Tor as a tool for evading governments, but the order of events is exactly backwards from the popular story. Onion routing came out of the U.S. Naval Research Laboratory in the mid-1990s, and its first purpose was to let intelligence officers and military personnel use the open internet without their traffic giving them away. The hard part was never the cryptography. The hard part was that a network used only by spies is a network that identifies spies. To hide a government user, you need a crowd of non-government users to hide them in.

That single insight, that anonymity loves company, runs through the whole history of the project and explains most of its design decisions. It explains why the Navy released the code under a free license. It explains why the Tor Project became a nonprofit funded partly by the same government it protects users from. And it explains the strangest-looking decision of all: that the Tor Browser tries to make every one of its users look bit-for-bit identical, because a browser that makes you anonymous on the network while leaving you unique in the browser has not made you anonymous at all.

This post traces that arc. It starts with the NRL onion-routing prototypes and the 1998 patent, moves to the 2002 release and the 2004 design paper that defined modern Tor, covers the 2006 nonprofit and the censorship-circumvention work that followed, then digs into the hidden-service era and the v3 rewrite. It ends on the part most relevant to anyone who studies tracking for a living: the Tor Browser’s fight against browser fingerprinting, and why the project chose uniformity over randomization. Where the history connects to the fingerprinting and proxy work covered elsewhere on this blog, the links are inline.

1995: the problem the Navy actually had

The core idea came from three people at the U.S. Naval Research Laboratory: the mathematician Paul Syverson, and the computer scientists Michael G. Reed and David Goldschlag. In 1995 they asked a narrow question. Could you build internet connections that hide who is talking to whom, even from someone watching the network itself? Encryption hides the contents of a message. It does nothing for the metadata, the fact that address A is talking to address B, and for an intelligence agency the metadata is often the whole game. Knowing that a particular embassy IP just opened a connection to a particular database tells you most of what you wanted to know, regardless of whether you can read the bytes.

Reed has been blunt about the original motivation in public comments over the years: the purpose was U.S. Department of Defense and intelligence use, including open-source intelligence gathering and protecting personnel deployed in the field. The goal was to let government people work online without an observer unmasking them by their traffic patterns. Anonymity here was an operational requirement, not a civil-liberties value.

The mechanism they designed is the one everyone now knows by its metaphor. A message gets wrapped in layers of encryption, one layer per relay in its path, like the layers of an onion. Each relay strips exactly one layer, which reveals only the next hop and nothing else. The first relay knows who sent the message but not its final destination. The last relay knows the destination but not the origin. No single relay sees both ends. An eavesdropper watching one link sees encrypted bytes whose true source and sink are several hops away.
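The layering can be sketched in a few lines of Python. This is a toy under loud assumptions: `keystream` is a SHA-256 counter-mode XOR stand-in, not a real cipher; the per-hop keys are simply given rather than negotiated; and the JSON layer format is invented for illustration. Real Tor moves fixed-size cells along circuits set up in advance instead of nesting routing instructions like this. What the sketch does show is the property in the paragraph above: each `peel` call removes exactly one layer and learns only the next hop.

```python
import hashlib
import json

def keystream(key: bytes, n: int) -> bytes:
    # Toy keystream: SHA-256 in counter mode. Illustrative only, NOT secure.
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])

def xor_layer(key: bytes, data: bytes) -> bytes:
    # XOR with the keystream; applying it twice with the same key undoes it.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def wrap(path, payload: bytes, destination: str) -> bytes:
    # path: list of (relay_name, key) from entry to exit.
    # Build from the inside out: the exit's layer names the destination,
    # every other layer names only the next relay.
    inner, next_hop = payload, destination
    for name, key in reversed(path):
        plaintext = json.dumps({"next": next_hop, "body": inner.hex()}).encode()
        inner = xor_layer(key, plaintext)
        next_hop = name
    return inner

def peel(key: bytes, cell: bytes):
    # A relay removes exactly one layer and learns only where to send the rest.
    layer = json.loads(xor_layer(key, cell))
    return layer["next"], bytes.fromhex(layer["body"])

path = [("guard", b"k1"), ("middle", b"k2"), ("exit", b"k3")]
cell = wrap(path, b"GET /", "example.com")
hop, cell = peel(b"k1", cell)   # the guard learns only "middle"
hop, cell = peel(b"k2", cell)   # the middle learns only "exit"
hop, body = peel(b"k3", cell)   # the exit learns "example.com" and the payload
```

The guard holds the only layer that is addressed from the client, and the exit holds the only layer that names the destination, which is the "no single relay sees both ends" property in miniature.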

The published version of this work is “Anonymous Connections and Onion Routing,” which appeared in the IEEE Journal on Selected Areas in Communications in May 1998, volume 16, issue 4, pages 482 to 494. The Navy obtained U.S. Patent 6,266,704, “Onion routing network for securely moving data through communication networks,” assigned to the Secretary of the Navy. The work was refined under DARPA funding before the Navy eventually released the implementation. The early system is usually called generation-zero and generation-one onion routing in the project’s own retrospective; it worked, it was deployed in limited form, and it carried the same flaw every low-latency anonymity system carries to this day.

That flaw is the threat model. Onion routing protects you against an adversary who watches some of the network. It does not protect you against an adversary who watches all of it. If someone can observe the traffic entering the first relay and the traffic leaving the last relay at the same time, they can correlate the timing and volume of the two flows and link them, regardless of how many encryption layers sat in between. The 1998 work was explicit that this is a real-time, low-latency system, and low latency is exactly what makes the correlation possible. A system that delayed and reordered traffic enough to defeat timing analysis would be too slow for interactive use. Every later version of Tor inherited this boundary, and the project has always stated it plainly rather than pretending otherwise.
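Why watching both ends is enough can be sketched with nothing more than packet timestamps. The assumptions are spelled out: the observer has timing data for one entry-side flow and several exit-side flows, knows roughly how long the path takes (`latency`), and compares packets-per-interval with a plain Pearson correlation. Real correlation attacks use richer features and better statistics; the function names and parameters here are hypothetical. The encryption layers never enter the calculation, which is the point.

```python
def bin_counts(timestamps, bin_width, n_bins):
    # Turn a list of packet times (seconds) into packets-per-bin counts.
    counts = [0] * n_bins
    for t in timestamps:
        i = int(t // bin_width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

def pearson(xs, ys):
    # Pearson correlation coefficient; 0.0 if either series is flat.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def best_match(entry_times, exit_flows, latency, bin_width=0.5, n_bins=20):
    # Shift the entry-side timestamps by the expected path latency, then score
    # every observed exit-side flow by how well its volume over time lines up.
    entry = bin_counts([t + latency for t in entry_times], bin_width, n_bins)
    scores = {name: pearson(entry, bin_counts(times, bin_width, n_bins))
              for name, times in exit_flows.items()}
    return max(scores, key=scores.get), scores

# Synthetic example: flow "A" is the same bursts 0.3 s later; "B" is unrelated.
entry_times = [0.1, 0.2, 0.3, 2.0, 2.1, 5.0, 5.1, 5.2, 5.3, 8.0]
exit_flows = {
    "A": [t + 0.3 for t in entry_times],
    "B": [1.0, 3.0, 4.0, 6.0, 7.0, 9.0, 9.1, 9.2, 9.3, 9.4],
}
match, scores = best_match(entry_times, exit_flows, latency=0.3)
```

Flow "A" wins with a correlation near 1.0 even though the sketch never looks at a single payload byte, which is why adding more layers of encryption cannot fix this class of attack.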

*Figure: "A circuit: nobody sees both ends." The three-hop circuit. The entry relay knows the client but not the destination; the exit knows the destination but not the client. Defeating this still requires watching both ends at once.*

2002: the second-generation onion router

The version that became Tor started around 2002. Syverson stayed on from the NRL side. Roger Dingledine, an MIT graduate, had worked with Syverson on the lab’s onion-routing project, and Nick Mathewson, a classmate of Dingledine’s at MIT, joined soon after. Dingledine gave the project its name. People kept calling it “the onion routing project,” and the natural acronym was TOR. The capitalization later relaxed to Tor. The alpha network was deployed on 20 September 2002, and its code shipped under a free and open-source license, which mattered enormously for what came next.

The free license is where the crowd-needs-a-crowd logic becomes concrete. A closed, government-only anonymity network anonymizes nothing, because every user is by definition a government user. The only way to give the Navy’s people real cover was to mix them into a large and varied population of ordinary users: journalists, activists, people in censored countries, the privacy-conscious, and yes, criminals. The more diverse the traffic, the better everyone hides. Releasing the source and inviting the public in was not a change of mission. It was the mission, executed correctly.

By the end of 2003 the live network was tiny: about a dozen volunteer relays, almost all in the United States, plus one in Germany. The design that would let it grow was written up the next year. “Tor: The Second-Generation Onion Router,” by Dingledine, Mathewson, and Syverson, was presented at the 13th USENIX Security Symposium in August 2004. That paper is the blueprint for the Tor people use today, and its contribution was less a single new idea than a careful list of fixes to the first-generation design.

The paper describes Tor as a circuit-based low-latency anonymous communication service and lists what it adds over the original onion routing: perfect forward secrecy, congestion control, directory servers, integrity checking, configurable exit policies, and a practical design for location-hidden services through rendezvous points. Each of those is worth a sentence. Perfect forward secrecy comes from building circuits hop by hop with fresh session keys that are deleted when the circuit closes, so compromising a relay's long-term key later does not decrypt past traffic. Directory servers replaced the original assumption that every relay knew about every other relay, which does not scale; instead a small set of trusted directory authorities publish a signed list of relays. Integrity checking stops a relay from quietly tampering with the cells passing through it. Exit policies let a relay operator decline to carry certain traffic, which is what makes running an exit node legally survivable. And rendezvous points are the foundation of hidden services, covered below.
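An exit policy is an ordered list of accept and reject rules in which the first matching rule decides. The sketch below assumes a simplified rule syntax (`accept *:443`, `reject *:25`, `accept *:6660-6667`) where the address is either `*` or an exact IP; real policies also support address masks, IPv6, and other keywords, and the helper names are hypothetical. It also assumes that a connection matching no rule is rejected.

```python
def parse_rule(rule: str):
    # "accept *:443" -> ("accept", "*", 443, 443)
    action, target = rule.split()
    addr, ports = target.rsplit(":", 1)
    if ports == "*":
        low, high = 1, 65535
    elif "-" in ports:
        low, high = (int(p) for p in ports.split("-"))
    else:
        low = high = int(ports)
    return action, addr, low, high

def exit_allows(policy, addr: str, port: int) -> bool:
    # Walk the rules in order; the first one that matches decides.
    for rule in policy:
        action, rule_addr, low, high = parse_rule(rule)
        if rule_addr in ("*", addr) and low <= port <= high:
            return action == "accept"
    return False  # assumption of this sketch: no matching rule means reject

policy = ["reject *:25", "accept *:80", "accept *:443", "reject *:*"]
print(exit_allows(policy, "203.0.113.5", 443))  # True: web traffic is carried
print(exit_allows(policy, "203.0.113.5", 25))   # False: no outbound mail
```

Rule order matters: putting `reject *:*` first would turn this relay into a non-exit, which is exactly how operators who only want to run middle relays opt out.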

The circuit-building method is the part worth understanding in a little depth, because it is what makes the no-single-relay-sees-both-ends property actually hold. Tor builds a circuit by telescoping. The client first negotiates an encrypted connection with the entry relay. Through that connection it tells the entry relay to extend the circuit to a middle relay, negotiating fresh keys with the middle relay that the entry relay cannot read. Then through that hop it extends again to the exit. At the end the client shares a separate symmetric key with each of the three relays, and each relay knows only its immediate neighbors. The client wraps each outbound cell in three layers and the relays peel them in turn, exactly as the onion metaphor promises.
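Telescoping can be sketched with a textbook Diffie-Hellman exchange per hop. The parameters are a labeled assumption: a toy group modulo the Mersenne prime 2^127 - 1 with base 3, far too weak for real use (Tor's actual handshake, ntor, runs over Curve25519), and `Relay`, `build_circuit`, and the `forwarded` bookkeeping are hypothetical names. The sketch shows the structure: one fresh key per hop, negotiated in order, with earlier hops only ever carrying public values.

```python
import hashlib
import secrets

# Toy Diffie-Hellman parameters (assumption): NOT a secure group.
P = 2**127 - 1
G = 3

def derive_key(shared: int) -> bytes:
    # Hash the shared secret into a 32-byte symmetric key.
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

class Relay:
    def __init__(self, name: str):
        self.name = name
        self.key = None
        self.forwarded = []  # handshakes this relay carried for hops beyond it

    def answer_handshake(self, client_public: int) -> int:
        y = secrets.randbelow(P - 2) + 1  # relay's ephemeral secret
        self.key = derive_key(pow(client_public, y, P))
        return pow(G, y, P)

def build_circuit(relays):
    keys = []
    for i, relay in enumerate(relays):
        x = secrets.randbelow(P - 2) + 1  # client's fresh ephemeral secret for this hop
        client_public = pow(G, x, P)
        # Hops already in the circuit carry this handshake but see only the
        # public values; without x or y they cannot compute the new key.
        for earlier in relays[:i]:
            earlier.forwarded.append(relay.name)
        relay_public = relay.answer_handshake(client_public)
        keys.append(derive_key(pow(relay_public, x, P)))
    return keys

guard, middle, exit_relay = Relay("guard"), Relay("middle"), Relay("exit")
keys = build_circuit([guard, middle, exit_relay])
```

After the loop the client holds three distinct keys and each relay holds exactly one of them. The guard forwarded two handshakes but holds neither of the later keys, which is the property the figure below illustrates.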

*Figure: "Telescoping: one key per hop, built in order." Step 1: key k1 with the guard. Step 2: key k2 with the middle, tunneled through the guard. Step 3: key k3 with the exit. The guard never learns k2 or k3. Circuit construction telescopes outward. The client negotiates a fresh key with each relay in turn, tunneling each new handshake through the relays it has already established, so no relay learns the keys of the hops beyond it.*

2004 onward: EFF money and a nonprofit

In 2004 the Naval Research Laboratory released the Tor code under a free license and the Electronic Frontier Foundation began funding Dingledine’s and Mathewson’s work. This is the moment Tor stopped being a lab project and started being a public utility. Two years later, in 2006, The Tor Project, Inc. was founded as a 501(c)(3) nonprofit to maintain development. The institution that runs Tor today dates from then.

The funding picture has always been awkward and the project has never hidden it. Figures circulated around 2012 put roughly 80 percent of the Tor Project’s roughly two-million-dollar annual budget as coming from the U.S. government, with the State Department, the Broadcasting Board of Governors, and the National Science Foundation among the largest contributors. People find this contradictory: a privacy tool funded by the government. It is only contradictory if you forget why the Navy wanted it in the first place. A government that wants its own people to be anonymous online has a direct interest in a large, healthy, genuinely independent anonymity network full of ordinary users. The funding and the mission point the same direction. Over the following years the project worked to diversify its income toward individual donations and non-government grants, partly for independence and partly because being seen as a U.S. government project undercuts trust among exactly the users who provide the cover.

Usage grew through the late 2000s and early 2010s. By November 2013 Tor had on the order of four million daily users. The network of nine directory authorities, whose health is publicly monitored, anchors the trust model: they collectively vote on and sign the consensus document that every client downloads to learn the current set of relays. That small, named, geographically spread set of authorities is a deliberate trade-off. Centralizing trust in nine parties is a weakness, but a fully decentralized directory has its own attacks, and the project judged a transparent, auditable set of authorities the lesser risk.
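The client-side check on the consensus can be sketched as a majority count. Two assumptions are labeled here: signatures are simulated with HMAC-SHA256 under per-authority secrets (real authorities use public-key signatures, so clients never hold any secret), and the acceptance rule is taken to be "valid signatures from more than half of the known authorities". The authority names and document are placeholders.

```python
import hashlib
import hmac

# Stand-in signatures (assumption): nine authorities with HMAC secrets.
AUTHORITY_KEYS = {f"auth{i}": f"secret-{i}".encode() for i in range(1, 10)}

def sign(authority: str, document: bytes) -> bytes:
    return hmac.new(AUTHORITY_KEYS[authority], document, hashlib.sha256).digest()

def consensus_is_valid(document: bytes, signatures: dict) -> bool:
    # Count signatures that verify against this exact document.
    valid = sum(
        1 for authority, sig in signatures.items()
        if authority in AUTHORITY_KEYS
        and hmac.compare_digest(sig, sign(authority, document))
    )
    # Accept only if more than half of the known authorities vouch for it.
    return valid > len(AUTHORITY_KEYS) // 2

doc = b"toy consensus document"
five = {a: sign(a, doc) for a in list(AUTHORITY_KEYS)[:5]}
four = {a: sign(a, doc) for a in list(AUTHORITY_KEYS)[:4]}
```

With nine authorities, five valid signatures pass and four do not, and a single changed byte in the document invalidates every signature at once. That is the trade-off the paragraph above describes: no single authority can forge the relay list, but a colluding majority could.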

Bridges and pluggable transports: the censorship arms race

A network is only useful if you can reach it, and the obvious move for a censor is to block the entry. The full list of Tor relays is public, by design, so a national firewall can simply drop every connection to every known relay. Tor’s answer, starting around 2007, was bridges: entry relays that are not listed in the public directory. A user obtains a bridge address out of band, and the censor cannot block what it does not know about.

Blocking by address is only the first layer of censorship, though. A censor that cannot enumerate bridges can still recognize Tor by what its traffic looks like on the wire. The Tor handshake has a distinctive shape, and deep-packet-inspection gear can fingerprint it and drop it without ever knowing the destination. This is the same fingerprinting logic that drives TLS fingerprinting on the anti-bot side of the world: you do not need to read the payload if the shape of the connection already tells you what tool produced it. The countermeasure became pluggable transports, a modular system where the traffic between user and bridge is disguised to look like something else.
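A toy classifier shows what "recognizing the shape" means at its crudest. The signature table is an assumption for illustration: a TLS record header, an SSH banner, and an HTTP request line, matched on leading bytes. Real Tor detection keyed on much finer details of Tor's TLS handshake, but the principle is the same: the verdict comes from structure, never from decrypting the payload.

```python
SIGNATURES = {
    "tls": bytes([0x16, 0x03]),  # TLS record header: handshake content type, major version 3
    "ssh": b"SSH-",              # SSH version banner
    "http": b"GET ",             # plaintext HTTP request line
}

def classify(first_bytes: bytes):
    # Return the first protocol whose fixed prefix matches, else None.
    for protocol, prefix in SIGNATURES.items():
        if first_bytes.startswith(prefix):
            return protocol
    return None

print(classify(bytes([0x16, 0x03, 0x01, 0x02, 0x00])))  # tls
print(classify(b"SSH-2.0-OpenSSH_9.6"))                # ssh
print(classify(bytes([0x8F, 0x1C, 0xA4, 0x07])))       # None
```

A stream of uniformly random bytes almost never starts with one of these fixed prefixes, which is the opening the obfs transports below exploit.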

The transports are a small arms race in themselves. The obfs family, of which obfs4 is the long-standing workhorse, adds a layer of encryption that turns the traffic into bytes with no recognizable structure, so there is no fixed pattern for DPI to match. obfs4 also resists active probing, the censor technique of connecting to a suspected bridge to see whether it speaks Tor. Meek took a different route: it tunnels traffic through a real HTTPS connection to a large cloud provider, so blocking it means blocking that whole provider, which most censors are unwilling to do. Snowflake uses temporary volunteer proxies and makes the traffic resemble a WebRTC video or voice call, the kind of thing that looks innocuous and is everywhere. WebTunnel, a more recent transport, wraps the connection to look like ordinary HTTPS web traffic. The proxy package that ships these is now called Lyrebird, the successor to the old obfs4proxy. The pattern across all of them is the same one that runs through the rest of this blog’s history of the proxy: every observable property of a connection is a potential signature, and hiding means either removing the signature or copying someone else’s.
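Users receive bridges as short text lines naming a transport, an address, an optional relay fingerprint, and transport-specific `key=value` arguments. The parser below assumes that common layout (with an optional leading `Bridge` keyword, as in a torrc file); the `BridgeLine` type is hypothetical, and the address, fingerprint, and cert in the example are placeholders, not a working bridge.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class BridgeLine:
    transport: str              # "vanilla" when no transport name is given
    address: str
    port: int
    fingerprint: Optional[str]
    args: dict = field(default_factory=dict)

def parse_bridge_line(line: str) -> BridgeLine:
    tokens = line.split()
    if tokens and tokens[0] == "Bridge":  # torrc form starts with the keyword
        tokens = tokens[1:]
    transport = "vanilla"
    if ":" not in tokens[0]:              # a transport name such as obfs4 or snowflake
        transport = tokens.pop(0)
    host, port = tokens.pop(0).rsplit(":", 1)
    fingerprint = None
    if tokens and "=" not in tokens[0]:
        fingerprint = tokens.pop(0).upper()
    args = dict(token.split("=", 1) for token in tokens)  # transport-specific settings
    return BridgeLine(transport, host, int(port), fingerprint, args)

bridge = parse_bridge_line(
    "Bridge obfs4 192.0.2.10:443 0123456789ABCDEF0123456789ABCDEF01234567 "
    "cert=EXAMPLECERT iat-mode=0"
)
```

The `args` are where each transport keeps its disguise parameters; for obfs4 the `cert` is what lets a client prove it was actually given this bridge, which is part of how active probing is refused.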

*Figure: "Four ways to not look like Tor." obfs4 looks like random bytes, with no structure to match on; meek looks like HTTPS to a cloud provider, so blocking it means blocking the provider; Snowflake looks like a WebRTC call through throwaway volunteer proxies; WebTunnel looks like plain HTTPS and hides in the web's bulk. Each pluggable transport disguises the user-to-bridge hop differently. The shared goal is to give deep-packet inspection no stable signature to block on.*

Hidden services and the v3 rewrite

The rendezvous design in the 2004 paper made something new possible: a service that is reachable through Tor but whose own location stays hidden. These are onion services, originally called hidden services. They were first specified in 2003 and have been deployed on the network since 2004. The address ends in .onion and resolves only inside Tor; there is no DNS, no public IP, and in principle no way to learn which machine is actually hosting the service.

The mechanism is worth a paragraph because it is genuinely clever. The service picks a few relays as introduction points and publishes, to a distributed directory, the fact that it can be reached through them, signed with its own key. A client that wants to connect first picks a relay of its own as the rendezvous point, builds a circuit to it, and leaves a one-time cookie there. It then sends a message through one of the introduction points asking the service to meet it at that rendezvous point. If the service agrees, it builds its own circuit to the rendezvous point and presents the cookie, and the rendezvous point joins the two circuits. Neither side ever learns the other's IP address, because every leg of the path is itself a Tor circuit. The location is hidden because nobody on the path knows both ends, the same property that protects an ordinary browsing circuit.
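The message flow can be sketched as three small objects. This is a simulation under stated assumptions: circuits are represented by string labels standing in for real three-hop Tor circuits, the class and method names are hypothetical, and the introduction message is passed in the clear (in real Tor it is encrypted to the service, so the introduction point cannot read it). What matters is what the rendezvous point ends up holding: two circuit ends matched by a cookie, and no addresses.

```python
import secrets

class RendezvousPoint:
    """Joins two circuits that present the same one-time cookie."""
    def __init__(self):
        self.pending = {}   # cookie -> client's circuit end
        self.spliced = []   # (client circuit, service circuit) pairs

    def establish(self, cookie, client_circuit):
        self.pending[cookie] = client_circuit

    def join(self, cookie, service_circuit):
        client_circuit = self.pending.pop(cookie)
        self.spliced.append((client_circuit, service_circuit))

class OnionService:
    def __init__(self):
        self.introductions = []

    def on_introduce(self, rendezvous_point, cookie):
        # The service builds its own circuit to the rendezvous point and presents the cookie.
        self.introductions.append(cookie)
        rendezvous_point.join(cookie, service_circuit="service-side circuit")

class IntroductionPoint:
    def __init__(self, service):
        self.service = service  # reached over the service's own circuit

    def introduce(self, rendezvous_point, cookie):
        self.service.on_introduce(rendezvous_point, cookie)

def connect(intro, rendezvous_point):
    # Client: leave a cookie at the rendezvous point first, then ask for an introduction.
    cookie = secrets.token_bytes(20)
    rendezvous_point.establish(cookie, client_circuit="client-side circuit")
    intro.introduce(rendezvous_point, cookie)
    return cookie

service = OnionService()
rp = RendezvousPoint()
cookie = connect(IntroductionPoint(service), rp)
```

After `connect`, `rp.spliced` contains a single pair of circuit labels. The rendezvous point can forward traffic between them but knows nothing about where either circuit began.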

The first-generation onion addresses, now called v2, had a structural privacy leak in the directory layer. The data a service published to the hidden-service directory was uploaded essentially in the clear, which meant any relay holding the HSDir (hidden-service directory) flag could learn a great deal about the services it was responsible for, including harvesting the addresses of services it should not have known existed. A malicious relay could position itself in the directory and quietly collect a census of onion services. That was a serious flaw for a system whose entire point is unlinkability.

Version 3 rebuilt the cryptography to close it. v2 addresses were 16 characters, a truncated hash of an RSA-1024 public key. v3 addresses are 56 characters because they contain a full ed25519 public key plus a checksum and a version byte, base32-encoded. Moving from RSA to ed25519 and from SHA-1 to the SHA-3 family modernized the primitives, but the more important change is what it enabled in the directory. Because the v3 address is itself a public key, the service can derive a blinded version of that key, different each day, and publish its directory data encrypted under it. A directory relay sees only a rotating blinded key and an encrypted blob. It cannot recover the address, cannot tell which service it is holding data for, and cannot enumerate active services by probing. The leak that defined v2 is closed by construction.
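The v3 address format is simple enough to reproduce. The sketch follows the layout given in the v3 onion-service specification as I understand it: base32 of the 32-byte public key, a 2-byte checksum, and a version byte of 3, where the checksum is the first two bytes of SHA3-256 over the string ".onion checksum", the key, and the version. The demo key below is a placeholder of 32 arbitrary bytes, not a real ed25519 key; the blinded-key derivation is not shown.

```python
import base64
import hashlib

VERSION = b"\x03"

def _checksum(pubkey: bytes) -> bytes:
    return hashlib.sha3_256(b".onion checksum" + pubkey + VERSION).digest()[:2]

def onion_address(pubkey: bytes) -> str:
    # onion_address = base32(PUBKEY | CHECKSUM | VERSION) + ".onion"
    if len(pubkey) != 32:
        raise ValueError("an ed25519 public key is 32 bytes")
    raw = pubkey + _checksum(pubkey) + VERSION  # 35 bytes = 280 bits = 56 base32 chars
    return base64.b32encode(raw).decode("ascii").lower() + ".onion"

def decode_onion(address: str) -> bytes:
    # Recover the public key, rejecting wrong versions and mistyped addresses.
    label = address[:-len(".onion")] if address.endswith(".onion") else address
    raw = base64.b32decode(label.upper())
    pubkey, checksum, version = raw[:32], raw[32:34], raw[34:]
    if version != VERSION:
        raise ValueError("not a v3 onion address")
    if checksum != _checksum(pubkey):
        raise ValueError("checksum mismatch")
    return pubkey

demo_key = bytes(range(32))  # placeholder bytes, not a real key
address = onion_address(demo_key)
```

Because the last five bits of the encoding are the low bits of the version byte 0x03, every v3 address ends in the letter "d". More importantly, the address is the key: a client that has the address can verify the service's signatures without trusting any directory to tell it which key is correct.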

The migration was forced rather than gentle. The project announced v2’s deprecation well in advance, and beginning in October 2021, stable releases of the Tor software dropped support for v2 addresses entirely. Operators who had not moved to v3 simply went dark. The 56-character addresses are uglier to type and impossible to memorize, which is the cost of carrying a full public key in the name, and the project accepted that cost rather than keep a directory design that leaked.

*Figure: "What is in an onion address." v2: a truncated RSA-1024 hash, 16 characters, SHA-1, directory data uploaded in the clear. v3: a full ed25519 public key plus checksum and version byte, 56 characters, SHA-3, directory data encrypted under a daily blinded key. v2 was dropped from stable releases in October 2021. The v3 address carries a full ed25519 public key, which is what lets a service encrypt its directory data under a daily-rotating blinded key. The v2 design leaked addresses to directory relays.*

The browser problem: anonymity that the page can undo

Here is the trap that the rest of the project exists to avoid. Suppose Tor does its job perfectly. Your circuit is clean, no relay sees both ends, your IP is hidden behind the exit. You load a web page. That page runs JavaScript that reads your screen resolution, your installed fonts, your timezone, the precise way your GPU renders a canvas element, the list of your browser plugins, and a dozen other attributes. It combines them into a fingerprint. If that fingerprint is unique, the site can recognize you across visits and even across sessions, and the network anonymity you paid for in latency is worth nothing, because the browser handed your identity to the page directly.

This is the same fingerprinting that the EFF’s Panopticlick experiment demonstrated in 2010 and that grew into the entropy economy the anti-bot industry runs on. For Tor the stakes are sharper. A normal user being fingerprinted loses some privacy. A Tor user being fingerprinted loses the entire point of using Tor. So the Tor Browser, which the project began developing around 2008, is not merely Firefox with a proxy bolted on. It is a heavily modified Firefox whose central design goal is to make fingerprinting fail.

The project's strategy is the interesting part, because it is the opposite of what some other privacy-focused browsers chose. Two philosophies exist for beating a fingerprint. You can randomize, feeding each site slightly different values so the fingerprint is unstable and cannot be used to recognize you over time. Or you can make everyone uniform, so that every user of the browser produces the same fingerprint and individuals dissolve into the group. Brave, for example, leans on randomization, which it calls farbling. Tor chose uniformity, and the reasoning behind that choice is the most important idea in this whole section.

Why Tor chose uniformity over randomization

The Tor design goal, stated for years, is that all Tor Browser users should share the same fingerprint regardless of hardware, with the operating-system family as the main split the browser does not try to hide. The point is to collapse the number of distinguishable buckets for every measurable attribute until each bucket holds a large crowd. If everyone running the Tor Browser reports the same fonts, the same screen dimensions, the same user-agent, the same canvas output, then no single one of those attributes distinguishes one Tor user from another. The anonymity-loves-company principle from 1995 reappears, now at the browser layer rather than the network layer. The crowd that hides the Navy's spy on the network is the same crowd that hides you in the fingerprint.

The case against randomization is subtle and worth stating carefully. The intuition is that random values should be untrackable, and for a single attribute that is roughly true. The problem is consistency across attributes. The project describes what it calls the paradox of fingerprintable privacy technology: a defense that introduces artifacts or contradictions can make you more identifiable, not less. Imagine a randomizer that varies your reported GPU but forgets to vary a correlated attribute, or that produces a combination of values no real device would ever have. That impossible combination is itself a signature. You are now the only person in the world reporting it. The act of trying to hide has made you unique, which is precisely the failure mode you were trying to avoid. Uniformity does not have this problem, because there are no inconsistencies to leak; everyone looks like the reference configuration, and the reference configuration is internally coherent because it is a real, fixed target.
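The paradox can be put in numbers. A fingerprint shared by k of N users reveals log2(N / k) bits about who you are: zero bits when everyone shares it, the maximum when you are alone. The sketch below assumes a hypothetical population of 1,000 browsers reduced to four coarse attributes; the attribute values are invented for illustration.

```python
import math
from collections import Counter

def anonymity_set(population, fingerprint):
    # How many users produce exactly this fingerprint.
    return Counter(population)[fingerprint]

def identifying_bits(population, fingerprint):
    # Surprisal log2(N / k): how much this fingerprint narrows down who you are.
    return math.log2(len(population) / anonymity_set(population, fingerprint))

UNIFORM = ("Windows 10", "1200x600", "bundled-fonts", "canvas-blocked")
population = [UNIFORM] * 999
# One user whose randomizer noised the canvas but leaked the real window size:
# a combination no uniform user produces.
odd = ("Windows 10", "1366x657", "bundled-fonts", "canvas-noise-7f3a")
population.append(odd)

print(anonymity_set(population, UNIFORM), anonymity_set(population, odd))  # 999 1
```

The 999 uniform users each leak a small fraction of a bit; the one inconsistent user leaks about ten bits, enough on its own to single them out of this population.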

Pseudocode makes the contrast concrete. The point is the difference in what an observer can compute, not any working code.

```
# randomization defense (per session)
canvas = real_canvas() + noise()
webgl  = real_webgl() + noise()
fonts  = shuffle(real_fonts())
# risk: noise on one axis, truth on another -> impossible combo -> unique

# uniformity defense (Tor Browser)
canvas = BLOCKED                              # no readback
webgl  = BLOCKED                              # no readback
fonts  = FIXED_BUNDLE                         # same set for every user
screen = round_down(real_window(), 200, 100)  # letterboxed buckets
ua     = PLATFORM_CONSTANT                    # one constant per OS family
# every user on a platform computes the identical fingerprint -> bucket of N
```

The specific defenses follow directly from the uniformity goal. Canvas image extraction is blocked, so a page cannot read back the pixel-level rendering quirks that make canvas such a strong fingerprint; the same applies to WebGL readback. The browser ships a fixed bundle of fonts and only lets pages use those, so that font lists, one of the richest fingerprinting sources, are identical for everyone and reveal nothing about what is actually installed on the machine. The list of requested languages is restricted to a small predefined set rather than reflecting the user's true locale.

Screen and window size get the most visible treatment: letterboxing. Real monitor dimensions are high-entropy, and a maximized window leaks the exact monitor size. So the Tor Browser rounds the content area down to a multiple of 200 pixels in width and 100 pixels in height and pads the rest of the window with a neutral margin, the letterbox bars. Many different physical screens collapse into the same small set of reported content sizes. The user-agent string is forced to a platform constant: Windows users report Windows 10, macOS users report OS X 10.15, Android reports Android 10, and Linux and other systems report a generic Linux on X11. The reasoning the project gives is direct: any option a user could choose would only make them more unique, so the choice is removed. Even timers are coarsened; functions like performance.now() are made less precise to blunt the micro-architectural timing side channels that can fingerprint hardware.
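Both quantization tricks fit in a few lines. The letterbox function rounds a window down to the 200 by 100 grid described above and returns the margin left over; it assumes the window is at least one step in each dimension. The timer function clamps a reading to a coarse grid; its 100 ms default is an assumption for illustration, not Tor Browser's actual setting. Both function names are hypothetical.

```python
def letterbox(width, height, step_w=200, step_h=100):
    # Round the content area down to the nearest step multiple; the remainder
    # becomes the neutral margin. Assumes the window is at least one step each way.
    content = (width - width % step_w, height - height % step_h)
    margins = (width - content[0], height - content[1])
    return content, margins

def coarsen(timestamp_ms, resolution_ms=100.0):
    # Clamp a high-resolution timer reading down to a coarse grid.
    return (timestamp_ms // resolution_ms) * resolution_ms

windows = [(1366, 657), (1280, 689), (1300, 620)]
buckets = {letterbox(w, h)[0] for w, h in windows}
print(buckets)               # {(1200, 600)}
print(coarsen(1234.567))     # 1200.0
```

Three different physical windows report one content size, and two timer readings that differ by a few milliseconds become indistinguishable. In both cases the browser gives up precision on purpose, because precision is what separates one user from the next.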

This is the same uniform-target idea that anti-detect browsers later tried to weaponize in reverse, covered in the history of anti-detect browsers: if a coherent, shared fingerprint defeats tracking, then a coherent, spoofed fingerprint can defeat detection, provided you never leak an inconsistency. The hard part in both directions is the same. Coherence is fragile. One attribute that does not match the story, one timer that is too precise, one font that should not be there, and the crowd of one reappears.

*Figure: "Two ways to beat a fingerprint." Randomization: many small buckets, where an impossible combination can leave you alone in one. Uniformity: one large bucket, where you are indistinguishable from the rest. Randomization scatters users into many small buckets and risks producing a unique, impossible combination. Uniformity puts every user into one large bucket. Tor chose the second.*

What the history actually shows

The thread that holds Tor's thirty-year history together is not cryptography. The layered-encryption trick was the easy part, sketched at the NRL by 1995, published in 1998, and patented by the Navy. The thread is the realization, forced on the designers by their own threat model, that hiding requires a crowd and that the crowd has to be real. You cannot fake a population of users, and you cannot anonymize a network of spies by keeping it secret. You have to open the doors, mix everyone together, and make them all look alike. That logic took the project from a Navy lab to a free license to a public nonprofit, and it took the Tor Browser from a proxy wrapper to a system that letterboxes your window and lies about your operating system so that you look like everyone else.

The arms race is not settled and the design honestly admits where it loses. Tor never claimed to beat a global passive adversary who can watch both ends of a circuit at once, and that limit is the same in 2026 as it was in 1998. The browser side is a perpetual chase: every new web API that exposes a hardware detail is a new fingerprinting surface, and the project has to neutralize each one before it fragments the uniform crowd. The October 2021 v2 cutover showed the project willing to break backward compatibility and inconvenience its own users to close a privacy leak, which is the kind of decision that distinguishes a privacy tool from a product with a privacy feature.

The detail worth ending on is the one that keeps surprising people: the same government a Tor user is often hiding from has a real, ongoing interest in that user existing. The Navy did not build an anonymity network so that one officer could be anonymous. It built one so that millions of unremarkable people would route their traffic through it, and the officer would be one unremarkable flow among them. Every journalist, every activist, every person idly reading on the Tor Browser is, in the original design’s terms, cover traffic. The system works precisely because most of its users have nothing to hide.


Sources & further reading

  • The Tor Project (2024), Tor Project: History — the project’s own timeline from the 1995 NRL origin through the 2002 release, EFF funding, and the 2006 nonprofit.
  • Reed, Syverson, Goldschlag (1998), Anonymous Connections and Onion Routing — the IEEE JSAC paper (vol. 16, no. 4) that introduced onion routing and stated its threat model.
  • Dingledine, Mathewson, Syverson (2004), Tor: The Second-Generation Onion Router — the USENIX Security paper that defined modern Tor, telescoping circuits, and rendezvous-point hidden services.
  • Wikipedia (2026), Tor (network) — consolidated dates, funding figures, directory-authority count, and onion-service timeline.
  • Wikipedia (2026), Onion routing — the patent number (US 6,266,704), the IEEE citation, and the NRL/DARPA development history.
  • The Tor Project (2021), V3 onion services usage — explains the v2 directory leak and how v3 blinded keys encrypt directory data.
  • The Tor Project (2024), Fingerprinting protections — the support page documenting letterboxing, the user-agent constants, font bundling, and the uniform-fingerprint goal.
  • The Tor Project (2019), Browser Fingerprinting: An Introduction and the Challenges Ahead — the project’s argument for uniformity over randomization and the paradox of fingerprintable privacy tools.
  • The Tor Project (2024), Using bridges and pluggable transports — documentation of obfs4, meek, Snowflake, and WebTunnel as anti-censorship transports.
  • The MIT Press Reader (2019), The Secret History of Tor — long-form account of the project’s path from a military experiment to a public privacy tool.
  • DigiCert Knowledge Base (2024), Onion Domains — reference on v2 versus v3 .onion address structure and the ed25519 encoding.
