
The history of the proxy: from CERN httpd to residential proxy networks

Copyright: MIT

A proxy is a machine that makes a request on your behalf. That single idea has done an astonishing amount of work over thirty years, and it has meant wildly different things to the people deploying it. To a CERN sysadmin in 1994 it was a way to let people behind a firewall reach the web at all, with caching bolted on so the second fetch was fast. To a corporate IT department it was a chokepoint for filtering and logging. To a privacy researcher it was a way to strip your IP from the record. To a spammer in 2003 it was someone else’s misconfigured Windows box. To Tor it was three hops of layered encryption. And to a 2026 scraping operation it is an IP address borrowed, knowingly or not, from a phone in Lagos or a smart TV in Ohio.

The same word, the same basic mechanism, and a thirty-year argument about who the intermediary serves. This post follows that thread. We start with the first caching web proxy and the firewall problem it solved, move through the corporate forward proxy and the headers it invented, then the anonymizers and the open-proxy abuse wave, then onion routing and Tor, and finally the residential and mobile proxy networks that turned consumer devices into a commodity. Along the way the technical artifacts matter: the CONNECT method, the X-Forwarded-For header, SOCKS, and the detection signals that gave rise to a whole adjacent industry.

The firewall problem and the first caching proxy

The web in 1993 had a connectivity problem that is easy to forget now. Plenty of organizations sat behind a firewall that blocked direct outbound connections. A user inside that perimeter ran a browser, the browser tried to open a TCP connection to some HTTP server on the open internet, and the firewall dropped it. The browser could not reach the web. The fix was an intermediary that lived on the firewall machine itself, accepted the request from the inside, and re-issued it to the outside world on the user’s behalf. That intermediary was the forward proxy.

CERN httpd, the first web server software, already had gateway features for reaching Gopher, WAIS, and FTP. In the spring of 1994 those features were extended to handle every method a web client used over HTTP, and the result is generally credited as the first true WWW proxy. The work is documented in a paper titled “World-Wide Web Proxies” by Ari Luotonen of CERN and Kevin Altis of Intel, last modified in May 1994. Luotonen had moved to Geneva in July 1993 to work at CERN, where he wrote a large share of CERN httpd, including its HTTP caching support.

The paper’s framing is worth quoting because it captures the original purpose exactly: a proxy “provides access to the Web for people on closed subnets who can only access the Internet through a firewall machine.” The proxy spoke the native protocols outward, so a single intermediary could give a walled-off network access to HTTP, Gopher, WAIS, and FTP through one door. Clients lost almost nothing by routing through it, apart from any special handling they might otherwise have applied to the non-HTTP protocols.

Then came the feature that outlived the firewall rationale. CERN httpd’s proxy could cache the documents it fetched from remote hosts, so the second request for a given URL returned faster than the first. This was probably the first WWW proxy ever to cache. The caching logic, written by Luotonen in April 1994, handled conditional GETs, expiry dates, and garbage collection to reclaim space. A proxy was no longer only a way around a firewall. It was now a performance layer, useful even to a user with an unrestricted connection. Both threads, access control and caching, run through everything that follows.

*The original forward proxy sat on the firewall machine, re-issued requests for clients on a closed subnet, and cached responses so the second fetch was fast.*

CERN httpd itself had a short institutional life. The first release dates to December 1990, version 0.1 to June 1991, and the final release, version 3.0A, to July 1996, after which development moved to the W3C and the project was effectively retired in favor of the Java-based Jigsaw server. Early versions were public domain; later ones used the MIT license. The proxy idea, though, had already escaped the lab.

The corporate forward proxy and the headers it forced into existence

Through the late 1990s the forward proxy became standard infrastructure inside companies, universities, and ISPs. The motivations stacked up. Caching saved expensive WAN bandwidth when a thousand employees loaded the same news site. The proxy was a natural place to enforce acceptable-use policy, block categories of sites, and keep an access log. And it was a security boundary, the one machine allowed to talk to the outside, which made it the one machine you had to watch.

Squid, descended from the Harvest research cache, became the dominant open-source caching proxy and pushed the technology forward in ways that still shape how the web identifies clients. When a request passes through a proxy, the origin server sees the proxy’s IP, not the user’s. That breaks logging, geolocation, and abuse handling at the origin. Squid’s developers introduced a header to carry the original client address forward: X-Forwarded-For. Each proxy in a chain appends the address of the previous hop, producing a comma-separated list like X-Forwarded-For: client, proxy1, proxy2. It was never standardized at the time, just a de-facto convention that everyone copied, and it remains one of the most consequential non-standard headers on the web.
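A minimal Python sketch of that convention, under stated assumptions: each proxy appends the address of the peer it received the connection from, and the origin recovers a client address by walking the list from the right and skipping hops it trusts. The function names, the `trusted` set, and the addresses are hypothetical; real deployments configure trusted proxies in their server or framework rather than in code like this.

```python
def forward(headers: dict[str, str], peer_ip: str) -> dict[str, str]:
    """Return the headers a proxy would send upstream, with peer_ip appended to X-Forwarded-For."""
    out = dict(headers)
    prior = out.get("X-Forwarded-For")
    out["X-Forwarded-For"] = f"{prior}, {peer_ip}" if prior else peer_ip
    return out


def client_ip(xff: str, peer_ip: str, trusted: set[str]) -> str:
    """Walk the chain right to left and return the first address not in `trusted`.

    The leftmost entries are whatever the client chose to send, so only hops
    appended by proxies you control can be believed.
    """
    hops = [h.strip() for h in xff.split(",") if h.strip()]
    hops.append(peer_ip)  # the TCP peer the origin actually sees
    for hop in reversed(hops):
        if hop not in trusted:
            return hop
    return hops[0]  # every hop is trusted: fall back to the leftmost entry


if __name__ == "__main__":
    client, p1, p2, p3 = "198.51.100.7", "10.0.0.1", "10.0.0.2", "10.0.0.3"
    h = {"X-Forwarded-For": "203.0.113.66"}  # forged by the client before the first hop
    h = forward(h, client)  # proxy1 appends the client it saw
    h = forward(h, p1)      # proxy2 appends proxy1
    h = forward(h, p2)      # proxy3 appends proxy2; the origin's TCP peer is proxy3
    print(h["X-Forwarded-For"])
    print(client_ip(h["X-Forwarded-For"], p3, {p1, p2, p3}))
```

The forged leftmost entry survives in the header, which is why the walk starts from the right: only the entries written by trusted proxies say anything reliable.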

The ad-hoc X-Forwarded-* family eventually got a standardized replacement. RFC 7239, “Forwarded HTTP Extension,” was published in June 2014 by Andreas Petersson and Martin Nilsson of Opera Software. It folds the originating address, the protocol, and the proxy’s own interface address into one structured header with key-value parameters, for example Forwarded: for=192.0.2.60;proto=https;by=203.0.113.43. The structured form is less ambiguous to parse than a bare comma list. Adoption has been slow, and X-Forwarded-For is still what you see in the wild, but the standard exists. The taxonomy of forward, reverse, and transparent proxies, and how this header chain behaves through each, is its own subject covered in the X-Forwarded-For chain and the proxy taxonomy post.
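For comparison, a hedged sketch of reading the structured header: split hops on commas, parameters on semicolons, and strip the quotes that RFC 7239 requires around values such as bracketed IPv6 addresses. This simplified parser ignores backslash escapes inside quoted strings and does not validate parameter names; `parse_forwarded` is a hypothetical helper, not a library API.

```python
def _split_unquoted(text: str, sep: str) -> list[str]:
    """Split on `sep`, ignoring separators that appear inside double quotes."""
    parts, buf, quoted = [], [], False
    for ch in text:
        if ch == '"':
            quoted = not quoted
        if ch == sep and not quoted:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return parts


def parse_forwarded(value: str) -> list[dict[str, str]]:
    """Parse a Forwarded header into one dict per hop, leftmost hop first.

    Simplification: backslash escapes inside quoted strings are not handled.
    """
    hops = []
    for element in _split_unquoted(value, ","):
        params = {}
        for pair in _split_unquoted(element, ";"):
            key, sep, val = pair.strip().partition("=")
            if not sep:
                continue  # skip empty or malformed pairs
            val = val.strip()
            if len(val) >= 2 and val[0] == val[-1] == '"':
                val = val[1:-1]  # quoted-string form, e.g. a bracketed IPv6 address
            params[key.strip().lower()] = val  # parameter names are case-insensitive
        hops.append(params)
    return hops
```

Each hop comes back as a named parameter instead of a bare position in a list, which is the ambiguity the structured form was meant to remove.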

The proxy era forced a second addition into HTTP, this time a method rather than a header, and it came from a problem that HTTP caching could not touch: encryption. A caching HTTP proxy reads the request line, fetches the URL, and serves the bytes. That works only while the proxy can read the request. Once SSL arrived, the proxy could not see inside an encrypted session, and it had no business trying. The answer was the CONNECT method. A client asks the proxy to open a raw TCP tunnel to a host and port; the proxy connects, answers with a 2xx status, and from then on relays bytes in both directions without inspecting them. Send CONNECT example.com:443 and the proxy becomes a dumb pipe for the TLS handshake that follows. Useful, necessary, and, as we will see, exactly the mechanism the open-proxy abuse wave rode straight into the spam business.
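The client side of that exchange is small enough to sketch with Python's standard `socket` module. This is a minimal illustration, assuming an HTTP proxy that accepts unauthenticated CONNECT; `open_tunnel` is a hypothetical helper, and a real client would also handle `Proxy-Authorization` and any tunnel bytes that arrive in the same read as the response headers.

```python
import socket


def connect_request(host: str, port: int) -> bytes:
    """Build the CONNECT request line and Host header for host:port."""
    authority = f"{host}:{port}"
    return (f"CONNECT {authority} HTTP/1.1\r\n"
            f"Host: {authority}\r\n"
            "\r\n").encode("ascii")


def open_tunnel(proxy: tuple[str, int], host: str, port: int,
                timeout: float = 10.0) -> socket.socket:
    """Ask the proxy for a tunnel; on a 2xx reply, return the raw socket.

    After this returns, the proxy relays bytes blindly, so the caller would
    typically start a TLS handshake (for example with ssl.SSLContext.wrap_socket).
    """
    sock = socket.create_connection(proxy, timeout=timeout)
    sock.sendall(connect_request(host, port))
    response = b""
    while b"\r\n\r\n" not in response:  # read until the end of the response headers
        chunk = sock.recv(4096)
        if not chunk:
            sock.close()
            raise ConnectionError("proxy closed the connection during CONNECT")
        response += chunk
    status_line = response.split(b"\r\n", 1)[0]
    parts = status_line.split()
    if len(parts) < 2 or not parts[1].startswith(b"2"):
        sock.close()
        raise ConnectionError(f"CONNECT refused: {status_line!r}")
    return sock
```

Nothing in the request says what will travel through the tunnel afterward; the proxy sees only a host and a port, which is both what makes HTTPS work and what the abuse described below exploited.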

*The CONNECT method turns an HTTP proxy into a blind TCP tunnel, which is what makes HTTPS through a proxy possible and what made open proxies useful to spammers.*

SOCKS, the protocol-agnostic alternative

The HTTP proxy understands HTTP. That is its strength and its limit. If you wanted to proxy arbitrary TCP traffic (mail, IRC, a database connection), you needed something that did not care about the application protocol. That was SOCKS, designed by David Koblas, a system administrator at MIPS Computer Systems. After Silicon Graphics took over MIPS in 1992, Koblas presented a paper on SOCKS at that year’s Usenix Security Symposium, putting the protocol in the public domain. Ying-Da Lee of NEC extended it to version 4 and later proposed SOCKS4a, which let clients pass a hostname instead of a pre-resolved IP.

SOCKS5 was standardized as RFC 1928 in March 1996, with authors M. Leech, M. Ganis, Y. Lee, R. Kuris, D. Koblas, and L. Jones. A SOCKS server conventionally listens on TCP port 1080. The key architectural difference from an HTTP proxy is the layer it works at: SOCKS sits between the application and transport layers and does not parse or filter the payload, which keeps overhead and latency low. SOCKS5 also added authentication and UDP support. That protocol-agnostic, no-inspection design is exactly why SOCKS5 endpoints are still the default sold by residential proxy providers today, thirty years on.
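To make the RFC 1928 framing concrete, here is a sketch of the bytes a SOCKS5 client exchanges for an unauthenticated CONNECT to a hostname. Passing the name (address type 0x03) rather than a pre-resolved IP lets the server do the DNS lookup, the same idea SOCKS4a introduced. The function names are hypothetical, and the sketch deliberately stops short of networking: it only builds and checks messages.

```python
import struct

SOCKS_VERSION = 0x05
METHOD_NO_AUTH = 0x00
CMD_CONNECT = 0x01
ATYP_DOMAIN = 0x03
REPLY_SUCCEEDED = 0x00


def greeting() -> bytes:
    """VER, NMETHODS, METHODS: offer only 'no authentication required'."""
    return bytes([SOCKS_VERSION, 1, METHOD_NO_AUTH])


def method_accepted(reply: bytes) -> bool:
    """The server answers VER, METHOD; 0xFF would mean no acceptable method."""
    return len(reply) == 2 and reply[0] == SOCKS_VERSION and reply[1] == METHOD_NO_AUTH


def connect_request(host: str, port: int) -> bytes:
    """VER, CMD, RSV, ATYP, then a length-prefixed hostname and a big-endian port."""
    name = host.encode("idna")
    if not 0 < len(name) <= 255:
        raise ValueError("hostname must be 1-255 bytes")
    return (bytes([SOCKS_VERSION, CMD_CONNECT, 0x00, ATYP_DOMAIN, len(name)])
            + name + struct.pack("!H", port))


def connect_succeeded(reply: bytes) -> bool:
    """A reply starts VER, REP, RSV, ATYP; REP 0x00 means the tunnel is open."""
    return len(reply) >= 4 and reply[0] == SOCKS_VERSION and reply[1] == REPLY_SUCCEEDED
```

Note what is absent: no URL, no headers, no notion of HTTP. After a successful reply the connection is just bytes, which is the no-inspection property the paragraph above describes.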

Anonymizers: the proxy as a privacy tool

Up to this point the proxy served the network operator. The next move handed it to the user. If a proxy makes the request on your behalf, then the origin server sees the proxy, not you, and that is a privacy property, not just a connectivity one. Someone was going to sell it.

Lance Cottrell did. An astrophysics PhD student at UC San Diego and the author of the Mixmaster anonymous remailer, Cottrell founded Infonex Internet in 1995. In 1997 the company took the name Anonymizer after acquiring a web-based privacy proxy of that name built by Justin Boyan at Carnegie Mellon. The pitch was simple: route your web browsing through Anonymizer’s server, and the sites you visit see Anonymizer’s address instead of yours. It was a single-hop proxy with a privacy promise wrapped around it, and it is generally counted as the first commercial web anonymizer.

The single-hop design carried an obvious weakness that defines the entire anonymizer category. The operator can see everything. The proxy knows your real IP and knows every site you asked for. You have not removed the observer; you have relocated trust to one company, its logging policy, and its response to a subpoena. For casual privacy, trading one exposure for another was acceptable. For anyone facing a serious adversary, a single trusted hop was not enough. That gap, the need to remove trust from any single operator, is what onion routing was built to close.

Open proxies and the abuse wave

Somewhere between the corporate proxy and the privacy proxy sat a third population that nobody intended to create: the open proxy. An open proxy is one that will relay for anyone, with no authentication, usually because it was misconfigured or shipped that way. The canonical early example was WinGate, a Windows connection-sharing program that predated Windows’ own built-in Internet Connection Sharing. Home users ran it to share a single dial-up connection across a few machines, and it shipped with a default configuration that let anyone on the internet connect to it and telnet back out to an arbitrary host and port. The internet filled up with accidental relays.

Two HTTP mechanisms made this lucrative for abuse. A plain GET http://target/ request through a misconfigured proxy would happily fetch and return the page, which let people scan for working open relays by telnetting to a candidate, sending a GET for a known URL, and seeing if the page came back. And CONNECT, the method built for HTTPS tunneling, was the bigger prize. A proxy that allowed CONNECT to arbitrary ports let a spammer tunnel to port 25 on a mail server and relay mail through the proxy’s IP, hiding the real source. Open-proxy abuse became implicated in a large share of email spam, to the point where spammers actively installed open proxies on victims’ Windows machines using purpose-built viruses. By the time the CoDeeN research proxy at Princeton came online, its operators had to restrict CONNECT to SSL ports specifically because the 1990s had shown what unrestricted tunneling became.
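The restriction CoDeeN applied, refusing CONNECT to anything but TLS ports, amounts to a few lines of policy. Below is a hypothetical proxy-side check in that spirit; Squid's stock configuration expresses the same idea with an `SSL_ports` ACL. The allowed-port set and the function name are assumptions for illustration.

```python
ALLOWED_CONNECT_PORTS = frozenset({443})  # assumption: allow TLS tunnels only


def allow_connect(request_line: str) -> bool:
    """Return True only for a well-formed CONNECT to an allowed port.

    A request such as 'CONNECT mail.example.net:25 HTTP/1.1' is exactly the
    spam-relay pattern, so it is refused.
    """
    parts = request_line.split()
    if len(parts) != 3 or parts[0] != "CONNECT":
        return False
    # rpartition keeps bracketed IPv6 hosts such as [2001:db8::1] intact
    host, sep, port = parts[1].rpartition(":")
    if not sep or not host or not (port.isascii() and port.isdigit()):
        return False
    return int(port) in ALLOWED_CONNECT_PORTS
```

Restricting ports stops the port-25 relay trick but not open access itself: a proxy that relays for anyone still needs authentication or an address allowlist before it stops being an open proxy.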

This is the moment the proxy split from a tool into a reputation problem. Public proxy lists circulated, anti-spam blocklists started tracking open relays and open proxies, and the IP address you came from began to carry a verdict before you sent a single byte. That verdict, IP reputation, is now a core anti-bot signal, and the long arc from open-proxy blocklists to modern ASN reputation scoring is covered in how anti-bot vendors detect residential proxies and ASN reputation.

Timeline: 1994 CERN httpd caching proxy · 1996 SOCKS5, RFC 1928 · 1997 Anonymizer web proxy · 2002 Tor network deployed · ~2003 open-proxy spam wave · 2012 Hola P2P network launches · 2021 Luminati becomes Bright Data

*Three decades of the same mechanism serving very different masters, from a CERN firewall workaround to a consumer-device proxy market.*

Onion routing and Tor: removing trust from the single hop

The anonymizer’s flaw, one operator who sees everything, had been the subject of academic work since the early 1990s. In 1995, David Goldschlag, Mike Reed, and Paul Syverson at the U.S. Naval Research Laboratory asked whether you could build internet connections that did not reveal who was talking to whom, even to someone watching the network. Their answer was onion routing: send the traffic through several relays and wrap it in a layer of encryption for each hop, so each relay peels one layer and learns only the previous and next hop, never the whole path.

The structure is what makes it work. The first relay knows your IP but not your destination. The last relay, the exit, knows the destination but not your IP. No single relay sees both ends. This is an entirely different security model from the anonymizer’s. The anonymizer asks you to trust one operator; onion routing is designed so that no single relay can deanonymize you, and you do not have to trust any one of them.
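A toy Python sketch of the layering, assuming each relay already shares a symmetric key with the client. The "cipher" is a SHA-256 keystream XOR built from the standard library only so the example runs anywhere; it is not secure and is not what Tor uses. The whole onion is also built up front rather than hop by hop, as Tor builds its circuits. Relay names, keys, and the destination are hypothetical.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || counter) blocks. NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))


def wrap(key: bytes, next_hop: str, inner: bytes) -> bytes:
    """One layer: a length-prefixed next-hop name followed by the inner onion, encrypted."""
    hop = next_hop.encode()
    if len(hop) > 255:
        raise ValueError("hop name too long for a one-byte length prefix")
    return keystream_xor(key, bytes([len(hop)]) + hop + inner)


def peel(key: bytes, onion: bytes) -> tuple[str, bytes]:
    """What a relay does: remove its layer and learn only where to send the rest."""
    plain = keystream_xor(key, onion)  # XOR with the same keystream undoes the layer
    n = plain[0]
    return plain[1:1 + n].decode(), plain[1 + n:]


def build_onion(path: list[tuple[str, bytes]], destination: str, payload: bytes) -> bytes:
    """Wrap from the exit inward so the guard's layer ends up outermost."""
    onion, next_hop = payload, destination
    for name, key in reversed(path):
        onion = wrap(key, next_hop, onion)
        next_hop = name
    return onion


if __name__ == "__main__":
    keys = {"guard": b"k-guard", "middle": b"k-middle", "exit": b"k-exit"}
    path = [(name, keys[name]) for name in ("guard", "middle", "exit")]
    onion = build_onion(path, "example.com:80", b"GET / HTTP/1.0\r\n\r\n")
    hop, onion = peel(keys["guard"], onion)   # guard learns only "middle"
    print(hop)
    hop, onion = peel(keys["middle"], onion)  # middle learns only "exit"
    print(hop)
    hop, onion = peel(keys["exit"], onion)    # exit learns the destination and payload
    print(hop, onion)
```

The guard also knows the client's IP, but only from the TCP connection, not from anything inside the onion; the destination appears only after the exit removes the last layer.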

The second-generation design became Tor. Roger Dingledine and Nick Mathewson joined Syverson, and the Tor network was first deployed in October 2002 with its code released under a free software license. By the end of 2003 it had around a dozen volunteer relays, mostly in the United States with one in Germany. In 2004 the Naval Research Laboratory released the source under a free license, the Electronic Frontier Foundation began funding Dingledine and Mathewson’s work, and the defining paper, “Tor: The Second-Generation Onion Router,” was presented at the 13th USENIX Security Symposium in August. The Tor Project, Inc., a 501(c)(3) nonprofit, was founded in 2006.

Tor matters to the proxy story for two reasons beyond its privacy mission. First, its client speaks SOCKS: applications point at a local SOCKS proxy and Tor handles the circuit. The protocol-agnostic intermediary from 1992 became the front door to the most ambitious anonymity network ever built. Second, Tor’s exit nodes are a small, published, heavily scrutinized set of IP addresses. Anyone can download the list. That made Tor traffic trivially identifiable at the network edge, and sites that did not want it could block the exit list outright. The reaction to that blocking pressure, the desire for IPs that did not announce themselves as proxies, points straight at what came next.
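Because the exit list is public, the blocking side is almost trivial. A sketch, assuming the list has already been fetched as plain text with one address per line (the Tor Project publishes such a list; the sketch takes text rather than a URL so it does not depend on any particular endpoint). `load_exit_list` and `is_tor_exit` are hypothetical names.

```python
import ipaddress


def load_exit_list(text: str) -> frozenset:
    """Parse one address per line, skipping blanks, comments, and junk."""
    exits = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            exits.add(ipaddress.ip_address(line))
        except ValueError:
            continue  # ignore lines that are not bare IP addresses
    return frozenset(exits)


def is_tor_exit(addr: str, exits: frozenset) -> bool:
    """Normalize first, so '2001:DB8:0::1' and '2001:db8::1' compare equal."""
    try:
        return ipaddress.ip_address(addr) in exits
    except ValueError:
        return False
```

Exit relays come and go, so any such set needs periodic refreshing, but the check itself is a single set lookup, which is why Tor traffic was so easy to turn away.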

*Onion routing's structure removes trust from any one operator: the guard knows who you are, the exit knows where you went, and no relay knows both.*

The residential proxy economy

By the mid-2010s the proxy market had a clear hierarchy of IP quality. Datacenter proxies were cheap and abundant, but their IP ranges belonged to hosting providers, sat in published ASNs, and were easy to flag and block at scale. What buyers wanted was IPs that looked like ordinary people: addresses owned by consumer ISPs, assigned to home broadband and phones, with the reputation of a real residential customer. Those addresses are not for sale through any normal channel. So the industry found another way to obtain them. It borrowed them from the people who already had them.

The company that defined this model started as Hola, founded by Ofer Vilenski and Derry Shribman, which launched its peer-to-peer network in late 2012 and went viral in January 2013. Hola was a free VPN-style browser extension. The catch, buried in the model rather than the marketing, was that free users became exit nodes for other people’s traffic. Hola packaged that borrowed residential bandwidth and sold it to businesses under the name Luminati, reportedly charging around $20 per gigabyte. Your home connection was the product, and someone else’s web requests were going out under your IP.

The reckoning came in May 2015. After 8chan was hit by an attack that routed through the Hola network, 8chan founder Fredrick Brennan went public, and security researchers piled on with findings that the same plumbing could be used to push malware to Hola users and to mount denial-of-service attacks through the userbase. The criticism was close to unanimous and it centered on consent: most free users had no real idea their devices were relaying strangers’ traffic. The Luminati business kept growing regardless. In August 2017 Hola sold a majority stake to EMK Capital, and in March 2021 the Luminati division was rebranded as Bright Data, today one of the largest residential proxy providers in the world. A 2018 Trend Micro report had already laid out the model in detail and the “legal botnet” label stuck.

Over the following years the sourcing method matured into an industry of its own: the proxy SDK. Instead of a single viral app, providers offer a software development kit that any app or browser-extension developer can embed. The app shows the user a consent screen offering ad-free access in exchange for letting the provider route traffic through their device, and the developer gets paid per opted-in monthly active user. IPRoyal sources its pool through an app called Pawns, formerly IPRoyal Pawns, and advertises that developers can earn on the order of $200 CPM by integrating the SDK. Bright Data and Infatica run comparable SDK programs. The honest version of this is a clearly disclosed opt-in where the user understands the trade. The dishonest version is a vague EULA and a checkbox most people never read.

Spamhaus, which has tracked IP reputation since the open-proxy days, describes residential proxy networks bluntly as a danger and lays out three sourcing paths: SDKs embedded in apps with vague license terms, software the user installed without understanding, and outright compromised devices. The pool is no longer just desktops and phones. It runs on streaming sticks, media players, and doorbells. Spamhaus puts the scale at millions of IPs in nearly every country and ASN, and ties the model to spam, internal network scanning when an infected device sits on a corporate LAN, and abuse of weakly secured admin interfaces. The line between a consented proxy SDK and a botnet is consent and disclosure, and in practice that line is blurry. In December 2025 Spamhaus documented a botnet it called Kimwolf that infected more than two million Android TV streaming devices, the kind of device increasingly drafted into these pools.

Mobile proxies are the premium tier of the same idea. They route through real mobile carrier IPs, which carriers hand out behind carrier-grade NAT so that thousands of subscribers share a handful of public addresses at any moment. Blocking a mobile IP risks cutting off many legitimate customers at once, which makes carrier IPs the hardest to ban and the most expensive to rent. The technical distinctions between datacenter, residential, and mobile, and what each costs to run, are the subject of residential vs datacenter vs mobile proxies and how proxy networks source IPs.

Detection caught up

The residential proxy promise was an IP that looks like a person. That promise leaks. The IP is real consumer space, but everything around it can give the operation away. The exact detection stack varies by vendor and most of it is not publicly documented, but the publicly discussed signals fall into a few buckets, and none of them depend on the IP’s ASN alone.

The crudest is reputation and pool overlap. Providers reuse the same residential IPs across many customers, so an address that just served a sneaker-cop request and now arrives at a bank, or one that appears across thousands of unrelated sessions, builds a footprint that intelligence vendors like Spur catalog and sell. Then there is the mismatch class. The IP geolocates to a home in one country while the TLS or TCP stack, the timezone, and the latency say something else. A residential IP fronting a Linux datacenter scraper has the wrong OS fingerprint for the device the IP claims to be, and the round-trip latency to the supposed origin can betray that the traffic is being relayed from somewhere else. Those checks are covered in detecting a proxy by OS mismatch and the geolocation-vs-latency check. On top of that sits the entire browser and network fingerprinting apparatus that anti-bot vendors run regardless of IP, from TLS fingerprinting to HTTP/2 frame analysis, which means a perfect residential IP behind an imperfect client still fails.
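A hedged sketch of the latency half of that check. It assumes the server can measure two round trips: a transport-level RTT to whatever machine terminated the TCP connection (the proxy exit, if there is one) and an application-level RTT that must travel all the way to the real client, for example a ping over an already-open connection. The 200 km/ms figure approximates light in optical fiber; the 50 ms relay margin, the flag names, and the function names are illustrative assumptions, not any vendor's thresholds.

```python
import math

FIBER_KM_PER_MS = 200.0  # assumption: light in fiber covers roughly 200 km per millisecond
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def min_rtt_ms(distance_km: float) -> float:
    """Physical floor on a round trip: there and back at fiber speed, no routing detours."""
    return 2 * distance_km / FIBER_KM_PER_MS


def latency_flags(server: tuple[float, float], claimed: tuple[float, float],
                  tcp_rtt_ms: float, app_rtt_ms: float,
                  relay_margin_ms: float = 50.0) -> list[str]:
    """Return hypothetical flag names for the two inconsistencies described above."""
    flags = []
    floor = min_rtt_ms(great_circle_km(*server, *claimed))
    if tcp_rtt_ms < floor:
        # Faster than physics allows: the IP cannot be where geolocation says it is.
        flags.append("tcp_rtt_below_physical_floor")
    if app_rtt_ms - tcp_rtt_ms > relay_margin_ms:
        # The real client sits well beyond the machine that terminated TCP: likely a relay.
        flags.append("app_rtt_exceeds_tcp_rtt")
    return flags
```

Real paths are never straight lines, so the floor only catches impossible answers; the second comparison is the one aimed at relaying, because the extra distance from the residential exit to the actual scraper shows up only in the application-level round trip.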

The result is an arms race that mirrors the open-proxy blocklist era, one layer up the stack. In the 2000s the question was whether your IP appeared on a list of known open relays. Today the question is whether your IP, your TLS fingerprint, your HTTP/2 settings, your timezone, and your behavior all tell a consistent story about a single real person on a single real device. The proxy gave you a believable address. The believable address turned out to be the easy part.

What thirty years of the intermediary actually shows

Strip away the eras and the proxy has been one mechanism the whole time: a machine that requests on your behalf, sitting between two parties that would otherwise talk directly. What changed was never the mechanism. It was the answer to a single question, who does the intermediary serve. CERN’s proxy served the network operator who needed to pierce a firewall and save bandwidth. The corporate proxy served the same operator, now as a filter and a logbook. The anonymizer flipped it to serve the user, at the cost of trusting one company completely. Tor’s design refused to let any single hop be trusted at all. And the residential network serves a paying customer using an IP borrowed, with or without meaningful consent, from a stranger’s phone.

Two technical artifacts from the early era turned out to be load-bearing for everything after. The CONNECT method, built so HTTPS could pass through a caching proxy, is the same blind tunnel that made open proxies useful to spammers and that residential providers relay through today. And SOCKS, a protocol-agnostic intermediary from a 1992 Usenix paper, is still the default endpoint a 2026 proxy customer connects to, by way of Tor and every commercial provider in between. The headers the proxy forced into existence, X-Forwarded-For and eventually RFC 7239’s Forwarded, are the web’s permanent admission that the address you connect from may not be the address you came from.

That admission is the whole game now. The open-proxy blocklists of the 2000s asked a binary question about your IP. The detection stacks of 2026 ask whether your IP, your handshake, and your behavior agree on who you are, and a borrowed residential address is only the first of several claims that all have to line up. The intermediary that began as a way onto the web has become the thing the web spends the most effort trying to see through.


Sources & further reading
