
Reverse, forward, and transparent proxies: a working taxonomy

Copyright: MIT
[Figure: three labelled proxy positions on a client-to-origin line: forward near the client, intercepting in the middle, reverse near the origin.]

Three different machines can all sit on the wire between your laptop and a web server, all of them relay HTTP, and all of them get called a “proxy.” One is configured by you and works for you. One is configured by the server operator and works for the server. The third is configured by whoever controls the network you happen to be on, and it works for them, often without your knowledge and sometimes against your interest. Same verb, three loyalties.

The confusion is not just sloppy vocabulary. The direction a proxy faces decides who can read your traffic, whose IP the far end sees, which TLS certificate you validate, and who is accountable when something breaks. Get the taxonomy wrong and you misread a header, trust the wrong hop, or build a system that leaks the thing you meant to hide. This post sorts the species by the one property that actually distinguishes them: who chose the proxy, and which side of the connection it is loyal to.

We start with the RFC that already drew these lines, then take the three roles one at a time: the forward proxy that the client picks, the reverse proxy (the spec calls it a gateway) that the origin operator picks, and the intercepting proxy that nobody on the endpoints picked at all. From there we place the things people argue about. A CDN is a reverse proxy with a map. An API gateway is a reverse proxy that learned to read the body. A corporate SSL-inspection box is an intercepting proxy holding a certificate you were made to trust. We finish on the header machinery that lets these hops admit they exist, and the cases where they would rather not.

The line RFC 9110 already drew

The HTTP specification settled this taxonomy long before the marketing did. RFC 9110, the 2022 document that replaced the old RFC 7230 series as the definition of HTTP semantics, devotes section 3.7 to intermediaries. It names exactly three forms: proxy, gateway, and tunnel. Everything a vendor sells is one of those three wearing a product name.

A proxy, in the spec’s words, is “a message-forwarding agent that is chosen by the client, usually via local configuration rules, to receive requests for some type(s) of absolute URI and attempt to satisfy those requests via translation through the HTTP interface.” Read that carefully. The defining clause is chosen by the client. The client knows the proxy is there because the client put it there. This is what everyone outside the spec calls a forward proxy.

A gateway, the spec continues, “(a.k.a. ‘reverse proxy’) is an intermediary that acts as an origin server for the outbound connection but translates received requests and forwards them inbound to another server or servers.” The gateway lies to the client by design. As far as the client’s HTTP stack can tell, the gateway is the server. The client never learns there is a fleet of real origins behind it. That deception is the feature.

The third form, the tunnel, “acts as a blind relay between two connections without changing the messages.” A tunnel does not read the traffic. Once active it “is not considered a party to the HTTP communication” at all. This is the mode a forward proxy drops into when it carries HTTPS it cannot decrypt, which we get to shortly.

There is a fourth thing the spec describes but pointedly does not bless with one of those three names: the intercepting proxy. RFC 9110 notes that an “interception proxy” (also commonly known as a “transparent proxy”) differs from an HTTP proxy because it is “not chosen by the client”; instead it filters or redirects outgoing TCP port 80 packets, and occasionally traffic on other common ports. The spec calls these out specifically because they break assumptions HTTP otherwise relies on. A proxy you chose can be reasoned about. A proxy that grabbed your packets off the wire cannot. The whole reason the document mentions interception is to warn implementers that the polite contract between client and proxy does not hold there.

So the real axis is not “forward versus reverse.” It is who chose this thing. The client chose it: forward proxy. The origin operator chose it: reverse proxy. Neither endpoint chose it: intercepting proxy. Hold that question in your head and every confusing box on a network diagram sorts itself.

[Figure: who chose the proxy decides which way it faces. Between Client and Origin sit Forward (client picks, loyal to client), Intercepting (neither picks, loyal to network owner), and Reverse (origin picks, loyal to origin).]

*The three intermediary roles placed by the question that defines them: who chose the proxy. Direction of loyalty follows from the answer.*

The forward proxy: the client’s agent

A forward proxy sits in front of a set of clients and speaks to the wider internet on their behalf. The clients are configured to send their requests to it. The proxy decides whether to allow each request, fetches the resource, and hands back the response. The origin server, on the far side, sees a connection coming from the proxy’s IP. It does not see the client unless the proxy chooses to tell it.

That last property is why forward proxies show up everywhere from a corporate firewall to a residential-IP scraping fleet. The proxy is an IP-address launderer by construction. The origin’s logs record the exit node, not the user behind it. A company points every employee browser at an outbound proxy so it can apply acceptable-use policy, cache popular objects, log egress, and present a single set of source IPs to the outside world. A scraping operation points its crawlers at a pool of proxies for the same source-hiding property, turned to a different purpose. The mechanism is identical. Only the intent and the ownership differ.

Configuration is the tell. Because the client chose the proxy, the client has to be told where it is. The crude way is a static host-and-port set in the browser or the OS network settings. The scalable way is a proxy auto-config file, a small JavaScript document exposing a single function, FindProxyForURL(url, host), that returns which proxy (if any) to use for a given URL. Browsers fetch the PAC file, run that function per request, and route accordingly. By convention the file is named proxy.pac. To avoid hand-configuring every machine, the Web Proxy Auto-Discovery Protocol (WPAD) lets a client guess the PAC location through DHCP and DNS, with DHCP taking priority: if DHCP hands back a WPAD URL, no DNS lookup happens. The WPAD file is conventionally wpad.dat. The chain of conventions is fragile, and WPAD’s DNS-guessing behaviour has its own long history of being abused, but the point for the taxonomy is simple. A forward proxy must be discoverable by the client, because the client is the party that selects it.
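PAC files themselves are JavaScript. To keep one language across this post, here is the same decision logic mirrored in Python, as a sketch rather than a drop-in PAC file. The proxy hostname and port and the corp.example domain are hypothetical, and the WPAD search order is simplified: real clients differ in how far up the domain tree they walk, which is part of where the DNS-guessing abuse comes from.

```python
from typing import List, Optional

# Hypothetical names, for illustration only.
CORP_PROXY = "proxy.corp.example:3128"
INTERNAL_SUFFIX = ".corp.example"


def find_proxy_for_url(url: str, host: str) -> str:
    """Python mirror of a PAC FindProxyForURL(url, host) policy.

    Returns a PAC-style result: "DIRECT", or "PROXY host:port" entries
    separated by semicolons and tried in order. `url` is unused here but
    kept so the signature matches the PAC function.
    """
    host = host.lower()
    # Dotless intranet names and the internal domain skip the proxy.
    if "." not in host or ("." + host).endswith(INTERNAL_SUFFIX):
        return "DIRECT"
    # Everything else goes through the proxy, falling back to a direct connection.
    return f"PROXY {CORP_PROXY}; DIRECT"


def wpad_candidates(dhcp_wpad_url: Optional[str], client_fqdn: str) -> List[str]:
    """Simplified order in which a WPAD client looks for its config file."""
    if dhcp_wpad_url:
        return [dhcp_wpad_url]  # DHCP has priority: no DNS lookups at all
    # Otherwise walk up the client's own domain, stopping short of the bare TLD.
    labels = client_fqdn.lower().split(".")[1:]  # drop the host label
    return [
        "http://wpad." + ".".join(labels[i:]) + "/wpad.dat"
        for i in range(len(labels) - 1)
    ]
```

A browser evaluates the real PAC function once per request and tries each PROXY entry in order before the trailing DIRECT.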

Then there is HTTPS, which complicates the agent’s job. A forward proxy can read and rewrite plaintext HTTP all it likes. It cannot read TLS it does not have keys for. So the client uses the HTTP CONNECT method to ask the proxy for a tunnel. The client sends CONNECT example.com:443 to the proxy; the proxy opens a TCP connection to that host and port, replies 200, and from then on relays bytes blindly in both directions. The TLS handshake then runs through the tunnel, directly between the client and the real origin. The proxy never sees the session keys. This is the tunnel role from RFC 9110, entered on demand: a forward proxy for HTTP, a blind relay for HTTPS. A well-behaved proxy typically restricts CONNECT to port 443, or a short allow-list of TLS ports, precisely because an unrestricted CONNECT turns the proxy into an open relay for any TCP protocol at all.
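A minimal sketch of the proxy's side of that exchange, assuming a policy that tunnels only to port 443. The function name and the allow-list are illustrative; a real proxy would then open the upstream socket and copy bytes in both directions without parsing them.

```python
from typing import Optional, Tuple

# Hypothetical policy: tunnel only to the standard HTTPS port.
ALLOWED_CONNECT_PORTS = {443}


def handle_connect(request_line: str) -> Tuple[str, Optional[Tuple[str, int]]]:
    """Answer a CONNECT request line the way a cautious forward proxy might.

    Returns the status line for the client and, if the tunnel is allowed,
    the (host, port) to open a TCP connection to. After the 200 is sent,
    the proxy only copies bytes in both directions.
    """
    method, target, version = request_line.split()
    if method != "CONNECT":
        raise ValueError("not a CONNECT request; handle as ordinary proxying")
    host, sep, port_text = target.rpartition(":")
    if not sep or not host or not port_text.isdigit():
        return f"{version} 400 Bad Request", None
    port = int(port_text)
    if port not in ALLOWED_CONNECT_PORTS:
        # An unrestricted CONNECT would make this an open relay for any TCP protocol.
        return f"{version} 403 Forbidden", None
    # Strip brackets from IPv6 literals such as [2001:db8::1]:443.
    return f"{version} 200 Connection established", (host.strip("[]"), port)
```

On the client side, Python's standard library sends the CONNECT for you: create an http.client.HTTPSConnection pointed at the proxy's host and port, then call set_tunnel("example.com", 443) before making the request.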

[Figure: sequence between Client, Forward proxy, and Origin :443. The client sends CONNECT example.com:443; the proxy opens TCP to the origin and replies 200 Connection established; the TLS handshake then runs client to origin while the proxy relays opaque bytes and never holds the keys.]

*The CONNECT tunnel. The proxy is a forward proxy for the request line and a blind relay for everything after the 200, which is why the origin TLS certificate is the one the client validates.*

For anyone running crawlers, the forward proxy is the workhorse, and the operational questions are about the pool rather than the protocol: how IPs are sourced, whether sessions are sticky or rotating, and how to keep cookies coherent across a fleet that changes egress IP between requests. Those are their own posts. What matters for the taxonomy is the invariant: a forward proxy is the client’s agent, configured by the client, and it hides the client from the origin.

The reverse proxy: the origin’s front

Flip the loyalty and you get the reverse proxy. The spec calls it a gateway. It sits in front of one or more origin servers and pretends to be the origin. Clients connect to it as if it were the real thing. It terminates the connection, decides what to do, and forwards inbound to whichever backend should actually handle the request. The client never learns that backend exists.

This is probably the more common deployment by traffic volume, because almost every public website of any size puts something in front of its application servers. The canonical open-source example is nginx, whose proxy_pass directive inside a location block is the entire mechanism: requests matching that location get forwarded to the named upstream. By default nginx redefines two headers on the proxied request, setting Host to the upstream host from proxy_pass ($proxy_host) and Connection to close, and it drops headers whose values are empty, with proxy_set_header as the override. That small detail matters. A reverse proxy is not a wire. It rewrites, terminates, and re-originates, and every one of those rewrites is a place where the client’s view and the origin’s view of a request can diverge.
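A configuration fragment would be in nginx syntax. To stay in one language, the sketch below models those default rewrites in Python instead. It assumes only the two documented defaults plus any proxy_set_header overrides, and ignores everything else nginx does to a proxied request. The function name is hypothetical.

```python
from typing import Dict, Optional


def nginx_proxied_headers(
    client_headers: Dict[str, str],
    proxy_host: str,
    overrides: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Simplified model of request headers after nginx's proxy_pass.

    nginx's documented defaults are equivalent to
        proxy_set_header Host $proxy_host;
        proxy_set_header Connection close;
    and `overrides` plays the role of extra proxy_set_header lines.
    """
    settings = {"host": proxy_host, "connection": "close"}
    for name, value in (overrides or {}).items():
        settings[name.lower()] = value
    # Client headers pass through unless a setting replaces them.
    out = {k: v for k, v in client_headers.items() if k.lower() not in settings}
    for name, value in settings.items():
        if value != "":  # an empty value means "do not pass this field"
            out[name.title()] = value
    return out
```

Passing {"Host": client_host} as an override corresponds to the common proxy_set_header Host $host; line, which keeps the client's original hostname visible to the backend instead of the upstream name.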

Why put a reverse proxy in front of anything at all? The list of jobs is long, but it collapses to a few categories. It terminates TLS once, at the edge, so each backend does not need its own certificate and handshake. It spreads load across a pool of backends using a scheduling algorithm like round-robin or least-connections. It caches responses so identical requests do not all reach the origin. It presents a single hostname and IP for a service composed of many machines. And it absorbs abuse, because the thing attackers can reach is the proxy, not the application. A reverse proxy gives the origin operator a single choke point to apply policy, and a single place to terminate, inspect, and re-emit traffic.
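The two scheduling policies named above are small enough to sketch. This version is single-threaded and the backend names are hypothetical; production balancers add health checks, weights, and locking.

```python
import itertools
from typing import Dict, List


class RoundRobin:
    """Hand out backends in a fixed rotation, ignoring how busy they are."""

    def __init__(self, backends: List[str]):
        self._cycle = itertools.cycle(backends)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    """Pick the backend with the fewest in-flight requests."""

    def __init__(self, backends: List[str]):
        self.active: Dict[str, int] = {b: 0 for b in backends}

    def pick(self) -> str:
        # min() returns the first backend on ties, i.e. list order.
        backend = min(self.active, key=self.active.__getitem__)
        self.active[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        # Call when the proxied response completes.
        self.active[backend] -= 1
```

Least-connections only works because the proxy sits on every request and can call release when a response finishes; round-robin needs no such bookkeeping.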

[Figure: Client A and Client B connect to a reverse proxy that terminates TLS and picks a backend from origin 1, origin 2, and origin 3. Clients see one server; the backends are invisible to them.]

*A reverse proxy presents one face to the world and fans requests out to a hidden backend pool. The origin operator chose it, which is what makes it reverse rather than forward.*

Once a reverse proxy terminates TLS, a problem appears that is the mirror image of the forward proxy’s source-hiding. The forward proxy hides the client and the origin wishes it knew who that was. The reverse proxy receives every client connection, but the backend behind it sees only the proxy’s IP. If the application wants the real client address for rate-limiting, geolocation, or abuse logging, the proxy has to forward it explicitly. That is the entire reason the X-Forwarded-For family of headers exists, along with the standardised Forwarded header from RFC 7239, which folds client IP, protocol, and original host into one structured field. The trust model around that chain (which hop’s value you believe, and how an edge resolves the genuine client IP from a forgeable list) is subtle enough to deserve its own treatment. For the taxonomy the lesson is that a reverse proxy that terminates the connection owes the backend a faithful account of the client, and that account travels in headers the backend has to be configured to trust.
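A sketch of the bookkeeping a terminating reverse proxy might do before forwarding inbound, with a hypothetical function name. It appends the peer address to X-Forwarded-For and adds an RFC 7239 Forwarded element, quoting values that are not plain HTTP tokens (IPv6 addresses must be bracketed, which forces quoting). It assumes the proxy is the outermost hop, so it overwrites X-Forwarded-Proto and X-Forwarded-Host rather than trusting client-supplied values.

```python
import ipaddress
import re
from typing import Dict

# HTTP token characters; any other value must be sent as a quoted-string.
_TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def _param(value: str) -> str:
    return value if _TOKEN.fullmatch(value) else f'"{value}"'


def _node(ip: str) -> str:
    # RFC 7239 writes IPv6 nodes in brackets, which then forces quoting.
    return f"[{ip}]" if ipaddress.ip_address(ip).version == 6 else ip


def add_forwarding_headers(
    headers: Dict[str, str], peer_ip: str, proto: str, host: str
) -> Dict[str, str]:
    """Headers an edge reverse proxy adds before forwarding inbound.

    peer_ip is the remote address of the TCP connection the proxy accepted.
    Header names are assumed to be in canonical case in `headers`.
    """
    out = dict(headers)
    xff = out.get("X-Forwarded-For")
    out["X-Forwarded-For"] = f"{xff}, {peer_ip}" if xff else peer_ip
    # Edge assumption: this hop's own view replaces any client-supplied value.
    out["X-Forwarded-Proto"] = proto
    out["X-Forwarded-Host"] = host
    element = f"for={_param(_node(peer_ip))};proto={_param(proto)};host={_param(host)}"
    prior = out.get("Forwarded")
    out["Forwarded"] = f"{prior}, {element}" if prior else element
    return out
```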

CDNs are reverse proxies with a map

A content delivery network is the reverse proxy taken to planetary scale. The mechanism does not change. A CDN edge terminates the client’s connection, serves from cache if it can, and forwards to origin if it must, exactly like a single nginx box in front of a single app. What the CDN adds is geography and a routing layer that picks which edge you reach.

That routing layer is usually anycast, where the same destination IP is advertised from hundreds of locations and the internet’s own routing delivers each client to a nearby one, sometimes combined with DNS-based steering that hands different clients different edge addresses. The result is a reverse proxy whose “single face” is actually a few hundred boxes spread across the planet, each of them terminating connections for the same hostname. The origin operator still chose this thing. The client still cannot see the backend. It is a gateway in the RFC 9110 sense, with a global cache hierarchy and a steering map bolted on.

Because the CDN terminates TLS, the certificate the client validates belongs to the CDN edge, not the origin. The handshake ends at the point of presence, and the origin sees the edge’s connection rather than the client’s. That gap (whose certificate, whose fingerprint, who holds the cleartext) is the subject of the TLS terminating proxy post and worth reading alongside this one, because the consequences ripple into bot detection and abuse logging. The caching behaviour, which response a given edge will reuse and for how long, is governed by HTTP caching headers and the cache key the edge computes. A CDN is the place where the reverse-proxy abstraction stops being a single box and becomes a distributed system, but the role on the taxonomy never changes. It faces the origin’s way.
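Which request attributes go into the cache key is a per-CDN, usually configurable, decision. As an illustration only, here is one plausible recipe: method, scheme, lowercased host, path, a sorted query string, and the request headers named in the cached response's Vary. Sorting the query is an assumed normalisation, not an HTTP rule, and is only safe for origins that ignore parameter order. The function name is hypothetical.

```python
import hashlib
from typing import Dict, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(method: str, url: str, headers: Dict[str, str],
              vary: Optional[str] = None) -> str:
    """One plausible edge cache key: request target plus Vary'd headers.

    `vary` is the stored response's Vary value. Vary: * (never reuse)
    is not handled in this sketch.
    """
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    fields = [method.upper(), parts.scheme, parts.netloc.lower(), parts.path or "/", query]
    # Request headers listed in Vary must match for a stored response to be reused.
    lowered = {k.lower(): v for k, v in headers.items()}
    names = sorted({n.strip().lower() for n in (vary or "").split(",") if n.strip()})
    fields += [f"{n}={lowered.get(n, '')}" for n in names]
    return hashlib.sha256("\n".join(fields).encode()).hexdigest()
```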

One small piece of header machinery shows up here that does not appear in the textbook single-box case. When a request passes through several CDNs or several layers of the same CDN, those layers need to detect when a request has looped back into infrastructure it already traversed. The classic tool is the Via header, which each forwarding intermediary appends to, recording the protocol and an identifier for itself. A proxy that sees its own identifier already in Via knows the request has looped. Cloudflare found Via insufficient for multi-CDN topologies and introduced a dedicated CDN-Loop header for exactly this purpose, since standardised as RFC 8586, because a customer chaining two CDNs could otherwise create an infinite forwarding loop between them. The detail is small. The lesson is that the more reverse proxies you stack, the more you need explicit markers in the request to reason about the path it took.
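The per-hop check is simple to sketch. The identifiers used here (edge-a, barcdn.example) are hypothetical, and the parsing is deliberately naive. A hop that finds itself in either header should refuse to forward; otherwise it appends its own entry and passes the request on.

```python
from typing import List


def _entries(value: str) -> List[str]:
    # Naive split on commas; a comment or quoted parameter containing a
    # comma would need a real header parser.
    return [e.strip() for e in value.split(",") if e.strip()]


def seen_in_via(via: str, pseudonym: str) -> bool:
    # A Via entry is "received-protocol received-by [comment]", e.g. "1.1 edge-a".
    return any(
        len(parts) >= 2 and parts[1] == pseudonym
        for parts in (e.split() for e in _entries(via))
    )


def seen_in_cdn_loop(cdn_loop: str, cdn_id: str) -> bool:
    # A CDN-Loop entry is a cdn-id optionally followed by ";name=value" info.
    return any(e.split(";")[0].strip() == cdn_id for e in _entries(cdn_loop))


def append_entry(existing: str, entry: str) -> str:
    # Each hop adds itself at the end before forwarding.
    return f"{existing}, {entry}" if existing else entry
```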

API gateways are reverse proxies that read the body

An API gateway is a reverse proxy that grew opinions about the application protocol. Sit it next to a plain reverse proxy and the bottom half is identical: it accepts client connections, terminates TLS, routes by hostname or path, and load-balances across backends. The difference is what it does above that layer. It authenticates the caller, enforces per-client rate limits, transforms requests and responses, validates schemas, meters usage for billing, and stitches together calls to several backend services behind one API surface. A plain reverse proxy routes bytes by their envelope. An API gateway reads the contents and acts on them.

The reason this category split off has a date and a shape. The reverse proxy solved the problem of putting one fast front end on a monolith. The API gateway is the pattern that appeared when the monolith fractured into dozens or hundreds of microservices and somebody needed a single entry point to authenticate, route, and rate-limit across all of them. The widely used implementations make the lineage explicit. Kong is built on nginx and OpenResty, the same reverse-proxy engine, with the API-management layer added on top. Envoy, originally written at Lyft and now a graduated CNCF project, is a high-performance proxy that handles HTTP/2 and gRPC and is configured dynamically through its xDS API; besides running as an edge proxy, it commonly sits underneath service meshes like Istio as the data-plane proxy doing the actual forwarding. The Kubernetes Gateway API, which by 2026 is the standard way to express this routing in a cluster, has Envoy Gateway, Istio, Cilium, and Kong among its implementations. Different products, same role on the taxonomy: a reverse proxy the origin operator chose, with a policy engine that understands the API.

The practical reason to keep the distinction is that an API gateway’s extra jobs are where rate limits, authentication, and request transformation live, which is precisely the surface a client interacts with whether the client is a browser, a mobile app, or a crawler. When a request gets a 429 and a backoff hint, that decision was almost certainly made at an API gateway or a reverse-proxy layer, not at the application code behind it. Knowing which hop made a decision is half of debugging why a request failed.
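One common algorithm behind such limits is the token bucket, which this sketch assumes; gateways also use fixed or sliding windows. The class, its parameters, and keying by a client ID are hypothetical. The clock is injectable so the behaviour can be tested without sleeping, and the second return value is what a gateway would put in a Retry-After header.

```python
import math
import time
from typing import Callable, Dict, Optional, Tuple


class TokenBucketLimiter:
    """Per-client token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        # client id -> (tokens left, time of last update)
        self.buckets: Dict[str, Tuple[float, float]] = {}

    def check(self, client_id: str) -> Tuple[int, Optional[int]]:
        """Return (status, retry_after_seconds) for one incoming request.

        200 here only means "let it through to the backend"; 429 comes
        with the whole seconds until a token is available again.
        """
        now = self.clock()
        tokens, last = self.buckets.get(client_id, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[client_id] = (tokens - 1, now)
            return 200, None
        self.buckets[client_id] = (tokens, now)
        return 429, math.ceil((1 - tokens) / self.rate)
```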

The intercepting proxy: nobody at the endpoints chose it

The first two roles are honest. The client knows about its forward proxy; the origin operator knows about its reverse proxy. The intercepting proxy is the one that breaks the assumption, because it sits in the path without either endpoint having selected it. Traffic is redirected into it at the network layer. The client believes it has a direct connection to the origin. It does not.

The redirection is done below HTTP, which is why the client never gets a vote. A router or firewall on the path grabs outbound packets, classically TCP port 80, and steers them into the proxy using NAT, or Cisco’s WCCP, or Linux’s TPROXY. WCCP can hand traffic to the cache using GRE tunnelling at layer 3 or MAC rewriting at layer 2. TPROXY performs interception at layer 3 while spoofing the outbound source address, which lets the proxy fetch on the client’s behalf while hiding its own IP from the rest of the network. The client’s TCP stack thinks it opened a connection to the origin. In reality the SYN landed on a proxy that answered in the origin’s place. There is no PAC file, no CONNECT, no configuration on the endpoint at all. RFC 9110 declines to call this a “proxy” in its strict sense for exactly this reason: the polite client-chosen contract is absent.
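Because the client never addressed the proxy, a NAT-based interceptor has to ask the kernel where each connection was originally headed. On Linux that is the SO_ORIGINAL_DST socket option, which the sketch below assumes along with IPv4 and a NAT REDIRECT-style rule. The function names are hypothetical, and the constant is copied from the kernel header because Python's socket module does not export it.

```python
import socket
import struct
from typing import Tuple

# From <linux/netfilter_ipv4.h>.
SO_ORIGINAL_DST = 80


def parse_sockaddr_in(raw: bytes) -> Tuple[str, int]:
    """Decode a 16-byte struct sockaddr_in: family, port, IPv4 address, padding."""
    port, addr = struct.unpack("!2xH4s8x", raw)
    return socket.inet_ntoa(addr), port


def original_destination(conn: socket.socket) -> Tuple[str, int]:
    """Where the client was really connecting before NAT redirected it here.

    Linux-only, IPv4-only, and meaningful only for connections redirected
    by netfilter NAT. With TPROXY there is no NAT: the accepted socket's
    own local address (conn.getsockname()) already is the original one.
    """
    raw = conn.getsockopt(socket.IPPROTO_IP, SO_ORIGINAL_DST, 16)
    return parse_sockaddr_in(raw)
```

For plaintext HTTP the Host header usually tells the same story, but the socket option also works for protocols that carry no hostname at all.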

Plenty of intercepting proxies are benign. A coffee-shop captive portal that bounces your first HTTP request to a login page is an intercepting proxy. An ISP transparent cache that serves popular objects from a local box is one. They work on plaintext HTTP because plaintext is readable by anyone on the path. The interesting and contentious case is the one that wants to read HTTPS, because HTTPS was built specifically to stop intermediaries from doing that.

SSL inspection: interception that reads the encrypted traffic

To read TLS it does not own keys for, an intercepting proxy has to become a man in the middle in the literal cryptographic sense. It terminates the client’s TLS connection itself, decrypts, inspects, then opens a second TLS connection onward to the real origin and re-encrypts. For the client’s browser not to throw a certificate error, the proxy must present a certificate that chains to a CA the client already trusts. So the operator generates a private CA, installs that CA’s root certificate into the trust store of every managed device, and feeds the private key to the proxy. Now the proxy can mint a valid-looking certificate for any hostname on demand, and the browser accepts it because the chain terminates in a root the device was told to trust.
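A toy model of why the install step is the whole trick. It tracks only issuer names and trust anchors, with hypothetical CA names and hostnames; real X.509 validation also checks signatures, validity dates, key usage, and the hostname, none of which is modelled here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str  # subject name of the signing CA; equals subject for a root


def chain_is_trusted(leaf: Cert, intermediates: Dict[str, Cert],
                     trust_store: Set[str]) -> bool:
    """Follow issuer names upward until one is a trusted root (toy logic)."""
    cert: Optional[Cert] = leaf
    seen: Set[str] = set()
    while cert is not None and cert.subject not in seen:
        if cert.issuer in trust_store:
            return True
        seen.add(cert.subject)
        cert = intermediates.get(cert.issuer)
    return False


def mint_for(hostname: str, inspection_ca: str) -> Cert:
    # What the interceptor does per flow: issue a leaf for whatever host was requested.
    return Cert(subject=hostname, issuer=inspection_ca)
```

Before the enterprise root is in the trust store, the minted leaf for a given hostname fails; after it is installed, the identical certificate passes.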

The reference implementation in the open-source world is Squid’s SSL-Bump. Squid is handed a CA certificate and its private key; when it sees a CONNECT or a transparently redirected TLS flow, it generates a server certificate for the requested domain on the fly, terminates the client side, and can then run the decrypted traffic through its normal filtering, including ICAP and eCAP content-adaptation hooks, before re-encrypting to origin. This is how a corporate web gateway enforces data-loss-prevention rules or malware scanning on traffic that is, from the user’s browser’s point of view, encrypted end to end. It is also, mechanically, the exact thing TLS was designed to prevent. The only reason it does not trip every alarm in the browser is that someone put the inspection CA in the trust store first. The Squid project documents this with an explicit warning that decrypting users’ HTTPS without consent may be unlawful depending on jurisdiction.

[Figure: Client (trusts enterprise CA) to Intercepting proxy (decrypts, inspects, re-encrypts) to Origin. The client leg is TLS with a minted certificate that looks valid to the browser; the origin leg is TLS with the real certificate. Cleartext exists inside the proxy; the browser shows a padlock anyway.]

*SSL inspection works only because the enterprise CA was installed in the client trust store first. Remove that step and the browser refuses the minted certificate.*

This is also why the Encrypted Client Hello (ECH) work, successor to the earlier encrypted-SNI proposal, matters to the intercepting-proxy story. Classic interception leans on the cleartext server name in the TLS handshake (the SNI field), which an intercepting box can read to decide what to do with a flow even when it is not decrypting. ECH encrypts that name, leaving only the public name of the provider’s client-facing server in view. A network that wants to keep filtering by hostname after ECH ships has to either decrypt fully, which means it is already an SSL-inspection MITM with a CA in the trust store, or fall back to coarser signals. The intercepting proxy’s job gets harder as more of the connection’s metadata gets encrypted, which is the direction the whole protocol stack has been moving.

A corporate proxy can be two species at once

The phrase “corporate proxy” does not pick out one box on this taxonomy, and that is the source of half the confusion. The same enterprise deployment is often a forward proxy and an intercepting proxy depending on the client.

A managed laptop, configured by IT with a PAC file or explicit proxy settings, uses the gateway as a textbook forward proxy. The laptop knows the proxy is there; it sends requests to it; it issues CONNECT for HTTPS. That is the client choosing its agent. An unmanaged device on the same corporate network, a personal phone on the guest WLAN, has no such configuration, so the network transparently redirects its port-80 and port-443 traffic into the same box, which now acts as an intercepting proxy toward that device. One physical appliance, two roles, decided per client by whether that client was configured to use it.

Layer the SSL-inspection capability on top and the appliance is also a TLS-terminating MITM for any device that trusts the corporate CA. So a single “corporate proxy” can be, simultaneously, a forward proxy for managed laptops, an intercepting proxy for guests, and a decrypting man in the middle for whichever of those devices trusts the corporate CA, which in practice usually means the managed ones. The taxonomy still holds. It just applies per flow, not per box. When someone says “we have a proxy,” the only useful follow-up is: which clients chose it, and what does it do to TLS.
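The per-flow rule can be written down directly. This sketch assumes the appliance has SSL inspection enabled; the function and parameter names are hypothetical facts about a single flow, not configuration of any real product.

```python
from typing import List


def roles_for_flow(client_configured_proxy: bool, network_redirects: bool,
                   is_tls: bool, client_trusts_inspection_ca: bool) -> List[str]:
    """Which taxonomy roles one appliance plays for one client's flow."""
    roles: List[str] = []
    if client_configured_proxy:
        roles.append("forward proxy")       # the client chose it
    elif network_redirects:
        roles.append("intercepting proxy")  # neither endpoint chose it
    else:
        return roles                        # the flow never reaches the box
    if is_tls:
        if client_trusts_inspection_ca:
            roles.append("decrypting man in the middle")
        else:
            # Without the CA, decrypting would raise certificate errors,
            # so the box can only relay (or block) the ciphertext.
            roles.append("blind relay")
    return roles
```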

This is the same reason the question “is a CDN a forward or reverse proxy” has a clean answer while “is our corporate proxy a forward or reverse proxy” does not. A CDN faces one direction for everyone. An enterprise gateway faces different directions for different clients on the same network.

The header machinery that makes hops admit they exist

Every honest intermediary leaves traces, and those traces are the practical way to reverse-engineer a proxy chain from outside it. The traces live in HTTP headers, and three of them carry most of the load: Via, Forwarded, and X-Forwarded-For.

Via is the spec’s own breadcrumb. RFC 9110 defines it as the header each forwarding proxy or gateway appends to, recording the protocol version it received and an identifier for itself. Walk the Via chain on a response and you can read the list of intermediaries that touched it, in order, which is exactly why it doubles as a loop-detection mechanism: a proxy that finds its own token already in Via knows the request came back around.

The Forwarded header and its older X-Forwarded-For, X-Forwarded-Proto, and X-Forwarded-Host siblings carry the client-identity information a terminating reverse proxy would otherwise erase. Each hop appends the address it received the connection from, building a comma-separated chain that, read left to right, names the client and then each proxy in turn. RFC 7239 is explicit that this exposes information some users consider private, by design, and that it should never be reflected back to the client, because doing so would hand the client a map of the entire internal proxy chain.

There is an asymmetry between Via and X-Forwarded-For that is worth stating plainly. Via is meant to be honest and is appended by the intermediary about itself, so it can be trusted as far as the hop that wrote it can be trusted. X-Forwarded-For is a claim about someone else, the previous hop, and a claim is forgeable. Any client can send an X-Forwarded-For header with a made-up address, and a naive backend that trusts the leftmost value will believe a lie. This is why the only X-Forwarded-For entries an edge can believe are the ones written by proxies inside its own trust boundary: read the list from the right, skip past your own hops, and stop at the first address you do not control. It is also why the whole X-Forwarded-For trust problem is harder than the header’s simple syntax suggests. A forward proxy hides the client and may decline to add the header at all, which is the entire point of an anonymizing forward proxy. A reverse proxy you control adds it truthfully. A header you received from outside your trust boundary is a rumour, not a fact.
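The trust-boundary rule for X-Forwarded-For comes down to a right-to-left walk. The sketch assumes the operator can list its own proxies as CIDR ranges; the ranges and the function name are hypothetical.

```python
import ipaddress
from typing import Iterable, Optional


def client_ip_from_xff(peer_ip: str, xff: Optional[str],
                       trusted_proxies: Iterable[str]) -> str:
    """Pick the client address, believing only hops we operate.

    Start at the TCP peer and walk leftward through X-Forwarded-For. Every
    address inside `trusted_proxies` is one of our own hops, so its claim
    about the previous hop can be believed; the first address outside that
    set is the client as far as we can vouch. Entries further left were
    written by someone we do not control and are ignored.
    """
    networks = [ipaddress.ip_network(n) for n in trusted_proxies]

    def is_ours(addr: str) -> bool:
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            return False  # junk cannot be one of our proxies
        return any(ip in net for net in networks)

    entries = [e.strip() for e in (xff or "").split(",") if e.strip()]
    hops = [peer_ip] + entries[::-1]
    for hop in hops:
        if not is_ours(hop):
            return hop
    return hops[-1]  # every hop was ours: the leftmost is the best we have
```

With a trusted range of 10.0.0.0/8, a request arriving from edge 10.0.0.5 carrying X-Forwarded-For: 203.0.113.66, 198.51.100.7 resolves to 198.51.100.7; the leftmost value, which the client could have typed, is never consulted.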

Where this leaves you

The taxonomy is small, which is the point. Three roles, all described in the HTTP spec’s section on intermediaries, sorted by a single question, and every box a vendor sells is one of those three with extra features. Forward proxy, chosen by the client, hides the client, and drops into a blind tunnel for HTTPS it cannot read. Reverse proxy, chosen by the origin operator, pretends to be the origin, terminates and re-emits, and owes the backend an honest account of who the client was. Intercepting proxy, chosen by neither, grabbed off the wire below HTTP, and capable of reading even TLS if it can get its CA into your trust store. A CDN is the reverse proxy at planetary scale. An API gateway is the reverse proxy that reads the body. A corporate gateway is whichever of these it needs to be for the client in front of it.

The reason to keep the lines sharp is operational, not pedantic. The direction a proxy faces tells you whose IP the far end logs, whose certificate your client validated, who is holding your plaintext at the point of decryption, and which header values you are allowed to believe. A forward proxy and a reverse proxy can be the same nginx binary with a different config file, and the binary does not care which one it is. You have to. The day you trust an X-Forwarded-For from outside your boundary, or assume the padlock means nobody in the middle read the request, is the day the distinction stops being academic. The proxy that grabbed your packets off the wire and minted a certificate your browser accepted is, by every measure your application can see, the server you meant to reach.


Sources & further reading

  • IETF (2022), RFC 9110: HTTP Semantics — section 3.7 defines proxy, gateway (reverse proxy), and tunnel, and describes the interception proxy as not chosen by the client.
  • IETF (2014), RFC 7239: Forwarded HTTP Extension — standardises the Forwarded header that replaces the X-Forwarded-For family, with explicit privacy notes.
  • MDN Web Docs, CONNECT request method — how a client asks a forward proxy to open a blind TCP tunnel for HTTPS.
  • NGINX, NGINX Reverse Proxy admin guide — the proxy_pass mechanism and the default Host/Connection header rewrites a reverse proxy applies.
  • NGINX, HTTP CONNECT forward proxy — configuring nginx as a forward proxy via the CONNECT method.
  • Squid Web Cache wiki, Feature: Squid-in-the-middle SSL Bump — how an intercepting proxy decrypts TLS by minting certificates from an installed CA, with the legal warning.
  • Wikipedia, Proxy server — NAT, WCCP, GRE tunnelling, and TPROXY as transparent-interception mechanisms.
  • Wikipedia, Proxy auto-config — the FindProxyForURL PAC function and WPAD discovery via DHCP and DNS.
  • Cloudflare (2019), Preventing request loops using CDN-Loop — why Via was insufficient for multi-CDN chains and how the CDN-Loop header was introduced.
  • Kong Inc., API Gateway vs API Proxy — the reverse-proxy lineage of API gateways and the policy layer added on top.
