
How a CDN actually works: anycast, POPs, and the cache hierarchy

Copyright: MIT

Type a domain into a browser, press enter, and the bytes that come back almost certainly did not travel from the company that owns the site. They came from a machine a few milliseconds away, owned by a third party, holding a copy of someone else’s content and presenting someone else’s certificate. That machine decided, in the first few hundred microseconds of the connection, whether it already had what you asked for or whether it had to go fetch it from somewhere further away. None of this is visible from the address bar. The URL looks the same whether the response came from cache in the same city or from an origin server on another continent.

That gap, between what the URL implies and what actually served the bytes, is the whole subject of this post. A content delivery network is a layer of borrowed machines between you and an origin, and almost every interesting decision it makes is invisible by design. The question worth answering carefully is how the layer is built: how your packets find a nearby machine at all, what that machine does with them, and how it decides whether your request is the same as the last person’s.

Here is the route. First, anycast and BGP, the routing trick that lets thousands of machines share one IP address and pulls your connection to a nearby one. Then the point of presence itself, the cluster of cache servers that machine belongs to, and how a request moves around inside it. Then the cache hierarchy, edge to shield to origin, and why a CDN deliberately funnels misses through a small number of machines. Then cache keys, the rule that decides what counts as the same request, which is where caching gets subtle and where it gets dangerous. Finally TLS termination at the edge, including the awkward problem of presenting a certificate for a domain whose private key you would rather not hand over.

Anycast: one address, many machines

Start with the addressing, because it is the part that feels like a violation of how IP is supposed to work. Normally an IP address belongs to one machine in one place. That is unicast. Send a packet to that address and the internet’s routing tables, built from the Border Gateway Protocol, carry it toward the one place that announced ownership of that address block.

Anycast breaks the one-address-one-place assumption on purpose. The same IP prefix is announced into BGP from dozens or hundreds of locations at once. Every point of presence stands up and tells its upstream networks, in effect, “traffic for this prefix, send it here.” The routers of the internet do what they always do: they pick the shortest path. Because many locations advertise the same destination, “shortest” resolves to whichever announcement is closest in BGP terms to the network the packet came from. A user in Frankfurt and a user in São Paulo send packets to the identical destination address and land on completely different machines, each near them, with no per-user logic anywhere deciding that. The routing fabric does the steering for free.
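A toy model can make the selection concrete. The sketch below assumes each source network sees every announcement with some AS-path length and simply prefers the shortest one. Real BGP evaluates local preference and other attributes before path length, and the POP names and hop counts here are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announcement:
    """One POP's advertisement of the shared anycast prefix, as seen by one network."""
    pop: str
    as_path_len: int  # hypothetical AS-hop count from the source network


def pick_pop(announcements: list[Announcement]) -> str:
    """Return the POP with the shortest AS path (ties broken by name for determinism)."""
    if not announcements:
        raise ValueError("prefix is unreachable: no announcements")
    best = min(announcements, key=lambda a: (a.as_path_len, a.pop))
    return best.pop


# The same prefix, viewed from two different source networks (made-up numbers).
seen_from_frankfurt = [Announcement("FRA", 1), Announcement("GRU", 4)]
seen_from_sao_paulo = [Announcement("FRA", 4), Announcement("GRU", 1)]

print(pick_pop(seen_from_frankfurt))
print(pick_pop(seen_from_sao_paulo))
```

Nothing in `pick_pop` knows who the user is. The two results differ only because the two networks see different path lengths to the same destination.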

This is the routing model Cloudflare, Fastly, and several others build on. The alternative, used historically by Akamai among others, is DNS-based steering: the CDN runs authoritative DNS for the hostname and hands each resolver a different unicast IP depending on where it thinks the resolver is. DNS steering gives finer control over which server answers, but it inherits the resolver’s blind spots. If you use a corporate DNS server two countries away, or a public resolver, the CDN sees the resolver’s location, not yours, and can misroute you. Anycast routes on the actual network topology of the packet, so it sidesteps that particular failure. Most large networks now run a hybrid, anycast at the edge for geographic distribution with DNS policy layered on for compliance and regional control.

*Figure: a Frankfurt user and a São Paulo user send packets to the identical anycast IP (192.0.2.1), which both POPs announce; BGP delivers each to the topologically nearest point of presence with no per-user logic.*

Anycast buys two things beyond latency. The first is failure handling that needs no failover machinery. A point of presence that goes unhealthy stops announcing the prefix, or has its announcement withdrawn, and within the time it takes BGP to reconverge the rest of the internet stops sending it traffic. Routes shift to the next-closest announcement. There is no health-check-driven DNS update to wait on, no TTL to expire; the routing layer itself is the failover. The second is volumetric attack absorption. A denial-of-service flood aimed at a single anycast IP does not converge on one machine. Each point of presence announcing that address absorbs only the share of the flood that originates near it, so an attack that would flatten a single server gets split across the whole footprint. Cloudflare’s early write-ups made exactly this point: a distributed botnet has “a portion of its denial of service traffic absorbed by each of our data centers.”
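Both effects fall out of the same routing rule, which a small simulation can show. The sketch below assumes three POPs, three source regions, and invented path costs and traffic volumes; it ignores BGP convergence time entirely and just recomputes where traffic lands once a POP stops announcing.

```python
def nearest_pop(source: str, path_cost: dict[str, dict[str, int]], announcing: set[str]) -> str:
    """Return the announcing POP with the lowest path cost from `source`."""
    candidates = [(cost, pop) for pop, cost in path_cost[source].items() if pop in announcing]
    if not candidates:
        raise ValueError("no POP is announcing the prefix")
    return min(candidates)[1]


def load_per_pop(traffic: dict[str, int], path_cost: dict[str, dict[str, int]],
                 announcing: set[str]) -> dict[str, int]:
    """Sum the traffic volume that each announcing POP receives."""
    load = {pop: 0 for pop in sorted(announcing)}
    for source, volume in traffic.items():
        load[nearest_pop(source, path_cost, announcing)] += volume
    return load


# Hypothetical path costs from each region to each POP.
path_cost = {
    "eu": {"FRA": 1, "IAD": 3, "GRU": 4},
    "na": {"FRA": 3, "IAD": 1, "GRU": 3},
    "sa": {"FRA": 4, "IAD": 3, "GRU": 1},
}
flood = {"eu": 40, "na": 35, "sa": 25}  # made-up attack volume per region

all_pops = {"FRA", "IAD", "GRU"}
print(load_per_pop(flood, path_cost, all_pops))            # flood split three ways
print(load_per_pop(flood, path_cost, all_pops - {"FRA"}))  # FRA withdrawn, traffic drains
```

With every POP announcing, no single site sees the whole flood. Withdrawing FRA moves its share to the next-closest POP without any DNS change, which is the failover property described above.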

The cost of anycast is the cost of statelessness. BGP routes packets, and BGP knows nothing about TCP connections. If the network reconverges mid-connection, perhaps a transit link flaps, perhaps an operator changes a route, the packets from an established TCP session can suddenly start landing on a different point of presence that has never heard of that connection. The new machine has no record of the session and resets it. For a short HTTP request this is invisible; the client just retries. For a long download or a websocket it is a dropped connection. Operators spend real effort tuning announcements so that traffic does not “flap between multiple locations,” because every flap is a handful of broken sessions. This is the quiet tax of the model, and it is why anycast suits stateless request/response traffic far better than long-lived streams.

There is a second, subtler cost: anycast does not give you fine control over where a given user lands. BGP picks the shortest path by its own metrics, which are about autonomous-system hops and local routing policy, not about which point of presence is least loaded or has the warmest cache. A user can be routed to a POP that is geographically further than another simply because the network path to it is shorter in BGP terms, and there is no clean per-request knob to override that without bending the announcement strategy. Large operators compensate with capacity, putting enough POPs in enough places that the nearest one is almost always close enough, and by occasionally withdrawing or de-preferencing announcements at a congested site so traffic naturally drains to neighbours. But the fundamental property holds: with anycast you steer at the granularity of a BGP announcement, not at the granularity of a request. DNS-based steering buys back that granularity at the cost of trusting the resolver’s location. The two models trade the same coin from opposite sides.

Inside a point of presence

Anycast gets your packet to a point of presence. A point of presence, a POP, is not a single server. Fastly describes it precisely as “a grouping of cache servers that creates a single cluster of cache storage,” placed near the highest-density internet exchange points so that the CDN sits one short hop from as many networks as possible. The unit you reach is the cluster, not the box.

That distinction matters for cache efficiency. If a POP has a hundred cache servers and each kept its own independent cache, a popular object would be fetched from origin up to a hundred times, once per server that happened to receive a request for it before any of its neighbours had a copy. So the servers inside a POP share. Fastly notes that requests handled by “100 distinct servers all participating in the same POP” get a higher hit ratio precisely because they “have access to the same shared pool of cache storage.” A request that lands on any server in the cluster can be satisfied from a copy held by any other. The internal routing that makes one object live on a predictable server, rather than scattered randomly, is generally some form of consistent hashing across the cluster, though the exact internal mechanism a given vendor uses is not fully public.
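As an illustration of how an object can live on a predictable server, here is a minimal sketch using rendezvous (highest-random-weight) hashing, one member of the consistent-hashing family. It is an assumption chosen for brevity, not a description of what Fastly or any other vendor runs, and the server names are hypothetical.

```python
import hashlib


def owner(object_key: str, servers: list[str]) -> str:
    """Pick the server that should hold `object_key` within one POP.

    Every server gets a pseudo-random score for this key; the highest wins.
    Any machine in the cluster can compute the same answer independently.
    """
    if not servers:
        raise ValueError("POP has no cache servers")

    def score(server: str) -> int:
        digest = hashlib.sha256(f"{server}|{object_key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(servers, key=score)


servers = [f"cache-{i:02d}" for i in range(10)]  # hypothetical cluster
keys = [f"/assets/img-{i}.png" for i in range(200)]

before = {k: owner(k, servers) for k in keys}
survivors = [s for s in servers if s != "cache-03"]  # one server drops out
after = {k: owner(k, survivors) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys changed owner")
```

Only the keys that lived on the removed server move. Everything else stays where it was, so losing one machine does not empty the rest of the POP's cache.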

The largest metros complicate the picture. Fastly calls its biggest deployments “metro POPs,” where a single logical POP spans multiple physical sites in one densely populated area. A single request, it notes, “may be processed by servers in more than one site, even while remaining within the same POP,” because “transit between servers in different sites within the same POP is extremely fast.” So the clean mental model, one POP equals one room of machines, holds for most locations and bends for the biggest ones. What stays true everywhere is the shared cache: the POP, however many buildings it occupies, presents one pool of storage to the requests that reach it.

The cache hierarchy: edge, shield, origin

Now the part that does the actual work. A single layer of edge caches has a structural problem that gets worse the bigger the network grows. Suppose a CDN has 300 locations and an object is not yet cached anywhere. The first user to request it in each location triggers a fetch from origin. That is up to 300 separate requests hitting the origin for one object, one per location, before any of them holds a copy. The network’s own size becomes a liability: more edges means more independent first-misses hammering the origin every time content expires or is purged. This is sometimes called the thundering-herd or cache-stampede problem, and a flat edge makes it strictly worse.

Tiered caching fixes it by putting a layer between the edges and the origin. Cloudflare organizes its data centers into “a hierarchy of lower-tiers and upper-tiers.” Lower tiers are the edges closest to visitors. When an edge misses, it does not go to the origin. It asks an upper tier. The rule that makes the whole thing work is the access restriction Cloudflare states plainly: “only the upper-tier can ask the origin for content.” A miss at the edge becomes a request to a shield, and only the shield is allowed to talk to your servers. Now the origin sees a request count proportional to the number of upper tiers, not the number of edges. Cloudflare reported customers seeing “60% or greater reduction in their cache miss rate” against a flat topology once tiered caching was on.

*Figure: four lower-tier edge POPs (A to D) send their misses to one upper-tier shield, the only tier that reaches the origin server. A cache miss at any edge becomes a request to the shield; only the shield contacts the origin, so origin load scales with the number of upper tiers, not the number of edges.*
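The arithmetic is easy to check with a toy model. The sketch below assumes every edge receives one request for the same uncached object, that requests arrive one at a time, and that edges are assigned to shields round-robin. The class and function names are illustrative, not any CDN's API.

```python
class Origin:
    """Counts how many times it is asked for content."""

    def __init__(self) -> None:
        self.fetches = 0

    def get(self, url: str) -> str:
        self.fetches += 1
        return f"body of {url}"


class Cache:
    """A cache that fills from its upstream (another cache or the origin) on a miss."""

    def __init__(self, upstream) -> None:
        self.upstream = upstream
        self.store: dict[str, str] = {}

    def get(self, url: str) -> str:
        if url not in self.store:
            self.store[url] = self.upstream.get(url)
        return self.store[url]


def origin_fetches(num_edges: int, num_shields: int) -> int:
    """First request for one object at every edge; return the origin's fetch count."""
    origin = Origin()
    if num_shields == 0:  # flat topology: every edge talks to the origin
        edges = [Cache(origin) for _ in range(num_edges)]
    else:                 # tiered: only shields talk to the origin
        shields = [Cache(origin) for _ in range(num_shields)]
        edges = [Cache(shields[i % num_shields]) for i in range(num_edges)]
    for edge in edges:
        edge.get("/logo.png")
    return origin.fetches


print(origin_fetches(num_edges=300, num_shields=0))
print(origin_fetches(num_edges=300, num_shields=3))
```

In the flat case the origin is hit once per edge. With three shields it is hit three times, however many edges sit in front of them.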

There is a configuration question hiding here: which upper tier should an edge use? Cloudflare offers two answers. The generic version turns every Cloudflare data center into a potential upper tier, which maximizes redundancy. Smart Tiered Cache instead “dynamically selects the single closest upper tier for each of your website’s origins,” measuring latency from each candidate upper tier to your origin and picking the best-connected one. The trade is the usual one between spreading the load broadly and concentrating it on the machine with the best path to your servers. A regional layer can sit in between, checking a regional hub near the lower tier before reaching for an upper tier that might be on another continent.

The hierarchy does not have to stop at one shield. Cloudflare’s Cache Reserve extends the idea downward into persistent storage: it “serves as the ultimate upper-tier cache,” a backing store that holds assets far longer than the volatile edge caches do, so that even a cold shield can avoid the origin by reading from reserve. The general principle is the same at every level. Each tier exists to keep requests away from the tier below it. The edge protects the shield, the shield protects the origin, and a backing store protects against the shield itself going cold.

This is also where reachability becomes its own problem. When the shield does have to reach the origin, the path between them can be slow or broken in ways neither end controls. Cloudflare’s Orpheus exists for exactly this: rather than always taking the fastest route, it reactively steers origin-bound traffic “down healthy (and not necessarily the fastest) paths when needed,” which it credits with cutting connection failures (the 522 errors that mean the edge could not reach the origin) and nudging origin reachability from 99.87% to 99.90%. A small absolute change that is a large relative one at the tail. Caching keeps most requests away from the origin; routing makes sure the few that go through actually arrive.

If you are building a crawler that talks to CDN-fronted sites, this hierarchy is worth internalizing, because it explains why your own caching strategy mirrors theirs. The same expiry and revalidation primitives the edge uses to decide freshness, ETags and Last-Modified, are the ones an incremental recrawler uses to avoid refetching unchanged pages. That side of the equation, from the client’s perspective, is the subject of caching and incremental recrawl.

Cache keys: what counts as the same request

Every cache rests on one decision made millions of times a second: is this request the same as one I have already answered? The cache key is the answer. It is the string a CDN computes from a request and uses to look up a stored response. Two requests that produce the same key are, as far as the cache is concerned, the same request, and the second one gets the first one’s bytes.

What goes into that key is the entire game. Put too much in and you fragment the cache, storing near-identical copies under slightly different keys and missing constantly. Put too little in and you serve the wrong response to someone. The default on Cloudflare keys on the full URL, which means the scheme, the host, and the path with its query string, plus a handful of headers the platform folds in for correctness: the Origin header for CORS, the method-override headers (x-http-method-override, x-http-method, x-method-override), and a set of forwarding headers (x-forwarded-host, x-host, x-forwarded-scheme, x-original-url, x-rewrite-url, and forwarded). Notice what is absent. Cookies are not in the default key. The User-Agent is not in the default key. By default, two users requesting the same URL get the same cached object regardless of who they are, which is exactly what you want for a static asset and exactly what you do not want for a logged-in dashboard.

Cache key = identity of the request

Keyed by default: scheme + host + path + query; Origin, x-forwarded-host, x-rewrite-url, ...
Keyed only if you ask: cookie, header, device_type, geo, lang
Unkeyed (ignored for lookup): User-Agent, most cookies, most custom headers

An unkeyed input that still changes the response is the cache-poisoning bug class.

*The cache key fixes which parts of a request define its identity; anything unkeyed is invisible to the lookup even if it changes the response.*
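A simplified model of a default key shows the consequence of leaving cookies and User-Agent out. The function below is a sketch of the idea, not Cloudflare's actual key format: the `|` separator and the shortened header list are assumptions made for readability.

```python
from urllib.parse import urlsplit

# A subset of the headers the post lists as part of the default key.
KEYED_HEADERS = ("origin", "x-http-method-override", "x-forwarded-host")


def default_cache_key(url: str, headers: dict[str, str]) -> str:
    """Build a key from scheme, host, path, query, and a few keyed headers.

    Cookies, User-Agent, and any other header are deliberately ignored.
    """
    parts = urlsplit(url)
    lowered = {name.lower(): value for name, value in headers.items()}
    keyed = [f"{name}={lowered[name]}" for name in KEYED_HEADERS if name in lowered]
    return "|".join([parts.scheme, parts.netloc.lower(), parts.path, parts.query, *keyed])


alice = default_cache_key("https://example.com/app.css?v=3",
                          {"Cookie": "session=alice", "User-Agent": "Firefox"})
bob = default_cache_key("https://example.com/app.css?v=3",
                        {"Cookie": "session=bob", "User-Agent": "curl"})
cors = default_cache_key("https://example.com/app.css?v=3",
                         {"Origin": "https://shop.example"})

print(alice == bob)   # identical key despite different cookies and agents
print(alice == cors)  # the Origin header is keyed, so this is a separate entry
```

Swap the stylesheet URL for a dashboard URL and the first comparison becomes the problem: two different sessions would still share one cache entry.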

So CDNs let you customize the key. Cloudflare’s cache key template can fold in specific query-string parameters (include some, drop others), named request headers, cookie values or merely the presence of a cookie, the resolved versus original host, and a set of user features it computes for you: device_type (mobile, desktop, or tablet), geo (country), and lang (language). Add device_type and a phone and a laptop get separate cached copies of the same URL. Add a session cookie and every logged-in user gets their own cache entry, which usually means you should not be caching that route at all. Two normalizations sit alongside this and matter more than they look. Query-string sort makes ?a=1&b=2 and ?b=2&a=1 collapse to one key instead of two, which on Cloudflare is off by default and has to be turned on. URL normalization to origin keeps the URL used for the key aligned with the URL sent to the origin, and Cloudflare explicitly recommends enabling it with custom keys to prevent a class of poisoning.
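The effect of these knobs can be sketched with the standard library. The function below is a hypothetical helper, not Cloudflare's template syntax: it sorts the query string, optionally keeps only named parameters, and optionally appends a `device_type` value supplied by the caller.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def custom_cache_key(url: str, include_params: Optional[set[str]] = None,
                     device_type: Optional[str] = None) -> str:
    """Return a normalized key for `url`.

    include_params: if given, only these query parameters stay in the key.
    device_type: if given (e.g. "mobile"), it becomes part of the key.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    if include_params is not None:
        pairs = [(name, value) for name, value in pairs if name in include_params]
    query = urlencode(sorted(pairs))  # query-string sort: order no longer matters
    key = urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, query, ""))
    return key if device_type is None else f"{key}#device_type={device_type}"


print(custom_cache_key("https://example.com/p?b=2&a=1"))
print(custom_cache_key("https://example.com/p?utm_source=x&id=7", include_params={"id"}))
print(custom_cache_key("https://example.com/p", device_type="mobile"))
```

Sorting and parameter filtering shrink the number of distinct keys, while adding `device_type` multiplies it. Each option is a deliberate trade between hit ratio and how finely responses are separated.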

That word, poisoning, is the dangerous edge of cache keys, and it is worth being precise about because it is a defensive concern, not an exploit recipe. The cache key defines the identity of a request. Anything not in the key is “unkeyed,” the term James Kettle introduced in his 2018 Practical Web Cache Poisoning research: an unkeyed input is a request component “not included in the cache key.” The bug appears when an unkeyed input still changes the response. If the application reflects, say, an X-Forwarded-Host header into the page but the cache ignores that header when computing the key, then a response generated from a poisoned header gets stored under the clean key and served to everyone who asks for that URL. The mechanism, in Kettle’s words, is that “any difference in the response triggered by an unkeyed input may be stored and served to other users.” The defense is conceptually simple and operationally fiddly: either do not let unkeyed inputs change the response, or key on them. The standard HTTP answer to the second half is the Vary response header, which tells a cache to add named request headers to the key. In practice support is uneven across CDNs and easy to get wrong, which is why platform-side normalization and disciplined cache rules do most of the real work.
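The bug class is easiest to see in a toy, entirely in-memory model. The sketch below assumes a hypothetical application that reflects X-Forwarded-Host into its HTML and a cache whose keyed headers are configurable. It exists to show why keying on the header (or refusing to reflect it) closes the hole, and nothing here touches a real server.

```python
def render(url: str, headers: dict[str, str]) -> str:
    """Hypothetical origin app: reflects X-Forwarded-Host into a script URL."""
    host = headers.get("X-Forwarded-Host", "example.com")
    return f'<script src="https://{host}/app.js"></script>'


class ToyCache:
    """Caches `render` output under (url, *values of keyed headers)."""

    def __init__(self, keyed_headers: tuple[str, ...] = ()) -> None:
        self.keyed_headers = keyed_headers
        self.store: dict[tuple, str] = {}

    def fetch(self, url: str, headers: dict[str, str]) -> str:
        key = (url,) + tuple(headers.get(name) for name in self.keyed_headers)
        if key not in self.store:
            self.store[key] = render(url, headers)
        return self.store[key]


# Unkeyed input that changes the response: the first response is served to everyone.
naive = ToyCache()
naive.fetch("/home", {"X-Forwarded-Host": "attacker.example"})
victim_naive = naive.fetch("/home", {})

# Same traffic, but the header is part of the key: entries stay separate.
keyed = ToyCache(keyed_headers=("X-Forwarded-Host",))
keyed.fetch("/home", {"X-Forwarded-Host": "attacker.example"})
victim_keyed = keyed.fetch("/home", {})

print("attacker.example" in victim_naive)
print("attacker.example" in victim_keyed)
```

Keying on the header is the `Vary`-style fix. The other fix is on the application side, where `render` stops trusting the header at all.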

There is also the question of what makes a response cacheable in the first place, which is separate from the key. The key answers “is this the same request”; the cache-control and expires headers from the origin answer “may I store this, and for how long.” A response with no caching directives gets treated according to the CDN’s default policy, which for many platforms means static file extensions are cached and everything else is passed through. The origin can be explicit with max-age, s-maxage (which targets shared caches specifically and overrides max-age for them), and directives like no-store or private that tell a shared cache to keep its hands off. The interesting state is stale: a CDN can be told, via stale-while-revalidate and stale-if-error, to keep serving an expired object while it fetches a fresh copy in the background, or to keep serving it if the origin is down. That last behaviour is why a well-configured site can stay up through an origin outage. The edge is holding a copy it is technically not supposed to serve any more, and serving it anyway because the alternative is an error page. Freshness is a spectrum the origin gets to define, and the cache key only ever decides identity within whatever that policy allows.
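The decision a shared cache makes from those directives can be written down as a small function. This is a simplified sketch: it handles only the directives named above, ignores `Expires` and heuristic freshness, and uses state labels invented for this example.

```python
from typing import Optional


def parse_cache_control(value: str) -> dict[str, Optional[int]]:
    """Split a Cache-Control value into {directive: seconds or None}."""
    directives: dict[str, Optional[int]] = {}
    for part in value.split(","):
        name, _, arg = part.strip().lower().partition("=")
        if name:
            directives[name] = int(arg) if arg.isdigit() else None
    return directives


def shared_cache_state(cache_control: str, age: int, origin_down: bool = False) -> str:
    """Classify a stored response of the given age (seconds) for a shared cache."""
    d = parse_cache_control(cache_control)
    if "no-store" in d or "private" in d:
        return "bypass"                      # a shared cache must not keep this
    ttl = d.get("s-maxage")
    if ttl is None:                          # s-maxage wins for shared caches
        ttl = d.get("max-age")
    if ttl is None:
        return "default-policy"              # no directive: platform default applies
    if age < ttl:
        return "fresh"
    if age < ttl + (d.get("stale-while-revalidate") or 0):
        return "stale-while-revalidate"      # serve now, refresh in the background
    if origin_down and age < ttl + (d.get("stale-if-error") or 0):
        return "stale-if-error"              # serve the old copy instead of an error
    return "expired"


print(shared_cache_state("max-age=60, s-maxage=600", age=120))
print(shared_cache_state("max-age=60, stale-while-revalidate=30", age=70))
print(shared_cache_state("max-age=60, stale-if-error=3600", age=500, origin_down=True))
```

The third call is the outage case: the object is long past its lifetime, but the origin has given the cache permission to keep serving it while the origin is down.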

This is the layer where caching stops being a performance feature and becomes a correctness and security feature. Get the key wrong in the loose direction and you leak one user’s response to another. Get it wrong in the tight direction and you destroy your hit ratio. The whole skill of running a cache is choosing exactly which bytes of the request define identity and refusing to let anything else through.

TLS termination at the edge

There is one more thing the edge does before any of the caching logic runs. It terminates TLS. The encrypted connection from the browser ends at the point of presence, not at the origin. The edge completes the handshake, decrypts the request, does its cache lookup, and if it has to go to origin it opens a separate encrypted connection for that leg. Two TLS sessions, browser-to-edge and edge-to-origin, with the CDN in the clear in the middle. That is the only way the edge can read a URL, compute a cache key, or serve a cached object at all; you cannot cache what you cannot read.

Terminating at the edge is the performance win that makes HTTPS tolerable at a distance. The handshake, with its round trips, happens against a server milliseconds away instead of one across an ocean, which is where the often-cited connection-setup savings come from. The mechanics of that handshake, the ClientHello and its extensions and the rest of the exchange, are their own deep topic covered in the TLS 1.3 handshake, frame by frame. What matters here is the consequence: to present a valid certificate for the site, the edge needs the cryptographic ability to prove it owns the site’s identity, and historically that meant the site’s private key had to live on the CDN’s servers.

For a lot of organizations that is a non-starter. A bank or a government site may be contractually or legally barred from copying its TLS private key onto hardware it does not control. Cloudflare’s answer, proposed publicly in 2014 and shipped as Keyless SSL, splits the handshake so the private key never leaves the customer. The insight is that “the private key is only used once in each handshake,” which lets the design “split the TLS handshake geographically, with most of the handshake happening at Cloudflare’s edge while moving the private key operations to a remote key server” that the customer runs. When the one private-key operation comes up, the edge makes a call to that key server, gets the result, and continues. In an RSA handshake the delegated step is decrypting the premaster secret the client sent; in an ephemeral Diffie-Hellman handshake it is signing the server’s key-exchange parameters. The edge holds the certificate and drives the whole exchange. It just borrows the one private-key operation from a server the customer owns, over a mutually authenticated TLS connection back to the key server.

*Figure: the browser’s TLS handshake ends at the CDN edge, which holds the certificate; the edge sends one key operation to a customer-owned key server, and the private key never leaves it. Keyless SSL: the edge presents the certificate and drives the handshake, calling back to a customer-controlled key server only for the single operation that needs the private key.*
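The division of labour can be sketched with textbook RSA on deliberately tiny numbers. Everything here is a teaching assumption: the classes are hypothetical, the key is trivially breakable, there is no padding, and the call between edge and key server is a plain method call rather than the mutually authenticated connection the real design uses. The point is only which party holds which value.

```python
import hashlib

# Toy RSA key: n = 61 * 53, public exponent e, private exponent d. NOT secure.
N, E, D = 3233, 17, 2753


def digest_mod_n(data: bytes, n: int) -> int:
    """Reduce a SHA-256 digest into the toy modulus (illustration only)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


class KeyServer:
    """Customer-run service: the only place the private exponent exists."""

    def __init__(self, n: int, d: int) -> None:
        self._n, self._d = n, d

    def sign(self, message_int: int) -> int:
        return pow(message_int, self._d, self._n)


class Edge:
    """CDN edge: holds the public half (the certificate) and drives the handshake."""

    def __init__(self, n: int, e: int, key_server: KeyServer) -> None:
        self.public_key = (n, e)
        self.key_server = key_server

    def sign_key_exchange(self, params: bytes) -> int:
        # The single private-key operation is delegated to the key server.
        return self.key_server.sign(digest_mod_n(params, self.public_key[0]))


def client_verify(public_key: tuple[int, int], params: bytes, signature: int) -> bool:
    """What the browser checks using only the certificate's public key."""
    n, e = public_key
    return pow(signature, e, n) == digest_mod_n(params, n)


edge = Edge(N, E, KeyServer(N, D))
params = b"ephemeral key-exchange parameters"
signature = edge.sign_key_exchange(params)
print(client_verify(edge.public_key, params, signature))
```

The `Edge` object never stores `D`. It can complete the exchange only by asking the key server, which is the property that lets the private key stay on hardware the customer controls.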

Termination at the edge also makes the edge the place where the whole TLS fingerprint surface gets observed. Because the handshake ends there, the CDN sees every detail of the ClientHello, the cipher list and its order, the extension layout, the supported groups, and it can use those for the kind of fingerprinting that bot-management products live on. From the client side that is the JA3-to-JA4 lineage of fingerprints, covered in TLS fingerprinting: from ClientHello bytes to JA4; the relevant fact here is just that the edge is the natural observation point, because the edge is where the encrypted connection actually ends. The certificate it presents, in turn, is logged publicly the moment it is issued, which is the world of certificate transparency. The edge sits at the exact seam where the public certificate, the live handshake, and the cache all meet.

What the layer actually is

Strip away the marketing and a CDN is three decisions stacked on top of each other. Routing decides which machine you reach, and anycast makes that decision happen in the network itself, with BGP doing the steering and the same address answering from everywhere. Caching decides whether that machine already has your answer, and the tiered hierarchy exists so that the cost of not having it stays bounded no matter how large the edge grows. Identity decides whether your request is the same as the last one, and the cache key is where that judgment lives, precise enough to be a performance lever and dangerous enough to be a security boundary. TLS termination sits underneath all three, because none of them can happen until the edge can read the bytes, which is why the edge holds the certificate even when it cannot be trusted with the key.

What is striking, once you see the pieces, is how little of it is visible from where you stand as a user. The URL never changes. The padlock looks the same whether the response came from a cache in your city or an origin on another continent, whether the private key sat on the edge or stayed locked in a rack the CDN has never touched. The entire apparatus is built to be invisible, and it mostly succeeds. The one place it shows through is latency: the difference between a few milliseconds and a few hundred is the difference between a hit at a nearby POP and a miss that walked all the way back up the hierarchy to an origin you will never see in the address bar.



Frequently asked questions

How does anycast route a user to a nearby CDN server without per-user logic?

Anycast announces the same IP prefix into BGP from dozens or hundreds of points of presence at once. Each location tells its upstream networks to send traffic for that prefix to it, and the internet's routers pick the shortest path by their own metrics. Because many locations advertise the same destination, the packet lands on whichever announcement is closest in BGP terms to the source network. A user in Frankfurt and one in São Paulo reach different nearby machines with no logic deciding that per user.

Why does anycast break long-lived TCP connections but not short HTTP requests?

BGP routes packets and knows nothing about TCP connections. If the network reconverges mid-connection, perhaps from a transit link flapping or an operator changing a route, packets from an established session can start landing on a different point of presence that has no record of that session and resets it. A short HTTP request just gets retried invisibly, but a long download or a websocket sees a dropped connection. This is why anycast suits stateless request/response traffic better than long-lived streams.

Why do CDNs funnel cache misses through a shield tier instead of letting every edge hit the origin?

A flat layer of edges has a structural problem: if a network has 300 locations and an object is not cached anywhere, the first request in each location triggers a separate origin fetch, so one object can hit the origin up to 300 times. Tiered caching puts an upper tier between edges and origin, and only the upper tier is allowed to contact the origin. The origin then sees a request count proportional to the number of upper tiers rather than the number of edges. Cloudflare reported customers seeing a 60% or greater reduction in cache miss rate once tiered caching was on.

What does a CDN include in its default cache key, and what does it leave out?

Cloudflare's default key uses the full URL, meaning the scheme, host, and path with its query string, plus a handful of headers folded in for correctness such as the Origin header for CORS, method-override headers, and a set of forwarding headers. Cookies and the User-Agent are not in the default key. So by default two users requesting the same URL get the same cached object regardless of who they are, which is correct for a static asset but wrong for a logged-in dashboard.

How does Keyless SSL let a CDN serve HTTPS without holding the site's private key?

Terminating TLS at the edge normally meant the site's private key had to live on the CDN's servers, which is a non-starter for organizations barred from copying it onto hardware they do not control. Keyless SSL relies on the fact that the private key is used only once per handshake. Most of the handshake happens at the edge, which holds the certificate and drives the exchange, but the single private-key operation is delegated to a remote key server the customer runs. The edge calls back over a mutually authenticated connection, gets the result, and continues.
