Anycast routing: how one IP serves the whole planet

Type dig @1.1.1.1 from Tokyo, Frankfurt, and São Paulo, and all three queries go to an address that is, on paper, one IP. Yet the three machines that answer are not the same box; they might be ten thousand kilometers apart. Each query still gets answered in single-digit milliseconds by something nearby, and nobody configured your resolver with a list of regional addresses. There is exactly one number, 1.1.1.1, and it works from everywhere, fast, at the same time.

That is the trick worth understanding. The address is not a place. It is a claim, made simultaneously by hundreds of separate locations, that they can serve traffic for it. The internet’s routing system hears all those claims and quietly hands each user to whichever claimant is closest in its own terms. This post is about how that works: what an anycast announcement actually is, how BGP turns “everyone advertises the same prefix” into “each user reaches the nearest one,” why DNS roots and CDNs were built on it, what happens to a long-lived TCP connection when the routing underneath it shifts, and where the whole model runs out of road.

The sections below start with the mechanism (announcing one prefix from many places), then the routing math that picks a winner, then the two big deployments that proved it at scale, then failover and DDoS behavior, and finally the limits: TCP state, load distribution, and the fact that “nearest” in BGP terms is often not nearest in any sense a human would recognize.

One prefix, many origins

Start with what a normal IP address is. In the default case, called unicast, an address lives in one place. One machine, or one tightly coupled cluster behind one router, answers for it, and the global routing table contains exactly one origin for the prefix that covers it. Packets aimed at that address converge on that single location no matter where they start.

Anycast breaks the one-to-one assumption. A common modern definition calls it “a network addressing and routing methodology in which a single IP address is shared by devices, generally servers, in multiple locations.” The mechanism is unglamorous: you take a prefix, say an IPv4 /24, and you announce it into BGP from several network locations at once. Each location originates the route on its own. From the routing system’s point of view there is no “primary” and no “backup.” There are simply N independent origins all claiming reachability for the same block of addresses, and the routing system treats that as N valid paths to one destination.

The idea is older than most of the infrastructure that depends on it. The first documented use of anycast for topological load-balancing dates to 1989, and the IETF formalized it in November 1993 in RFC 1546, “Host Anycasting Service,” written by Craig Partridge, Trevor Mendez, and Walter Milliken at BBN. RFC 1546 imagined a dedicated anycast address class and never shipped in that form. What actually won was simpler and required no new protocol machinery at all: just announce the same ordinary prefix from many places and let BGP sort it out. By 2001 that approach reached the DNS root with the I-root nameserver, and within a few years it was the default way to run anything that had to be both global and fast.

*Left: a unicast address has one origin, so every user converges on one box. Right: the same anycast prefix is announced from three places; a user takes the path the routing system rates best (solid line) and never sees the others.*

There is a subtle but important point hiding in what counts as one “instance.” RFC 4786, the operational best-practice document published in December 2006, calls each independent announcement of the prefix an anycast node. It defines a node as “an internally-connected collection of hosts and routers that together provide service for an anycast Service Address,” and notes that each node “presents a unique path to the Service Address.” The same RFC defines the region that drains to a given node as its catchment: “the topological region of a network within which packets directed at an Anycast Address are routed to one particular node.” That word, catchment, is the right mental model. Each node sits at the bottom of a basin, and every user whose packets roll downhill into that basin becomes its traffic.
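A small sketch can make “catchment” concrete by computing one over a toy topology. It assumes, unrealistically, that every network simply prefers the shortest AS path. Under that assumption a catchment is just the set of ASes that are closer, in hops, to one node than to any other. The graph, AS names, and node names are all hypothetical. Ties go to the alphabetically first node name, which stands in for BGP's final tie-breakers.

```python
from collections import deque

def hop_distances(graph, source):
    """Breadth-first hop counts from `source` to every reachable AS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return dist

def catchments(graph, origins):
    """Map each anycast node to the set of ASes whose shortest path leads to it.

    `origins` maps a node name to the AS where that node announces the prefix.
    Ties go to the alphabetically first node name (a stand-in tie-breaker).
    """
    dist_from = {node: hop_distances(graph, asn) for node, asn in origins.items()}
    basins = {node: set() for node in origins}
    for asn in graph:
        reachable = [(d[asn], node) for node, d in dist_from.items() if asn in d]
        if reachable:
            _, winner = min(reachable)
            basins[winner].add(asn)
    return basins

# Hypothetical five-AS chain A - B - C - D - E (links listed in both directions).
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
origins = {"tokyo": "A", "frankfurt": "E"}  # the same prefix announced at both ends

print(catchments(graph, origins))
```

AS C is two hops from both nodes. It falls into frankfurt's basin only because of the tie-breaker, which is a reminder that basin edges are decided by whatever rule comes next in the selection order.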

How BGP picks the winner

Anycast does not contain any routing intelligence of its own. It borrows all of it from BGP, the protocol that already decides how every prefix on the internet is reached. Nothing about an anycast prefix tells a router “this one is special.” Routers see multiple origins for one prefix, run the exact same best-path selection they run for everything else, and pick one. The packet follows that pick. If you want to understand why a given user lands on a given node, you are really asking why BGP chose one path over the others, which is its own deep topic covered in BGP explained.

The first-order answer is AS-path length: BGP prefers the route that traverses the fewest autonomous systems. Anycast’s classic description, that users reach “the location nearest the sender, using their normal decision-making algorithms, typically the lowest number of BGP network hops,” is accurate but quietly misleading, because BGP’s notion of nearest is topological, not geographic. A path that crosses three networks is preferred over one that crosses four, even if the three-hop path physically loops through a different continent and the four-hop path stays in the same city. Before AS-path length even enters the picture, BGP applies local preference, a knob the network operator sets, which can override everything else. After AS-path, the tie-breakers cascade through origin type, multi-exit discriminator, eBGP-over-iBGP preference, the interior (IGP) cost to reach the next hop, and finally lowest router ID. Geography appears nowhere in that list. It is correlated with the outcome, often strongly, but it is never an input.
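The decision order above can be sketched as a sort key. This is a simplified model, not any router's implementation. It omits vendor-specific and secondary steps such as Cisco's weight, locally originated routes, and route age. It also compares MED unconditionally. The route values are hypothetical, and the AS numbers come from the documentation range.

```python
from dataclasses import dataclass, replace

# BGP's ORIGIN attribute: IGP is preferred over EGP, which is preferred over INCOMPLETE.
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}

@dataclass(frozen=True)
class Route:
    node: str          # label for the anycast node at the end of this path (illustrative)
    local_pref: int    # receiving network's policy knob; higher wins
    as_path: tuple     # AS numbers the route traversed; shorter wins
    origin: str        # "igp", "egp", or "incomplete"; lower rank wins
    med: int           # multi-exit discriminator; lower wins
    ebgp: bool         # learned from an external peer (preferred over iBGP)
    igp_cost: int      # internal cost to reach the BGP next hop; lower wins
    router_id: int     # last-resort tie-breaker; lowest wins

def preference_key(route):
    """Smaller key = more preferred. Simplified: MED is compared unconditionally here,
    whereas real BGP by default compares it only between routes from the same neighbor AS."""
    return (
        -route.local_pref,
        len(route.as_path),
        ORIGIN_RANK[route.origin],
        route.med,
        0 if route.ebgp else 1,
        route.igp_cost,
        route.router_id,
    )

def best_path(routes):
    return min(routes, key=preference_key)

# Documentation-range AS numbers; 64499 stands in for the anycast operator's AS.
same_city = Route("same-city", 100, (64496, 64497, 64498, 64499), "igp", 0, True, 10, 2)
far_away = Route("other-continent", 100, (64500, 64501, 64499), "igp", 0, True, 10, 1)

print(best_path([same_city, far_away]).node)  # other-continent: 3 ASes beat 4
print(best_path([replace(same_city, local_pref=200), far_away]).node)  # same-city: local preference wins first
```

Neither route carries a location. The physically nearby path loses on length alone until a policy knob outranks that comparison.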

This is why an anycast deployment is mostly traffic engineering. The catchment of each node is whatever the surrounding internet decides it is, and operators shape it indirectly: by choosing where to peer, with whom, at what relative local-preference, and by AS-path prepending to make a node look artificially farther away and shrink its basin. RFC 4786 draws the distinction between local-scope anycast, where a node’s reachability “is only visible to a subset of the whole routing system,” and global-scope anycast, where a node is “potentially visible to the whole routing system.” A common pattern mixes the two: a handful of global nodes that can serve anyone as a fallback, plus many local nodes whose announcements are kept inside one region using BGP communities so they only ever capture nearby traffic. The K-root server has long run that hybrid shape, with a small set of global instances and a larger set of local ones.
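AS-path prepending is easy to see in a toy model where every network simply prefers the shortest AS path. Each prepended copy of the origin AS adds one to every path length that node's announcement produces. Networks that were equidistant, or nearly so, then tip toward the other node. The topology and names below are hypothetical, and real catchments also depend on policy that this sketch ignores.

```python
from collections import deque

def hop_distances(graph, source):
    """Breadth-first hop counts from `source` to every reachable AS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return dist

def catchments_with_prepend(graph, origins, prepend):
    """Shortest-AS-path catchments, where node N's announcement carries
    prepend[N] extra copies of its AS and so looks that many hops longer."""
    dist_from = {node: hop_distances(graph, asn) for node, asn in origins.items()}
    basins = {node: set() for node in origins}
    for asn in graph:
        candidates = [
            (d[asn] + prepend.get(node, 0), node)
            for node, d in dist_from.items()
            if asn in d
        ]
        if candidates:
            basins[min(candidates)[1]].add(asn)  # ties go to the first node name
    return basins

# Hypothetical seven-AS chain A - B - C - D - E - F - G.
names = "ABCDEFG"
graph = {n: [] for n in names}
for left, right in zip(names, names[1:]):
    graph[left].append(right)
    graph[right].append(left)
origins = {"west": "A", "east": "G"}

print(catchments_with_prepend(graph, origins, {}))           # west gets A, B, C
print(catchments_with_prepend(graph, origins, {"west": 2}))  # west shrinks to A, B
```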

The honest caveat is that outsiders have no view into the exact local-preference values, community strings, or prepend depths a given operator uses; those are private routing policy. What is public is the shape of the design and its failure modes, and those are enough to reason about behavior.

The DNS root: thirteen names, two thousand machines

The cleanest illustration of anycast’s payoff is the DNS root. There are thirteen root server identities, lettered A through M. That number was frozen decades ago because thirteen NS records and their glue were the most that fit in a 512-byte DNS response. The ceiling applies to names, not machines. The letters are run by twelve independent organizations: Verisign runs A and J, USC-ISI runs B, Cogent runs C, the University of Maryland runs D, NASA Ames runs E, the Internet Systems Consortium runs F, the U.S. Defense Information Systems Agency (DISA) runs G, the U.S. Army Research Lab runs H, Netnod runs I from Sweden, the RIPE NCC runs K from the Netherlands, ICANN runs L, and the WIDE Project runs M from Japan.

Behind those thirteen names is a number that keeps climbing. As of December 2025, the root server system reports 1,954 separate instances worldwide, all reachable through the same thirteen logical addresses. Every one of those instances answers as, say, F-root, by announcing F-root’s prefix into BGP locally. A resolver in Nairobi and a resolver in Reykjavík both send to the same F-root address and both get answered close to home, by different hardware, with neither resolver aware that more than one F-root exists. This is the load-and-resilience story of the entire DNS resolution chain in miniature, and it is why a single 512-byte limit never became a scaling wall. The full path a query takes from a stub resolver up to one of these instances is its own subject, covered in DNS resolution end to end.

Anycast and DNS fit together so neatly for a reason that is structural, not incidental. Root queries are almost always a single UDP packet out and a single UDP packet back. There is no connection state to lose. Each query stands alone, so even if successive queries from the same resolver land on different instances because the routing underneath shifted between them, nothing breaks; each instance can answer any query in full. UDP is anycast’s native habitat. The trouble starts when the transport has memory.

*The root system uses 13 logical names but nearly 2,000 physical instances as of late 2025. UDP's statelessness is what makes the fan-out free of consequences.*

The CDN: every IP, everywhere, over TCP

CDNs took the same prefix-from-everywhere idea and pointed it at the harder transport. Web traffic is TCP, and increasingly QUIC over UDP, and a CDN’s whole job is to answer HTTP from a machine near you. So the modern large CDN announces its entire public address space from every point of presence it runs. Cloudflare describes operating data centers in 335-plus cities, all advertising the same external IPs; an attack it absorbed was spread across 477 data centers in 293 locations by the same mechanism. When you connect to a site behind that network, your packets roll into whichever POP’s catchment you sit in, and that POP terminates your TLS and serves your content. The wider machinery around this, cache tiers, shields, and origin pulls, is the subject of how a CDN actually works.

There is a second, less obvious use of the same idea inside a single building. A POP is not one server; it is a rack of them, and you cannot put a hardware load balancer out front without recreating the single point of failure anycast was supposed to remove. Cloudflare’s answer is to run anycast a second time at LAN scope. Every server in a data center runs the Bird BGP daemon and announces routes for the public IPs to the data center’s routers. The router spreads inbound connections across those servers with equal-cost multipath (ECMP). It hashes each packet’s five-tuple (source and destination IP, source and destination port, and protocol), so every packet of one TCP connection consistently lands on the same server. A server that gets overloaded raises its route weight or withdraws the route, and the router stops sending it new flows. A server whose process crashes has its routes pulled by Bird automatically, and the connections drain to its neighbors. There is no dedicated load balancer in the path at all. The routing protocol is the load balancer, at both the global scale (which city) and the local scale (which box in the rack).

That ECMP hash is the load-bearing detail for TCP over anycast. Anycast gets your packets to the right POP; ECMP keeps every packet of a given connection pinned to the same server inside it, because the five-tuple hash is stable for the life of the connection. As long as the upstream path and the in-POP hash both stay put, a long-lived TCP session is fine. The model only wobbles when one of those two things moves underneath an open connection.

*Anycast runs twice: BGP picks the POP, ECMP pins the connection to one server inside it. The route is the load balancer at both levels.*
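Per-flow ECMP selection can be sketched in a few lines: hash the five-tuple, then take the result modulo the number of next hops. SHA-256 is a stand-in chosen here for determinism; real routers use their own hardware hash functions. The server names and addresses are hypothetical.

```python
import hashlib

def pick_server(five_tuple, servers):
    """Pick a next hop for a flow from a hash of its five-tuple.

    Every packet of one TCP connection carries the same five-tuple, so every
    packet maps to the same server as long as the server list does not change.
    """
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return servers[digest % len(servers)]

servers = ["srv-1", "srv-2", "srv-3", "srv-4"]  # hypothetical servers announcing the same route

# (source IP, source port, destination IP, destination port, protocol)
flow = ("203.0.113.7", 51514, "192.0.2.10", 443, "tcp")
print(pick_server(flow, servers))  # the same answer for every packet of this flow

# Different connections (here, different source ports) spread across the servers.
spread = {pick_server(("203.0.113.7", port, "192.0.2.10", 443, "tcp"), servers)
          for port in range(50_000, 50_100)}
print(sorted(spread))
```

Plain modulo hashing has a weakness worth keeping in mind. When a server withdraws its route and the list shrinks, many flows that never touched that server can rehash to a different one. That is exactly the kind of change underneath an open connection that makes the model wobble.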

Failover that nobody has to trigger

The property that makes anycast worth the operational pain is failover with no failover logic. There is no health-check loop watching a primary and flipping a virtual IP to a backup. There is no DNS TTL to wait out. When a node goes dark, it simply stops announcing the prefix, either deliberately, by withdrawing the BGP route, or implicitly, because the router that was originating the announcement lost its session to the dead box. BGP propagates the withdrawal, every router that was using that node recomputes best-path, and the affected traffic drains into the next-best node’s catchment. The users in that catchment never chose a backup. The basin they were sitting in just got redrawn, and they rolled into the adjacent one.
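In a toy model, failover is nothing more than recomputing the same decision with one origin missing. The sketch assumes every network prefers the shortest AS path, and it treats reconvergence as instantaneous, which real BGP does not. The chain topology and node names are hypothetical.

```python
from collections import deque

def hop_distances(graph, source):
    """Breadth-first hop counts from `source` to every reachable AS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return dist

def assign(graph, origins):
    """Map every AS to the anycast node it reaches (shortest AS path, ties by name)."""
    dist_from = {node: hop_distances(graph, asn) for node, asn in origins.items()}
    result = {}
    for asn in graph:
        candidates = [(d[asn], node) for node, d in dist_from.items() if asn in d]
        if candidates:
            result[asn] = min(candidates)[1]
    return result

graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
origins = {"n1": "A", "n2": "C", "n3": "E"}  # hypothetical nodes along a chain

before = assign(graph, origins)
# n2 goes dark: its announcement disappears, and nothing else is reconfigured.
after = assign(graph, {node: asn for node, asn in origins.items() if node != "n2"})
moved = {asn for asn in graph if before[asn] != after[asn]}
print(sorted(moved))  # ['C', 'D']: only n2's former catchment moves
```

No step names a backup. C rolls into n1's basin and D into n3's simply because those become the best remaining paths.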

The cost of that elegance is convergence time. A withdrawal does not teleport across the internet; it propagates router by router, and a long-haul reconvergence on the public internet typically settles on the order of tens of seconds, not milliseconds. For stateless UDP that gap is nearly invisible, since the resolver just retries and the retry lands somewhere healthy. For an open TCP connection that gap is where things can break, which is the subject of the next section. The same convergence mechanics, and the ways they go wrong, are the whole story of BGP convergence and the routing table.

The same redrawing-of-basins behavior is what makes anycast a structural defense against volumetric DDoS, not just a latency optimization. A botnet firing at one anycast address does not concentrate on one machine. Each bot’s packets follow its own catchment to its own nearest node, so a globally distributed flood arrives pre-split across the entire network’s footprint in rough proportion to where the attackers are. Cloudflare’s framing is that anycast “increases the surface area” of the target: the attack is divided by the number of POPs before any single POP has to cope with its slice. And if one POP is still being overwhelmed by a regional flood, the operator can withdraw that POP’s announcement from specific peers and push the load onto better-provisioned neighbors. The defense and the load-balancing are the same primitive used two ways. This is one of the reasons a single advertised IP behind a large network is hard to take down by brute force, in a way a single unicast box never could be.
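The pre-splitting effect can be sketched with invented numbers. Each source network's attack traffic lands on the POP whose catchment contains it, so per-POP load is just a sum over catchments. The networks, POP names, and volumes are hypothetical. So is the choice of Amsterdam as Frankfurt's next-best neighbor.

```python
from collections import Counter

def split_attack(bot_traffic, catchment):
    """Sum attack traffic per POP: each source network's packets go to the
    node whose catchment it sits in."""
    load = Counter()
    for source_as, gbps in bot_traffic.items():
        load[catchment[source_as]] += gbps
    return load

# Hypothetical botnet: attack volume (Gbps) by source network, and each network's POP.
bot_traffic = {"as-tokyo-1": 120, "as-tokyo-2": 80, "as-berlin": 180,
               "as-paris": 150, "as-recife": 50}
catchment = {"as-tokyo-1": "tokyo", "as-tokyo-2": "tokyo", "as-berlin": "frankfurt",
             "as-paris": "frankfurt", "as-recife": "sao-paulo"}

print(sum(bot_traffic.values()))             # 580 Gbps would hit one unicast box
print(split_attack(bot_traffic, catchment))  # frankfurt carries 330 of it

# The operator withdraws the Frankfurt announcement from the peer serving as-paris,
# so that network's traffic rolls into the next-best node (assumed: Amsterdam).
rerouted = {**catchment, "as-paris": "amsterdam"}
print(split_attack(bot_traffic, rerouted))   # frankfurt drops to 180
```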

Where the model breaks: TCP state

Now for the limit promised at the start. Anycast routes packets to a node, but it makes no promise that two packets sent a few seconds apart reach the same node. A TCP connection is shared state: a sequence-number space, a window, a congestion-control history, all of it living in memory on one specific machine. If the routing underneath an open connection shifts and subsequent packets arrive at a different node, that node has never heard of the connection. It has no matching socket. It answers the way TCP says to answer a segment for an unknown connection: a reset. The connection dies.
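A toy model makes the failure mechanical. It is an assumption-laden sketch, not a TCP implementation. It tracks only whether a node has accepted a client's SYN and ignores sequence numbers and listening sockets. The node names and addresses are hypothetical.

```python
class AnycastNode:
    """Toy model of one node's TCP connection table; not a real TCP stack."""

    def __init__(self, name):
        self.name = name
        self.connections = set()  # clients whose SYN this node has accepted

    def receive(self, client, flags):
        if "RST" in flags:
            return None           # a reset is never answered with a reset
        if flags == {"SYN"}:
            self.connections.add(client)
            return "SYN-ACK"      # new connection: its state now lives on this node
        if client in self.connections:
            return "ACK"          # known connection: carry on
        return "RST"              # segment for a connection this node never saw

tokyo, osaka = AnycastNode("tokyo"), AnycastNode("osaka")  # hypothetical nodes
client = ("198.51.100.20", 50123)                         # client IP and port

print(tokyo.receive(client, {"SYN"}))         # SYN-ACK
print(tokyo.receive(client, {"ACK"}))         # ACK
# The route shifts mid-connection, and the next data segment lands on osaka instead.
print(osaka.receive(client, {"ACK", "PSH"}))  # RST
```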

There are two distinct ways this can happen, and they have different blast radii. The first is a mid-connection route change, a flap. BGP can withdraw and re-announce a route, and if the best path shifts mid-session, packets that used to roll into node A now roll into node B, which resets them. The second is sharper and does not require any instability at all: per-packet load balancing. If a router on the path splits a single flow across two equal-cost paths packet by packet rather than per-flow, then the SYN can reach one node and the first data segment reach another. The second node sees a data segment with no handshake behind it and sends a reset. As one practitioner write-up put it, “a TCP SYN packet might go to one server, but an HTTP GET request might go to another server, which would send a TCP reset and cause the connection to drop.”

*The failure that defines anycast's edge: shared TCP state lives on one node, and a packet that arrives at a node which never saw the handshake gets a reset.*
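The difference between per-packet and per-flow balancing can be sketched with two equal-cost paths that end at different anycast nodes. The path names, the flow, and the use of SHA-256 as the flow hash are assumptions for illustration. A connection counts as surviving only if every packet reaches the node that received its SYN.

```python
import hashlib

PATHS = ["node-a", "node-b"]  # two equal-cost paths ending at different anycast nodes

def per_packet(flow, packet_index):
    """Round-robin: alternate paths packet by packet, ignoring which flow it is."""
    return PATHS[packet_index % len(PATHS)]

def per_flow(flow, packet_index):
    """Hash the flow identifier, so every packet of one flow takes one path."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return PATHS[digest[0] % len(PATHS)]

def connection_survives(flow, packet_count, balancer):
    """True if every packet reaches the node that received the SYN (packet 0)."""
    syn_node = balancer(flow, 0)
    return all(balancer(flow, i) == syn_node for i in range(packet_count))

flow = ("198.51.100.20", 50123, "192.0.2.53", 443, "tcp")
print(connection_survives(flow, 10, per_packet))  # False: packet 1 takes the other path
print(connection_survives(flow, 10, per_flow))    # True
print(connection_survives(flow, 1, per_packet))   # True: one packet cannot be split
```

The last line is the UDP DNS case in miniature: a single-packet exchange has nothing to tear apart.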

For years the received wisdom was that anycast was for UDP and stateless services, and that running TCP over it was asking for trouble. The measurements did not bear out the fear. The dominant behavior on the modern internet is per-flow load balancing, where every packet of a TCP connection follows one path, precisely so that flows are not torn apart; only a small minority of routers still do per-packet splitting. And real anycast routes turn out to be far more stable than the worst case suggests. A Catchpoint study that ran minute-long TCP transfers over anycast for a week reported that the testers “did not notice any substantial instability problems,” with the honest caveat that the tests used synthetic agents rather than real users. Earlier root-server measurements put mid-connection failure rates below 0.017 percent, low enough that route changes are a smaller cause of TCP failure than ordinary packet loss. This is exactly the regime RFC 4786 described when it advised that “the routing system’s node selection decision ought to be stable for substantially longer than the expected transaction time”: short transactions over stable routes are safe, and most web transactions are short.
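RFC 4786's rule of thumb can be turned into a back-of-the-envelope number. Assume, purely for illustration, that route changes affecting a given client arrive as a Poisson process with a mean of one per day. That rate is made up, not a measurement. The chance that a transaction of length T sees at least one change is then 1 - exp(-T / mean).

```python
import math

def p_route_change_during(transaction_seconds, mean_seconds_between_changes):
    """Probability that at least one route change lands inside a transaction,
    assuming route changes arrive as a Poisson process (an illustrative assumption)."""
    rate = 1.0 / mean_seconds_between_changes
    return 1.0 - math.exp(-rate * transaction_seconds)

ONE_DAY = 86_400  # assumed mean time between route changes, in seconds

short_request = p_route_change_during(0.5, ONE_DAY)       # a half-second web transaction
long_download = p_route_change_during(3 * 3600, ONE_DAY)  # a three-hour download

print(f"{short_request:.6%}")  # about 0.000579%
print(f"{long_download:.1%}")  # about 11.8%
```

Whatever the real rate is, the shape holds: exposure grows with transaction length, so short requests barely notice route changes and multi-hour flows do.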

QUIC closes the gap further by design. A QUIC connection is identified by a connection ID carried in the packet rather than by the IP five-tuple, so a connection can survive its underlying path changing, which is exactly the migration anycast occasionally forces. The connection ID lets a server, or a load balancer in front of one, recognize and route a migrated flow back to the state that owns it instead of resetting it. The transport that the modern web is moving toward is structurally friendlier to anycast than the one it grew up on. Even so, RFC 4786’s warning still stands for genuinely long-lived flows: “for long running flows, there are potential failure modes using anycast that are more complex than a simple destination-unreachable failure.” A multi-hour download or a persistent streaming socket is exactly the case where a route change has time to happen, and that is why state-heavy or very long sessions are often deliberately steered off pure anycast and onto a stable unicast address once established.
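Routing on connection ID rather than on the five-tuple can be sketched as follows. The lookup table, server name, connection ID, and addresses are hypothetical. A table is simply the easiest stand-in for whatever mapping a real load balancer uses. The point is that the client's address can change while the connection ID, and therefore the owning server, stays fixed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QuicPacket:
    src: tuple            # (client IP, client port): changes if the client's path changes
    dst: tuple            # (server IP, server port)
    connection_id: bytes  # chosen during the handshake; survives path changes

class ConnectionIdRouter:
    """Toy router that finds a connection's owning server by connection ID."""

    def __init__(self):
        self.owner = {}  # connection ID -> server holding that connection's state

    def register(self, connection_id, server):
        self.owner[connection_id] = server

    def route(self, packet):
        # None means no server holds state for this connection ID.
        return self.owner.get(packet.connection_id)

router = ConnectionIdRouter()
cid = bytes.fromhex("8394c8f03e515708")  # hypothetical connection ID
router.register(cid, "srv-3")

before = QuicPacket(("203.0.113.7", 50000), ("192.0.2.10", 443), cid)
after = QuicPacket(("198.51.100.9", 61000), ("192.0.2.10", 443), cid)  # new client address

print(before.src != after.src)                    # True: the five-tuple changed
print(router.route(before), router.route(after))  # srv-3 srv-3
```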

Where the model strains: load and the meaning of “nearest”

Two softer limits matter as much as the hard TCP one in practice.

The first is that anycast hands you no control over how much traffic each node receives. The catchment of a node is set by the surrounding internet’s topology, not by the node’s capacity. A POP sitting near a dense peering hub can find itself the nearest node, in BGP terms, for a vastly larger population than its neighbors, and BGP has no idea that the box is filling up. Plain anycast routes by topology and is blind to load. The production answer is to make the announcement itself load-aware: when a POP approaches saturation it sheds traffic by manipulating its own routes, prepending its AS-path to look farther away or withdrawing the announcement from some peers so its basin shrinks and the overflow rolls into adjacent nodes. The research line here is FastRoute, presented at USENIX NSDI in 2015 and deployed on Microsoft’s CDN, which automated exactly that BGP-based load shedding so an overloaded POP pushes its excess to the next-nearest one without a central controller. Cloudflare’s in-POP version of the same idea is a server raising its BGP weight or withdrawing a route the moment it is busy. The pattern is identical at every scale: the thing that picks the node is also the thing that throttles it, by editing what it announces.
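A toy version of load-aware announcement works like this: recompute catchments, and whenever a node's load exceeds its capacity, prepend one more copy of its AS and try again. This is an illustrative loop in the spirit of the load shedding described above, not FastRoute's published algorithm. The topology, demands, capacities, and the shortest-AS-path assumption are all hypothetical.

```python
from collections import deque

def hop_distances(graph, source):
    """Breadth-first hop counts from `source` to every reachable AS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return dist

def catchments_with_prepend(graph, origins, prepend):
    """Shortest-AS-path catchments; prepend[N] makes node N's paths look longer."""
    dist_from = {node: hop_distances(graph, asn) for node, asn in origins.items()}
    basins = {node: set() for node in origins}
    for asn in graph:
        candidates = [(d[asn] + prepend[node], node)
                      for node, d in dist_from.items() if asn in d]
        if candidates:
            basins[min(candidates)[1]].add(asn)  # ties go to the first node name
    return basins

def shed_load(graph, origins, demand, capacity, max_prepend=5):
    """Prepend on overloaded nodes until every node fits or hits max_prepend."""
    prepend = {node: 0 for node in origins}
    while True:
        basins = catchments_with_prepend(graph, origins, prepend)
        load = {node: sum(demand[asn] for asn in basins[node]) for node in origins}
        overloaded = [node for node in origins
                      if load[node] > capacity[node] and prepend[node] < max_prepend]
        if not overloaded:
            return prepend, load
        for node in overloaded:
            prepend[node] += 1  # look one AS hop farther away so the basin shrinks

# Hypothetical six-AS chain A-B-C-D-E-F; "hub" sits mid-chain near most of the demand.
names = "ABCDEF"
graph = {n: [] for n in names}
for left, right in zip(names, names[1:]):
    graph[left].append(right)
    graph[right].append(left)
origins = {"hub": "C", "edge": "F"}
demand = {"A": 30, "B": 30, "C": 30, "D": 40, "E": 10, "F": 10}
capacity = {"hub": 100, "edge": 100}

print(shed_load(graph, origins, demand, capacity))
# ({'hub': 1, 'edge': 0}, {'hub': 90, 'edge': 60})
```

Notice how coarse the knob is: one prepend moved all 40 units of AS D at once. Prepending shifts whole networks, not fractions of traffic, so it can easily overshoot.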

The second soft limit is that “nearest” is a lie the marketing copy tells. BGP’s nearest is the fewest autonomous systems, and AS-path length and physical distance are only loosely related. A user can be handed to a node that is topologically close but geographically absurd, because the short AS-path happens to wind through a distant interconnect. Measurement studies have repeatedly found clients, mobile clients especially, mapped to a geographically suboptimal anycast replica with real latency cost, and not because a closer replica was missing. The closer node exists; BGP just does not prefer the path to it. LinkedIn’s published experience is a clean number on the scale of the problem: after moving the United States to a regional anycast design, its share of “suboptimal POP assignment” fell from 31 percent to 10 percent, an improvement that also says one in three assignments had been suboptimal under the naive global scheme, and one in ten still was after the fix. This is the structural reason CDNs do not rely on anycast alone for steering. They pair it with DNS-based steering, which can use the resolver’s location, and increasingly the EDNS Client Subnet of the actual user, to choose a POP with information BGP does not have. How that DNS-side steering works, and where it disagrees with anycast, is the subject of DNS load balancing and GeoDNS.
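The mismatch can be shown in miniature. For each hypothetical user, compare the node BGP prefers (shortest AS path, ignoring local preference) with the node a map would pick. The users, node names, path lengths, and distances are invented. Counting mismatches is only a stand-in for a "suboptimal assignment" metric, not LinkedIn's methodology.

```python
def bgp_choice(candidates):
    """BGP sees only AS-path length here (ties broken by node name)."""
    return min(candidates, key=lambda c: (c[1], c[0]))[0]

def geographic_choice(candidates):
    """What a map would pick: the physically closest node."""
    return min(candidates, key=lambda c: c[2])[0]

# Hypothetical users: (node, AS-path length, distance in km) for each candidate node.
users = {
    "user-1": [("lisbon", 3, 300), ("paris", 2, 1450)],
    "user-2": [("lisbon", 2, 250), ("paris", 3, 1500)],
    "user-3": [("singapore", 2, 2400), ("mumbai", 3, 1200)],
}

suboptimal = [u for u, c in users.items() if bgp_choice(c) != geographic_choice(c)]
share = len(suboptimal) / len(users)
print(suboptimal, f"{share:.0%}")  # ['user-1', 'user-3'] 67%
```

In every case a closer node exists, and it is among the candidates. BGP just does not rank by the column that would pick it.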

What anycast actually buys

Anycast is one prefix announced from many places, and a borrowed routing decision that hands each user to a nearby announcer. That is the entire idea. Everything else is consequence. The statelessness of UDP is what made it free for DNS, which is why the root system runs nearly two thousand machines behind thirteen names without anyone noticing the seam. The reintroduction of state is what made it hard for the web, which is why the TCP failure modes were feared for years before careful measurement showed them to be rare and modern transports were built to route around them. And the fact that BGP cannot see load or geography is why a real CDN never leaves anycast to work alone, but wraps it in DNS steering and load-aware route withdrawal to compensate for the two things the routing decision is blind to.

The deepest property is the one that is easiest to miss, because it requires no machinery. There is no failover controller, no health-check daemon flipping a virtual IP, no orchestration deciding who is primary. A node that dies stops talking, the routing system notices a path went away, and the basin of traffic that used to drain into it quietly drains somewhere else instead. The same mechanism that load-balances also fails over and absorbs attacks, because all three are the same act: redrawing which announcement wins where. That act is performed by a protocol that was never designed with any of those three goals in mind. Anycast is what you get when you point BGP’s ordinary, geography-blind path selection at a deliberately ambiguous destination and let the ambiguity do the work.

Sources & further reading

Partridge, Mendez & Milliken (1993), RFC 1546: Host Anycasting Service — the original IETF write-up that named and defined anycast.
Abley & Lindqvist (2006), RFC 4786: Operation of Anycast Services (BCP 126) — the operational bible: nodes, catchments, local vs global scope, and the warnings about long-lived flows.
Wikipedia (2025), Anycast — history from 1989, the relevant RFC chain, and the early sub-0.017% mid-connection failure measurement.
Wikipedia (2025), Root name server — the thirteen letters, their operators, and the December 2025 instance count of 1,954.
Cloudflare (n.d.), A Brief Anycast Primer — how Cloudflare announces one prefix from many sites and why taking a data center offline self-heals.
Cloudflare (n.d.), Load Balancing without Load Balancers — the in-POP anycast model: Bird, ECMP five-tuple hashing, and route withdrawal as load shedding.
Cloudflare (n.d.), What is an Anycast Network? — the DDoS surface-area argument and the nearest-healthy-POP framing.
Catchpoint (n.d.), TCP over IP Anycast — Pipe Dream or Reality? — the per-packet load-balancing reset failure and a week of synthetic TCP-over-anycast stability tests.
LinkedIn Engineering (n.d.), TCP over IP Anycast — Pipe Dream or Reality? — the regional-anycast result: suboptimal POP assignment falling from 31% to 10% in the US.
Flavel et al. (2015), FastRoute: A Scalable Load-Aware Anycast Routing Architecture for Modern CDNs — USENIX NSDI paper on automating BGP-based load shedding for an anycast CDN, deployed on Microsoft’s network.
ThousandEyes (n.d.), What is Anycast IP Addressing? — a clear walkthrough of BGP best-path selection applied to anycast prefixes.