
How DNS load balancing and GeoDNS steer traffic

Copyright: MIT

A DNS server has one job that looks trivial from the outside. A client asks for the address of www.example.com, and the server hands back an IP. The trick is that nothing in the protocol says the answer has to be the same every time, or the same for everyone. Change which IP you return, and you have changed where the traffic goes, without touching a single packet of the actual request. That is the whole idea behind DNS load balancing and GeoDNS. The name resolves to whatever you want it to resolve to, and you can vary that answer by who is asking, where they are, which of your servers is alive, and how you have decided to split the load.

It is a tempting place to put steering logic because it sits in front of everything. The browser has to resolve the name before it opens a socket, so the DNS answer is the first decision in the chain. It is also a frustrating place, because DNS was built to be cached aggressively and the resolver between you and the user is a black box you do not control. You can suggest where traffic should go. You cannot force it. This post is about how the suggestion works, how far it reaches, and where it breaks.

The sections below walk through the mechanism in order. First the oldest trick, address rotation and round-robin, and why it distributes load only loosely. Then weighting, which lets you bias the split deliberately. Then the location-aware answers: geolocation, geoproximity, and latency-based routing, and the EDNS Client Subnet extension that makes them work despite the resolver sitting in the wrong place. Then health checks, which turn a static record into something that fails over. And finally the part everyone underestimates: TTL, caching, and client resolver behaviour, the three things that quietly determine whether any of your careful steering actually lands.

Round-robin: the original DNS load balancer

The simplest way to spread load through DNS is to publish more than one address record for a name and let the answer rotate. Ask for www.example.com, get back three A records. Ask again, get the same three in a different order. A naive client takes the first one in the list, so if the order changes per query, connections fan out across the set. No load balancer, no extra hardware, just multiple records and a server that shuffles them.

This is older than most of the people running it. The mechanism was written down in 1995, in RFC 1794, DNS Support for Load Balancing, by Thomas Brisco at Rutgers. The RFC is worth reading because it is honest about what it is doing. Brisco’s implementation did not change the DNS protocol at all. It used what he called “volatile subzones”, autonomous secondary nameservers that reorder the address records on a timer, with an agent rotating the addresses “a random number of times” and rewriting the zone file every five minutes. The protocol never learned about load balancing. People just learned to abuse the ordering of the answer section.

*Figure: round-robin for www.example.com (A 192.0.2.10, 192.0.2.11, 192.0.2.12). Round-robin returns the same A records in a rotated order so that clients defaulting to the first address spread across the set.*
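A toy model makes the rotation concrete. This is a hypothetical server loop, not any real nameserver's code: it returns the same three records on every query, shifts the starting point by one each time, and counts which address a first-entry client would pick.

```python
from collections import Counter
from itertools import islice


def rotated_answers(records):
    """Yield the record set rotated by one position per query (hypothetical server)."""
    offset = 0
    while True:
        # Same records every time; only the starting point moves.
        yield records[offset:] + records[:offset]
        offset = (offset + 1) % len(records)


records = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
# A "naive" client connects to the first address in each answer.
first_picks = Counter(answer[0] for answer in islice(rotated_answers(records), 9))
print(first_picks)  # each address comes first exactly 3 times
```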

The catch lives in that phrase “a naive client takes the first one.” Modern clients are not naive, and that is precisely the problem. Under RFC 6724 and RFC 8305, a client is supposed to do something smarter than grab the head of the list. RFC 6724 defines destination address selection, the rules for ordering candidate addresses. RFC 8305, Happy Eyeballs Version 2: Better Connectivity Using Concurrency, from December 2017, tells the client to race connection attempts across the returned addresses, interleaving IPv4 and IPv6 and starting each new attempt after a short delay. The recommended delay between attempts is 250 milliseconds. The first socket that connects wins; the rest are cancelled.

That sounds great for the user and terrible for your load distribution. If clients race the addresses and keep whichever connects first, your careful rotation no longer decides the outcome on its own. An address that is slow to accept a connection loses to the next one in line, started 250 milliseconds later, and each client stack reorders the list before racing anyway, so latency and local policy sit on top of your rotation. In practice the behaviour is all over the map, and a 2024 set of experiments capturing real browser behaviour found exactly that. Chrome and Firefox pick “somewhat randomly between all locations, but once selected, it sticks with it”, re-evaluating only after hours. Safari consistently picks the closest server and re-selects within seconds when one goes offline. curl finds the nearest server on a retry. There is no single round-robin behaviour to design against, because every client stack handles the same record set differently.
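The staggered race can be sketched as a schedule plus a finish-time comparison. This is a simplified model of RFC 8305 under stated assumptions: IPv6 preferred, families alternating one address at a time, the recommended 250 ms delay, no RFC 6724 sorting within a family, and no early start when an attempt fails outright. The function names and connect times are illustrative.

```python
from itertools import zip_longest

CONNECTION_ATTEMPT_DELAY_MS = 250  # RFC 8305's recommended default


def interleave(ipv6, ipv4):
    """Alternate address families, IPv6 first, one address of each at a time."""
    ordered = []
    for v6, v4 in zip_longest(ipv6, ipv4):
        ordered.extend(addr for addr in (v6, v4) if addr is not None)
    return ordered


def attempt_schedule(ipv6, ipv4, delay_ms=CONNECTION_ATTEMPT_DELAY_MS):
    """Start time (ms) of each attempt, assuming none has succeeded before it starts."""
    return [(i * delay_ms, addr) for i, addr in enumerate(interleave(ipv6, ipv4))]


def winner(schedule, connect_ms):
    """The attempt that finishes first (start time + connect time) wins the race."""
    return min(schedule, key=lambda item: item[0] + connect_ms[item[1]])[1]


schedule = attempt_schedule(["2001:db8::1"], ["192.0.2.10", "192.0.2.11"])
# [(0, '2001:db8::1'), (250, '192.0.2.10'), (500, '192.0.2.11')]
connect_ms = {"2001:db8::1": 400, "192.0.2.10": 20, "192.0.2.11": 20}
print(winner(schedule, connect_ms))  # 192.0.2.10
```

The rotation put the IPv6 address first, but its 400 ms handshake loses to an IPv4 attempt that started 250 ms later and finished at 270 ms.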

So round-robin distributes load, loosely. It is fine when your back ends are interchangeable and you only need approximate spreading. It is a poor primitive the moment you care about the exact split, or about a dead server.

Weighted answers: biasing the split on purpose

If plain rotation gives you roughly even spreading, weighting lets you choose the ratio. The idea is to return records in proportions you set rather than uniformly. Send 90 percent of resolvers to the address of your main cluster and 10 percent to a canary, and you have a blue-green or A/B split decided entirely in DNS. AWS Route 53 calls this a weighted routing policy and describes it as routing “traffic to multiple resources in proportions that you specify.” NS1, Cloudflare, and most managed DNS providers expose the same primitive under their own names.

The weighting is not enforced by handing out a fractional record. It works statistically. The authoritative server holds a weight per record and, for each query, picks which record (or which ordering) to return so that over a large number of queries the distribution approaches the configured ratio. One resolver gets one answer; the proportions only emerge in aggregate. This matters because the unit of distribution is not the user. It is the query, and more precisely the cache entry behind the query, which we will come back to.
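Statistically, a weighted policy can be modelled as one biased draw per query. A minimal sketch, assuming the server returns a single record chosen in proportion to its weight (providers may instead bias the order of a multi-record answer); the addresses and the 90/10 split are illustrative.

```python
import random
from collections import Counter


def weighted_answer(weights, rng):
    """Pick one record per query with probability proportional to its weight."""
    records = list(weights)
    return rng.choices(records, weights=[weights[r] for r in records], k=1)[0]


weights = {"192.0.2.10": 90, "198.51.100.20": 10}  # main cluster vs canary
rng = random.Random(42)  # seeded so the run is repeatable
tally = Counter(weighted_answer(weights, rng) for _ in range(10_000))
share = tally["198.51.100.20"] / 10_000
print(f"canary share: {share:.3f}")  # close to 0.100, but not exact
```

A weight of 0 never wins a draw, which is the drain-by-weight trick described next; answers already cached keep flowing until their TTL runs out.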

Weighting is also how a lot of gradual rollouts and drains happen without any orchestration layer. Want to take a data centre out of service? Drop its weight to zero and let existing cache entries age out. Want to bring a new region online slowly? Start it at weight 5 against everything else at 95 and watch the error rate. The granularity is coarse and the timing is governed by TTL, but the mechanism costs nothing and touches no application code.

There is a sibling primitive worth naming here because people reach for it expecting weighting and get something subtly different. Route 53’s multivalue answer policy returns up to eight records, chosen at random from the set currently passing health checks, and leaves the final choice to the client. It is round-robin with a health check bolted on, not a load balancer. The distinction matters: multivalue gives the resolver a menu and lets client behaviour decide, so all the Happy Eyeballs racing and per-stack quirks from the previous section apply in full. Weighting, by contrast, tries to control the menu itself. Neither one places a request on a server. Both place an answer in a cache and hope the client cooperates. If you need a specific server to get a specific request, DNS is the wrong layer, and no amount of weighting fixes that, because the thing you are weighting is the contents of a cache and not the flow of traffic.
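Multivalue answering, as described above, amounts to a health filter followed by a random sample. A sketch under that reading, with a hypothetical set of ten records of which one is failing its checks; the function is illustrative, not Route 53's code.

```python
import random


def multivalue_answer(records, healthy, rng, limit=8):
    """Return up to `limit` records chosen at random from those passing health checks."""
    candidates = [r for r in records if r in healthy]
    # sample() picks without replacement and in random order; the client decides from there.
    return rng.sample(candidates, k=min(limit, len(candidates)))


records = [f"192.0.2.{n}" for n in range(10, 20)]
healthy = set(records) - {"192.0.2.13"}  # one endpoint failing its checks
answer = multivalue_answer(records, healthy, random.Random(7))
print(len(answer), "192.0.2.13" in answer)  # 8 False
```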

Location-aware answers: geo, geoproximity, latency

The next step up is to vary the answer by where the client is. This is the family people mean when they say GeoDNS. The same name returns different IPs depending on the geographic or network location attributed to the request, so a user in Frankfurt gets your European address and a user in São Paulo gets your Brazilian one. Done well it cuts round-trip latency by routing each user to a nearby endpoint. Vendor numbers for the improvement vary and tend to be self-reported, so treat the specific percentages with suspicion, but the direction is not in dispute: shorter network paths mean lower latency.

There are three distinct flavours here that often get lumped together, and Route 53’s policy names happen to separate them cleanly. Geolocation routing keys off the location attributed to the user and returns a record you have explicitly mapped to a continent, country, or US state. It is editorial. You decide that European users go to the EU record, full stop, regardless of whether that is the lowest-latency choice. This is the one you use for content licensing or data-residency rules, where the requirement is legal rather than performance.
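A geolocation policy is essentially a lookup with a fallback chain. The sketch assumes the most specific mapped region wins (subdivision, then country, then continent) and that a default record catches everyone else; the mapping, keys, and addresses are hypothetical. With no match and no default it returns None, standing in for an empty answer.

```python
from typing import Optional

# Hypothetical editorial mapping from attributed location to record.
GEO_RECORDS = {
    ("subdivision", "US-CA"): "192.0.2.50",
    ("country", "DE"): "192.0.2.10",
    ("continent", "EU"): "192.0.2.11",
    ("continent", "SA"): "192.0.2.30",
    ("default", None): "192.0.2.99",
}


def geolocation_answer(continent: str, country: str,
                       subdivision: Optional[str] = None) -> Optional[str]:
    """Return the record mapped to the most specific matching region."""
    for key in (("subdivision", subdivision), ("country", country),
                ("continent", continent), ("default", None)):
        if key in GEO_RECORDS:
            return GEO_RECORDS[key]
    return None  # no mapping and no default


print(geolocation_answer("EU", "DE"))  # 192.0.2.10 (country beats continent)
print(geolocation_answer("EU", "FR"))  # 192.0.2.11 (falls back to the EU record)
print(geolocation_answer("AS", "JP"))  # 192.0.2.99 (default)
```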

Geoproximity routing instead computes geographic distance between the user and each of your endpoints and picks the closest, with a configurable bias that lets you expand or shrink a region’s pull. Turn the bias up on one region and it attracts users from farther away, which is how you bleed traffic from one site to another gradually. In Route 53, geoproximity has historically been configured through the Traffic Flow feature. Latency-based routing is the third flavour and the most often misunderstood. It does not measure live latency from the user at query time. It returns the endpoint in the AWS Region that historically has the lowest latency to the resolver’s network, based on latency measurements AWS collects over time. Geographic distance and network latency are not the same thing, which is the entire reason latency routing exists as a separate policy. A user can be physically close to a data centre and still reach it over a network path that detours through another city.
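Geoproximity can be sketched as a biased nearest-neighbour search. The great-circle distance below is the standard haversine formula; the bias model, which scales each endpoint's distance by (1 - bias/100), is an assumption modelled on how Route 53 describes its bias setting, and the endpoints are illustrative.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a spherical Earth."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def geoproximity_answer(user, endpoints):
    """Pick the endpoint with the smallest *biased* distance to the user.

    Assumed bias model: biased = distance * (1 - bias / 100). A positive bias
    shrinks an endpoint's apparent distance, so its region pulls in more users.
    """
    def biased(ep):
        return haversine_km(*user, ep["lat"], ep["lon"]) * (1 - ep["bias"] / 100)
    return min(endpoints, key=biased)["name"]


user_lyon = (45.76, 4.84)
endpoints = [
    {"name": "frankfurt", "lat": 50.11, "lon": 8.68, "bias": 0},
    {"name": "madrid", "lat": 40.42, "lon": -3.70, "bias": 0},
]
print(geoproximity_answer(user_lyon, endpoints))  # frankfurt (~560 km vs ~910 km)
endpoints[1]["bias"] = 50  # Madrid's apparent distance halves to ~456 km
print(geoproximity_answer(user_lyon, endpoints))  # madrid
```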

*Figure: three location-aware policies, same query, different logic. Geolocation maps regions to records by hand, geoproximity ranks by distance with a bias knob, and latency routing ranks by historical round-trip time to the resolver's network.*
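Latency routing, by contrast, is a table lookup rather than a measurement. A minimal sketch, assuming a precomputed table of typical round-trip times from client networks to regions; the networks, numbers, and fallback region are hypothetical, and `source_ip` stands for whatever the server can see: the resolver's address, or an ECS subnet when one is sent.

```python
import ipaddress

# Hypothetical latency table: typical RTT in ms from a client network to each region,
# built offline from historical measurements rather than measured per query.
LATENCY_MS = {
    ipaddress.ip_network("203.0.113.0/24"): {"ap-northeast-1": 12, "ap-southeast-1": 70, "us-west-2": 110},
    ipaddress.ip_network("198.51.100.0/24"): {"ap-northeast-1": 95, "ap-southeast-1": 140, "us-west-2": 20},
}


def latency_answer(source_ip, default_region="us-east-1"):
    """Return the region with the lowest recorded latency for the querying network."""
    addr = ipaddress.ip_address(source_ip)
    for network, table in LATENCY_MS.items():
        if addr in network:
            return min(table, key=table.get)
    return default_region  # no measurements for this network: fall back


print(latency_answer("203.0.113.7"))   # ap-northeast-1
print(latency_answer("198.51.100.9"))  # us-west-2
```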

All three share a dependency that is easy to miss. To route by the user’s location, the authoritative server needs to know the user’s location. And the authoritative server does not talk to the user. It talks to the user’s recursive resolver, which can be a long way from the user.

The resolver problem and EDNS Client Subnet

Here is the structural flaw at the heart of GeoDNS. When your browser resolves a name, it does not query the authoritative server directly. It asks a recursive resolver, typically your ISP’s, or a public one like Google’s 8.8.8.8, Cloudflare’s 1.1.1.1, or Quad9’s 9.9.9.9. That resolver does the legwork and caches the result. So the source IP the authoritative server sees is the resolver’s, not yours. If you geolocate that IP, you are locating the resolver.

For an ISP resolver in the same city, this is fine. For a public resolver, it can be wildly wrong. A user in Tokyo using a resolver whose nearest node is in Singapore looks, to your authoritative server, like a user in Singapore. You send them to your Singapore endpoint when Tokyo was right there. Route 53’s own documentation states the fallback plainly: when the resolver does not support the relevant extension, “Route 53 uses the source IP address of the DNS resolver to approximate the location of the user.” That approximation is exactly as good as the assumption that the resolver sits near the user, which for public DNS is often false.

The fix is EDNS Client Subnet, ECS, standardised in May 2016 as RFC 7871, Client Subnet in DNS Queries. The mechanism lets the recursive resolver attach a truncated piece of the client’s IP address to the query it forwards to the authoritative server. The authoritative server geolocates that subnet instead of the resolver’s address. Two fields in the option carry the logic. SOURCE PREFIX-LENGTH is “an unsigned octet representing the leftmost number of significant bits of ADDRESS to be used for the lookup”, which is how many bits of the client address the resolver chose to reveal. SCOPE PREFIX-LENGTH is “an unsigned octet representing the leftmost number of significant bits of ADDRESS that the response covers”, which is how specific the authoritative server’s answer is, and therefore how narrowly the resolver must cache it.

*Figure: a client in Tokyo behind a public resolver's Singapore node, with and without ECS. Without ECS the authoritative server geolocates the resolver; with ECS it geolocates a truncated prefix of the client's own subnet.*

The truncation is deliberate, and it is a privacy compromise baked into the standard. RFC 7871 recommends resolvers truncate IPv4 addresses to 24 bits and IPv6 to 56 bits before forwarding, so the authoritative server learns the client’s network but not the exact host. The RFC is candid that this is a trade. “The network address of the client that initiated the resolution becomes visible to all servers involved in the resolution process”, the authors write, and they recommend the feature be “turned off by default in all nameserver software”, enabled only where it earns its keep. That last instruction tells you the working group knew they were shipping a surveillance vector dressed as a performance feature.
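Putting the two prefix fields and the truncation rule together, the resolver's side of ECS can be sketched as an encoder for the option's wire format. The layout (option code 8, then FAMILY, SOURCE PREFIX-LENGTH, SCOPE PREFIX-LENGTH, and an ADDRESS cut to whole octets) follows RFC 7871; the function name is hypothetical, and it builds only the option, not a full DNS message.

```python
import ipaddress
import math
import struct

ECS_OPTION_CODE = 8              # EDNS(0) option code for Client Subnet
FAMILY = {4: 1, 6: 2}            # address family numbers: 1 = IPv4, 2 = IPv6
DEFAULT_PREFIX = {4: 24, 6: 56}  # RFC 7871's recommended truncation


def encode_ecs(client_ip, source_prefix=None):
    """Return the ECS option (code, length, payload) a resolver would attach to a query."""
    version = ipaddress.ip_address(client_ip).version
    if source_prefix is None:
        source_prefix = DEFAULT_PREFIX[version]
    # strict=False zeroes every host bit past the prefix: this is the truncation.
    net = ipaddress.ip_network(f"{client_ip}/{source_prefix}", strict=False)
    # Only as many whole octets as the prefix needs go on the wire.
    address = net.network_address.packed[: math.ceil(source_prefix / 8)]
    # SCOPE PREFIX-LENGTH is 0 in queries; the authoritative server sets it in its response.
    payload = struct.pack("!HBB", FAMILY[version], source_prefix, 0) + address
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload


print(encode_ecs("203.0.113.77").hex())  # 0008000700011800cb0071
```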

Resolver support is uneven, and that unevenness directly limits how well GeoDNS works. Google Public DNS supports ECS and enforces limits, rejecting IPv4 subnets more specific than /24 and IPv6 more specific than /56, and it auto-detects which authoritative servers handle ECS correctly rather than maintaining a manual allowlist, because “the number of addresses is smaller” than the number of zones. Cloudflare’s 1.1.1.1 is the loud holdout. It does not send client subnet to authoritative servers, a position Matthew Prince framed in May 2019 on privacy grounds, noting that the information “leaks information about a requester’s IP and, in turn, sacrifices the privacy of users” and that nation-state actors have used EDNS subnet data to track individuals. A 2025 survey of public resolver behaviour found the split holds: some resolvers forward ECS to authoritative servers (Quad9’s ECS endpoint, OpenDNS, Google selectively), some echo it back to clients (Google, AdGuard), and a meaningful slice forward nothing at all and return a scope of /0, opting their users out of geographic optimisation entirely. If a chunk of your users sit behind resolvers that strip ECS, your GeoDNS is geolocating a resolver for them no matter how carefully you configured the zone.

The SCOPE PREFIX-LENGTH field is also what stops ECS from blowing up resolver caches. A response scoped to /24 may only be served from cache to other clients in that same /24. A more specific scope means a smaller, more numerous set of cache entries. So ECS trades cache efficiency for geographic precision: the resolver now keeps a separate cached answer per client network instead of one answer for everyone, which is the right outcome for routing and an expensive one for caching.
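The scoping rule can be sketched as a resolver cache keyed by name and scope network rather than by name alone. This toy class is hypothetical, omits TTLs, and assumes the resolver stores each answer under the SCOPE PREFIX-LENGTH the authoritative server returned.

```python
import ipaddress


class EcsCache:
    """Toy resolver cache: an answer may only be reused inside the scope it was returned for."""

    def __init__(self):
        self.entries = {}  # (qname, scope network) -> answer; TTLs omitted for brevity

    def store(self, qname, client_ip, scope_prefix, answer):
        scope = ipaddress.ip_network(f"{client_ip}/{scope_prefix}", strict=False)
        self.entries[(qname, scope)] = answer

    def lookup(self, qname, client_ip):
        addr = ipaddress.ip_address(client_ip)
        covering = [(net.prefixlen, answer) for (name, net), answer in self.entries.items()
                    if name == qname and addr in net]
        if not covering:
            return None  # cache miss: the resolver must ask the authoritative server
        # The most specific covering scope wins.
        return max(covering, key=lambda item: item[0])[1]


cache = EcsCache()
cache.store("www.example.com", "203.0.113.7", 24, "192.0.2.10")   # scoped to /24
cache.store("www.example.com", "198.51.100.9", 24, "192.0.2.30")  # another /24, another entry
print(cache.lookup("www.example.com", "203.0.113.200"))  # 192.0.2.10 (same /24)
print(cache.lookup("www.example.com", "203.0.114.1"))    # None (different /24: new query)
print(len(cache.entries))                                 # 2 entries for one name
```

Each new client /24 is a new entry and a new upstream query; a response scoped to /0 collapses back to one entry shared by everyone, which is the pre-ECS behaviour.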

Health checks: turning a record into a failover

Everything so far assumes your endpoints are up. The moment one is not, plain DNS keeps cheerfully handing out its address, because a zone file has no idea whether the host behind a record is answering. This is the failure mode RFC 1794 could not solve in 1995, and a large part of the reason managed DNS exists. During a server outage with an unchanged round-robin record, a predictable fraction of clients keep getting the dead address first. With three records and one down, roughly a third of answers put the dead address at the head of the list, and every client that takes the first entry tries to connect into a black hole until somebody intervenes.

Health checks close that gap by making the authoritative server’s answer conditional on liveness. The DNS provider runs probes against each endpoint (an HTTP or HTTPS request, a TCP connection, a string match in the response body) on a fixed interval from multiple vantage points. When an endpoint fails its checks, the provider stops returning that record. The steering logic, weighted or geo or latency, runs only over the set of endpoints currently passing their checks. Route 53 prices these per check per month and charges extra for optional features such as HTTPS, string matching, and fast check intervals, which tells you how much machinery sits behind a green light.
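The probing itself is provider-specific, but a common guard against flapping, requiring several consecutive contrary results before an endpoint changes state, is easy to sketch. The class, its default threshold of 3, and the probe results are hypothetical; a real provider would also combine results from several vantage points.

```python
from dataclasses import dataclass


@dataclass
class EndpointHealth:
    """Hypothetical per-endpoint state: flip only after N consecutive contrary probes."""
    threshold: int = 3
    healthy: bool = True
    contrary_streak: int = 0  # consecutive probes that disagree with the current state

    def record(self, probe_ok: bool) -> bool:
        if probe_ok == self.healthy:
            self.contrary_streak = 0  # result agrees with the current state
        else:
            self.contrary_streak += 1
            if self.contrary_streak >= self.threshold:
                self.healthy = probe_ok
                self.contrary_streak = 0
        return self.healthy


checker = EndpointHealth()
probes = [True, False, False, True, False, False, False]
print([checker.record(ok) for ok in probes])
# [True, True, True, True, True, True, False]: isolated failures do not flip it
```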

This is where DNS-based steering and proxy-based steering diverge in an important way. A reverse proxy or anycast load balancer sees every request and can fail a request over in real time, mid-connection. DNS sees only the resolution, which might have happened minutes ago and been cached. So even with health checks, DNS failover is bounded below by how fast the bad answer can leave every cache between you and the user. A health check might detect a dead endpoint in 30 seconds. If that endpoint’s address is sitting in resolver caches with a 300-second TTL, the slowest clients keep hitting it for the better part of those five minutes. The detection is fast. The propagation is not.
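The arithmetic behind “the detection is fast, the propagation is not” fits in one function. A back-of-the-envelope sketch with illustrative numbers (a 10-second check interval and 3 failures to trip, giving the 30-second detection above); it is not any provider's published formula.

```python
def worst_case_exposure_s(check_interval_s, threshold, ttl_s):
    """Rough upper bound on how long new connections can still reach a dead endpoint.

    Detection takes about `threshold` failed checks; an answer cached just before
    the record was withdrawn then survives for a full TTL. Resolvers that stretch
    or ignore the TTL make the real tail longer than this.
    """
    detection_s = check_interval_s * threshold
    return detection_s + ttl_s


print(worst_case_exposure_s(10, 3, 300))  # 330: 30 s to detect, 300 s to drain caches
print(worst_case_exposure_s(10, 3, 30))   # 60
```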

Health checks also feed back into the location-aware policies, not just failover. Cloudflare’s load balancing works the same way: traffic decisions start with which pools and endpoints are healthy, then a steering policy (standard failover, geo, dynamic by measured pool performance, proximity, or least outstanding requests) picks among the survivors, and that whole machine runs at the DNS layer for DNS-only load balancers. The ordering is the point. Liveness filtering happens first, steering second. A geo policy never sends a German user to the dead Frankfurt pool, because the dead pool is gone from the candidate set before the geo logic runs. The same is true in Route 53: a latency record pointed at an unhealthy region drops out, and latency routing then picks the lowest-latency record among the rest. Without that ordering, geo and latency steering would confidently route users to precisely the endpoint that just failed, which is the worst possible answer.
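The filter-then-steer ordering is easy to express: compute the healthy set first and hand only that to the steering policy. A sketch with hypothetical pool names and a deliberately simple geo policy (first surviving pool in a per-region preference list); what to return when nothing is healthy varies by provider, so it returns None here.

```python
def steer(endpoints, healthy, policy):
    """Liveness filtering first, steering second: the policy never sees a dead endpoint."""
    candidates = [ep for ep in endpoints if ep in healthy]
    if not candidates:
        return None  # simplification; providers differ when nothing is healthy
    return policy(candidates)


def geo_preference(order):
    """Hypothetical geo policy: the first surviving pool in a per-region preference list."""
    return lambda candidates: min(candidates, key=order.index)


pools = ["frankfurt", "amsterdam", "virginia"]
german_users = geo_preference(["frankfurt", "amsterdam", "virginia"])
print(steer(pools, {"frankfurt", "amsterdam", "virginia"}, german_users))  # frankfurt
print(steer(pools, {"amsterdam", "virginia"}, german_users))               # amsterdam
```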

The 2024 browser experiments make this concrete and add a twist. When a server was stopped, most clients (the major browsers and curl) noticed and switched to a healthy address on their own, because Happy Eyeballs racing gives them a second target to try. But a front end doing IP-based deterministic assignment kept routing to the dead server and returned errors, never re-racing. So client-side behaviour can either rescue you or sabotage you, independent of your health checks, and you do not control which.

TTL and caching: the governor on everything

Every steering decision you make in DNS is subject to one number you set and then lose control of: the TTL, the time-to-live the authoritative server attaches to each record. It tells every resolver and client along the path how long they may cache the answer before asking again. The TTL is the dial that sets how responsive your steering is, and it trades directly against load and stability.

Set a low TTL and your answers turn over quickly. A 30-second TTL means a failover or a weight change reaches users within roughly half a minute, and your steering is nimble. It also means resolvers re-query you constantly, which raises query volume and cost, and removes the cache that normally shields you from a DNS outage. Set a high TTL and you get the opposite: cheap, resilient, cacheable, and sluggish. A drained endpoint at a 3600-second TTL can keep receiving cached traffic for an hour after you cut its weight to zero. There is no setting that is fast and cheap and forgiving at once. You pick two.

*Figure: the TTL dial, from 30 s to 3600 s. Lower TTL buys responsive steering at the cost of query volume and cache protection; higher TTL inverts the trade.*
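The load side of the dial is roughly inverse in the TTL, which a one-line model shows. It assumes 100,000 caches (an illustrative number) that each keep the name hot and refresh it exactly when the TTL expires, so it overstates load for rarely used names.

```python
def authoritative_qps(active_caches, ttl_s):
    """Steady-state queries per second if each cache refreshes the name once per TTL."""
    return active_caches / ttl_s


for ttl in (30, 300, 3600):
    qps = authoritative_qps(100_000, ttl)
    print(f"TTL {ttl:>4}s: ~{qps:,.1f} qps, stale answers can linger up to {ttl}s")
```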

And then there is the part that ruins the clean theory: the TTL is advice, not law. Plenty of clients and resolvers ignore it. The JVM is the classic offender: older releases, and any JVM running with a security manager, cache successful lookups for the lifetime of the process by default, regardless of TTL, so a long-running Java service can pin itself to one address and never notice your failover. Browsers keep their own DNS cache with their own minimums, and corporate resolvers sometimes clamp very low TTLs upward to protect themselves from query storms. You can set a 30-second TTL and still have a meaningful population of clients holding the old answer for far longer than you asked. There is no acknowledgement on a DNS answer. You hand it out and hope.
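Clamping is simple to model, and it shows why the published TTL is only an input. The limits below are hypothetical; real resolvers expose their own settings, and a client that pins an address for the life of the process behaves like an infinite TTL that no clamp describes.

```python
def effective_ttl(authoritative_ttl, min_ttl=0, max_ttl=86_400):
    """TTL a cache actually applies after clamping the authoritative value to its own limits."""
    return max(min_ttl, min(authoritative_ttl, max_ttl))


print(effective_ttl(30))               # 30: a resolver that honours the TTL
print(effective_ttl(30, min_ttl=300))  # 300: a resolver that clamps low TTLs upward
```

Failover time is then governed by the largest effective TTL among the caches your users sit behind, not by the number you published.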

This is the structural reason DNS load balancing is approximate and always will be. The unit you actually control is not the user and not even the query. It is the cache entry, shared across everyone behind a given resolver for the duration of a TTL that some of them will honour and some will not. Round-robin distributes across cache entries, not users, so a single big resolver can send a whole population to one address for a full TTL. Weighting hits its ratio only in aggregate over many cache entries. Geo and latency steering are only as accurate as the location you can attribute to a resolver, plus whatever ECS adds for the subset of resolvers that send it. Health checks fail over only as fast as the slowest cache will release the dead answer. Every one of these limits traces back to the same root: DNS was designed to be cached, and a thing designed to be cached makes a poor real-time control plane.
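A small simulation shows what “the unit is the cache entry” means in practice. It assumes one large resolver serving a million users alongside a thousand small ones serving a hundred each (all numbers hypothetical), and that each resolver caches a single weighted answer for everyone behind it during one TTL.

```python
import random
from collections import Counter


def user_split(resolver_users, weights, rng):
    """Each resolver draws ONE weighted answer per TTL and hands it to all of its users."""
    records = list(weights)
    split = Counter()
    for users in resolver_users:
        answer = rng.choices(records, weights=[weights[r] for r in records], k=1)[0]
        split[answer] += users
    return split


resolvers = [1_000_000] + [100] * 1_000  # one big public resolver, many small ones
weights = {"main": 90, "canary": 10}
split = user_split(resolvers, weights, random.Random(1))
total = sum(split.values())
print({name: round(users / total, 3) for name, users in split.items()})
# The draws are roughly 90/10, but one draw decides where most users land.
```

Across different seeds the canary's share of users lands either near 1 percent or above 90 percent, depending almost entirely on which answer the big resolver happened to cache.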

What DNS steering is good for, and what it is not

None of this makes DNS steering useless. It makes it a particular tool with a particular shape. It is the right layer for coarse, slow-moving decisions where a 30-to-300-second granularity is fine and the win is enormous: steering a user toward the right continent’s infrastructure before the first TCP packet, draining a region over minutes, splitting traffic across providers, surviving the loss of a whole data centre. For that work it is unbeatable, because it sits ahead of everything and costs almost nothing per query. The endpoint DNS chooses is often an anycast address anyway, so DNS picks the region and anycast picks the node within it, two layers of steering each doing what it does well.

What DNS cannot be is a real-time load balancer. It cannot bleed exactly 50.0 percent of requests to a canary, because it distributes cache entries and not requests. It cannot fail a single request over mid-flight, because it never sees the request. It cannot react in seconds across the whole population, because the slowest cache governs the tail. Teams that try to use it for fine-grained, request-level control end up fighting the cache, and the cache wins. The ones who get good results treat DNS as the first and coarsest stage of steering and put the precise work, the per-request decisions, the health-aware proxying, the instant failover, behind it at a layer that actually sees the traffic.

There is a tidy way to see the whole tension in one fact. The mechanism dates to 1995, and the standard that was supposed to fix its biggest flaw, ECS, arrived in 2016 and shipped turned off by default for privacy reasons, with one of the three largest public resolvers refusing to send it at all. So thirty years in, the most basic question a geo-steering system has to answer, where is this user, still cannot be answered reliably for a large share of the internet’s clients. You steer with the location you can get, at the granularity the cache allows, and you build the precise machinery somewhere the traffic actually flows through your hands.

