Consistent hashing: the algorithm behind every modern load balancer and cache
You run a cache cluster with ten servers. A key hashes to hash(key) mod 10, lands on server 7, and life is good. Then traffic grows and you add an eleventh server. Now the modulus is 11, and hash(key) mod 11 sends almost every key to a different machine than before. The cache that took hours to warm is cold across the whole fleet. Not one server’s worth of keys moved. Nearly all of them did, because changing the divisor reshuffles the entire output space. That single failure mode is the reason consistent hashing exists.
The question consistent hashing answers is narrow and precise. When the set of servers changes by one, how few keys can you get away with moving? The naive modulo answer is “almost all of them.” The right answer turns out to be “about K/n of them,” where K is the number of keys and n is the number of servers, and getting from the first answer to the second is the whole story. This post walks the algorithm from the original 1997 construction, through the ring and virtual nodes that every production system actually ships, into the variants that fixed the parts the ring got wrong: jump hash, Maglev’s lookup table, and the bounded-load scheme that Vimeo pushed into HAProxy. The math is simple. The engineering around it is where the interesting decisions live.
The sections below start with why modulo fails, then build the ring and prove the minimal-remapping property, then add virtual nodes to fix load skew, then turn to the three big production variants and what each one trades away. We close on where consistent hashing sits in a real load balancer in 2026 and what it does not solve.
Why modulo hashing breaks
A hash function maps a key to a number. To pick a server you reduce that number into the range of available servers, and mod n is the obvious reducer. It distributes keys evenly when n is fixed. The problem is entirely about what happens when n changes.
Consider the arithmetic directly. A key whose hash is h goes to server h mod n. Change n to n+1 and the key now goes to h mod (n+1). These two values agree only by coincidence. Vimeo’s engineering writeup put a concrete number on it: with traditional modulo hashing, adding one server to a pool of nine means “only one-tenth of requests will (by luck) hash to the same server as they did before.” The other nine-tenths move. For a cache, a moved key is a cache miss followed by a fetch from the origin and a re-population of the new server. Do that to ninety percent of a working set at once and the origin sees a thundering herd while every cache in the fleet is busy evicting and refilling. For a sharded database the consequence is worse: moving a key means moving the data that lives under it, so a resize becomes a near-total reshuffle of bytes across the network.
The deeper issue is that mod n ties the key-to-server mapping to the exact count of servers. Every key’s destination is a function of n, so any change to n perturbs every key. What you want instead is a mapping where a key’s destination depends on the servers themselves, not on how many of them there happen to be, so that adding or removing one server touches only the keys near that server. That is the property the ring delivers.
The ring
The construction that Karger and his coauthors published at STOC 1997, in a paper titled “Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web,” is the one still drawn on whiteboards today. The paper introduced the term “consistent hashing” and was written by a group at MIT that included Daniel Lewin, who a year later co-founded Akamai with Tom Leighton and built a CDN on top of the idea.
Take the output range of your hash function and treat it as a circle. If the function produces 32-bit integers, the circle runs from 0 to 2³² − 1 and then wraps back to 0. Hash each server onto the circle using its name or IP, so each server becomes a point on the ring. Hash each key onto the same circle with the same function. To find the server responsible for a key, start at the key’s position and walk clockwise until you hit the first server point. That server owns the key. The formal treatment in Stanford’s CS168 notes maps everything onto the unit interval [0, 1) instead of a 32-bit range, which is the same circle scaled down, and the rule is identical: a key is owned by the nearest server in the clockwise direction.
*The ring: servers and keys hash onto the same circle, and each key is owned by the first server clockwise. k1 belongs to S2; k2 belongs to S3.*The implementation is a sorted structure of server positions. In practice it is a sorted array or a balanced tree (or in many real codebases a TreeMap-style structure with a ceilingEntry lookup), and finding the owner of a key is a binary search for the smallest server position greater than or equal to the key’s position, wrapping to the first server if the key sits past the last one. Lookup is O(log n) in the number of server points. The structure changes only when servers join or leave, which is rare relative to lookups.
The minimal-remapping property
Here is why the ring earns its name. Suppose a server fails and you delete its point from the ring. Which keys are affected? Only the keys that the dead server owned, which are the keys sitting in the arc between the previous server (counterclockwise) and the dead one. Those keys now walk a little further clockwise and land on the next server. Every other key on the ring is untouched, because its clockwise walk never passed through the deleted point. Adding a server is the mirror image: the new point splits one existing arc, and only the keys in the slice it carves off move, from the old owner of that arc to the new server.
Quantify it. If servers are spread roughly evenly around the ring, each one owns about 1/n of the circle. Removing one server moves the keys in its arc, which is about 1/n of all keys, to a single neighbor. Adding one server claims about 1/n of the circle from the servers it lands among. So a single membership change relocates on the order of K/n keys, not K. Wikipedia states the property in its general form: when the number of slots changes, “only n/m keys need to be remapped on average,” and the K/n version is the same statement with K keys over n servers. That is the entire payoff. The cost of a topology change scales with one server’s share of the data, not with the whole dataset.
*Modulo reshuffles the whole space on any change in n. The ring moves only one server's share.*This is also the property that makes consistent hashing the natural backbone for a CDN and for cache-aware load balancing. If you want a layer-7 proxy to send the same request to the same backend so the backend’s local cache stays hot, you hash the request key onto a ring of backends. When a backend goes down, only its slice of requests reroutes; everything else keeps hitting its warm cache. That is the connection to broader load balancing algorithms and to how a CDN actually works, where keeping a request pinned to a cache-warm origin is the difference between a hit and an expensive miss. It also pairs directly with HTTP caching headers: the ring decides which cache node sees the request, and the headers decide whether that node can answer without revalidating.
Virtual nodes
The ring as described so far has a flaw that shows up immediately in production. With a handful of servers hashed to random points, the arcs between them are not equal. Random placement of n points on a circle produces arcs whose sizes vary considerably, so one server might own twice the share of another purely by where its hash landed. Amazon’s Dynamo paper, presented at SOSP 2007, named this directly: random position assignment “leads to non-uniform data and load distribution.” A second problem compounds it. When a server fails, its entire arc dumps onto a single clockwise neighbor, so the failure does not spread the load across the survivors, it concentrates it on one of them.
The fix is virtual nodes, and Dynamo is where most engineers first met the term. Instead of hashing each physical server to one point, hash it to many. A server named cache-7 gets points for cache-7#0, cache-7#1, all the way up to cache-7#199, scattering two hundred small arcs around the ring instead of one large one. The physical server owns the union of all its virtual points’ arcs. The law of large numbers does the rest: with enough virtual points per server, every physical server’s total share converges to 1/n with small variance, because it is now the sum of many independent small arcs rather than one arc of random size.
Dynamo lists the practical wins, and they go beyond balance. When a node fails, the load that node carried “is evenly dispersed across the remaining available nodes” rather than dumped on one neighbor, because the failed node’s many virtual points sit next to many different survivors. When a node joins, it accepts a roughly equal slice from many existing nodes at once. And because you control how many virtual points a server gets, you can give a beefier machine more of them, so virtual-node count becomes a capacity knob: a server with twice the RAM gets twice the points and owns twice the keys. Dynamo’s own variant tied each physical node to a fixed number of virtual nodes tuned to its capacity.
The cost is memory and lookup time. A thousand servers at two hundred virtual points each is two hundred thousand points on the ring, which is two hundred thousand entries in your sorted structure and a binary search over that many elements per lookup. That is usually fine. But it is exactly the cost that the next two variants were designed to avoid.
Jump consistent hash
In 2014 two Google engineers, John Lamping and Eric Veach, published “A Fast, Minimal Memory, Consistent Hash Algorithm,” and the result is almost startling in how little it needs. No ring. No stored points. No virtual nodes. The function takes a 64-bit key and a bucket count, and returns a bucket number, using only arithmetic and a tiny loop. The whole thing is about five lines.
The core, from the reference Go port, is this loop:
b, j = -1, 0while j < num_buckets: b = j key = key * 2862933555777941757 + 1 j = (b + 1) * (2^31 / ((key >> 33) + 1))That magic constant, 2862933555777941757, is the multiplier of a linear congruential generator. The trick rests on a probability argument. Imagine you already have the key’s bucket for n buckets and you increase to n+1. The key should move to the new bucket with probability exactly 1/(n+1), and otherwise stay. The loop simulates exactly that: it uses the LCG to generate a deterministic pseudo-random sequence seeded by the key, and at each step decides whether the key would “jump” to a higher bucket as the bucket count grows. Because the jump probability shrinks as 1/(n+1), the expected number of jumps is the harmonic series, which grows like the natural log of the bucket count. So the algorithm runs in O(ln n) time and O(1) memory, and the paper shows it splits keys more evenly than the classic ring while needing no storage at all.
The catch is in the name “bucket.” Jump hash returns an integer in [0, num_buckets), and those buckets must be numbered sequentially with no gaps. You can grow or shrink the count at the high end cleanly. What you cannot do is remove bucket number 3 from the middle of a thousand buckets and keep the rest stable, because the function has no notion of a named server that can disappear, only a count. That makes jump hash an excellent fit for sharding data across a pool you scale up and down as a whole (its sweet spot is storage backends), and a poor fit for a load balancer whose individual machines fail and recover by name. The paper itself flags this: the sequential-numbering requirement makes it “more suitable for data storage applications than for distributed web caching.”
Maglev’s lookup table
Google’s network load balancer, Maglev, presented at NSDI 2016, took a different route to the same goal. Maglev sits in front of services like Search and Gmail and has to pick a backend for every packet at line rate. A binary search over a ring per packet is too slow, and a giant ring costs too much memory. So Maglev precomputes a flat lookup table and indexes it with a hash of the packet. One array read, no search.
The construction builds that table to be both balanced and minimally disruptive. Pick a table size M, a prime number, larger than the backend count (the paper uses sizes like 65537 and 655373). Each backend generates a permutation of the table positions from two hashes of its name: an offset and a skip, with permutation[j] = (offset + j * skip) mod M. This gives every backend its own preference order over all M slots. Then the population step loops over the backends round-robin, and each backend in turn claims the next slot in its preference list that is still empty, until every slot is filled. Because the round-robin gives each backend equal turns, every backend ends up owning very close to M/(number of backends) slots, which is even load by construction.
Removing a backend rebuilds the table, and here the prime size M is doing the real work. When one backend leaves, the slots it owned get redistributed, and the paper-trail analysis of Maglev points out the obvious lower bound: at least 1/M of connections that map to the freed slots get reset. By making M large, each slot covers a tiny fraction of the hash space, so the disruption from any single backend change stays small. Maglev’s design also leans on a separate connection-tracking table so that existing flows keep going to their established backend even when the lookup table shifts underneath them, which is why a table rebuild does not blow away live connections wholesale. The lookup table gives near-perfect balance and O(1) lookups. What it gives up is the ring’s pure minimal-remapping guarantee: a rebuild is “minimally disruptive” rather than provably optimal, a trade Maglev accepts because a small number of resets is tolerable and the connection tracker covers the common case.
Consistent hashing with bounded loads
The ring with virtual nodes balances load only in expectation, and only when keys are themselves uniformly popular. Real traffic is not uniform. One video, one product page, one hot key can draw far more requests than its neighbors, and the server that owns that key drowns while the rest idle. Vimeo hit exactly this with their video cache. Their writeup states the limitation bluntly: plain consistent hashing “only balances loads about as well as choosing a random server for each request,” and when some content is far more popular than the rest, all the requests for that content pile onto the same small set of servers.
The fix came from a 2016 paper, “Consistent Hashing with Bounded Loads,” by Vahab Mirrokni, Mikkel Thorup, and Morteza Zadimoghaddam, posted to arXiv in August of that year. The idea adds a hard ceiling. Pick a balancing factor, call it c, slightly above 1. Compute the average load across all servers, and cap each server’s capacity at the ceiling of c × average. Now assign keys by the usual ring walk, but with one change: when a key’s clockwise walk reaches a server that is already at capacity, skip it and keep walking to the next server with room. No server is ever allowed past c times the average, so a hot key that would have crushed one server instead overflows onto the next, and the next, spreading the heat.
The guarantee the paper proves is tight. No server exceeds (1 + ε) times the average load, where ε relates to the balancing factor, and the number of keys that must move on any single insertion or removal is O(1/ε²), a bound that does not depend on the total system size. You buy load smoothness with a parameter, and the relocation cost stays bounded no matter how big the cluster grows. Google implemented it first in Cloud Pub/Sub and reported a substantial improvement in load uniformity.
The real-world proof came from outside Google. Andrew Rodland at Vimeo found the paper three months after it went up, implemented it in HAProxy, and added an option called hash-balance-factor that lets a request weigh a server’s current load alongside whether that server already caches the wanted chunk. Vimeo recommends values for the factor between 1.25 and 2. The patch shipped in HAProxy 1.7, which became stable in November 2016, so bounded-load consistent hashing has been a one-line configuration in a mainstream open-source load balancer for nearly a decade now. The measured effect at Vimeo: memcached servers that previously spiked to 400 to 500 Mbit/s outbound during peak, around 8 Gbit/s across the fleet, dropped to comfortably under 100 Mbit/s each after the change, because a far larger fraction of requests now hit a local cache instead of overloading a hot subset.
This is the variant most directly relevant to anyone running a reverse proxy or transparent proxy tier in front of a cache. The plain ring keeps a request pinned to its cache-warm backend; the bounded-load ring keeps that pinning while refusing to let any single backend melt under a popularity spike. For a crawler or any client fleet that distributes its own work across a proxy pool, the same shape applies in reverse: hash work onto the pool so the same target keeps hitting the same egress, but cap any one egress so a hot target does not concentrate all its traffic on a single IP.
Rendezvous hashing, the road not taken
Consistent hashing is not the only solution to the minimal-remapping problem, and it was not even the first. Rendezvous hashing, also called highest random weight or HRW, was published by David Thaler and Chinya Ravishankar at the University of Michigan in 1996, a year ahead of the Karger paper. It throws out the ring entirely. To place a key, compute hash(key, server) for every server, and assign the key to the server with the highest score. To move a key when a server leaves, you simply drop that server from the comparison, and the key falls to whichever remaining server had the second-highest score, so only the keys that were on the departed server move. No tokens to store, no virtual points to scatter, and load balances cleanly because each key’s winner is independent.
The textbook objection is speed: naive HRW is O(n) per lookup because you score every server, where the ring is O(log n). For small pools that does not matter, and HRW gives better balance with less machinery, which is why it shows up in places where the backend count is modest. Consistent hashing can in fact be shown to be a special case of HRW with the right two-place hash function. The ring won the popularity contest mostly because Dynamo and the early CDNs standardized on it, but in 2026 plenty of systems reach for rendezvous hashing when the server count is small enough that the linear scan is free and the simpler, more uniform placement is worth having.
Where it sits in 2026
Strip away the variants and the core idea has not changed since 1997. You want a function from keys to servers where removing one server moves only that server’s keys and nothing else. The ring delivers it with a clockwise walk; virtual nodes make the shares even; jump hash drops the storage when servers are just numbered slots; Maglev flattens it into a table for line-rate lookups; bounded load adds a ceiling so popularity spikes spread instead of concentrate. Each variant keeps the minimal-remapping property and trades one other thing, memory or lookup speed or perfect optimality, for a property the plain ring lacked.
What consistent hashing does not give you is worth stating plainly, because the failure modes are where teams get burned. It balances load only as well as keys are uniform, which is why bounded load exists. It says nothing about replication, consistency, or what happens to in-flight data when a key’s owner changes, which is why Dynamo wrapped the ring in quorums and vector clocks and preference lists, and why Maglev needs a separate connection tracker to keep live flows stable. The ring decides where a key goes. Everything hard about a distributed system, keeping the data correct once it gets there, lives in the layers above it.
The clearest sign of how settled the algorithm is: the load balancer in front of whatever you are reading this on almost certainly uses one of these five constructions, the choice between them was made years ago by someone who measured cache hit rates, and the 1997 paper that started it is still the citation. Daniel Lewin, one of its authors, was on American Airlines Flight 11 on September 11, 2001. The CDN he helped build on the ring outlived him and now carries a large share of the web’s traffic, still walking clockwise to the first server it meets.
Sources & further reading
- Karger, Lehman, Leighton, Panigrahy, Levine, Lewin (1997), Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web — the STOC 1997 paper that introduced the ring and coined the term.
- DeCandia et al., Amazon (2007), Dynamo: Amazon’s Highly Available Key-value Store — the SOSP paper that popularized virtual nodes and named the load-skew problems they fix.
- Lamping, Veach, Google (2014), A Fast, Minimal Memory, Consistent Hash Algorithm — the jump consistent hash, five lines, O(ln n) time, no storage.
- Mirrokni, Thorup, Zadimoghaddam (2016), Consistent Hashing with Bounded Loads — the capacity-ceiling variant with the O(1/ε²) movement bound.
- Google Research (2017), Consistent Hashing with Bounded Loads — the blog version, with the Cloud Pub/Sub and Vimeo/HAProxy adoption story.
- Rodland, Vimeo (2016), Improving load balancing with a new consistent-hashing algorithm — the HAProxy
hash-balance-factorimplementation and the ~8x cache bandwidth result. - Eisenbud et al., Google (2016), Maglev: A Fast and Reliable Software Network Load Balancer — the NSDI paper describing the lookup-table consistent hashing scheme.
- The Paper Trail (2020), Network Load Balancing with Maglev — a readable walkthrough of Maglev’s permutation and table-population algorithm.
- Stanford CS168 (2023), Introduction and Consistent Hashing — lecture notes with the unit-interval formulation and the remapping proof.
- Thaler, Ravishankar (1996), Rendezvous hashing — the highest-random-weight alternative that predates the ring.
- Wikipedia, Consistent hashing — the n/m remapping statement, the Akamai/Lewin history, and the list of production systems using it.
Further reading
Designing a fair queue at scale: lessons from high-demand ticket on-sales
Traces the distributed-systems problem behind a virtual waiting room: admission control under a thundering herd, the fairness-versus-throughput tradeoff, clock skew in queue ordering, the signed token design, and the failure modes that leak slots.
·24 min readDesigning a distributed crawler: frontier, dedup, politeness, and backpressure
Traces the architecture of a web-scale crawler from Mercator and the early Googlebot through IRLbot to today: the URL frontier, duplicate elimination, politeness scheduling, and how servers push back.
·21 min readURL frontier design: from Mercator to modern priority-queue crawlers
How the URL frontier orders a crawl: the Mercator front-queue/back-queue split, per-host politeness, freshness versus coverage, and the disk-backed and gRPC designs that run at web scale today.
·22 min read