How virtual waiting rooms work: token buckets, queue position, and fair ordering
You click into a ticket sale at the exact second it opens, and instead of the event page you get a holding screen. A spinner. A number that says you are 14,322nd in line. An estimate that says about nine minutes. The page promises it will move you forward automatically, and tells you not to refresh. So you sit there, watching a counter, doing nothing, while somewhere upstream a system decides when it is your turn.
That holding screen is a virtual waiting room, and the interesting part is not the screen. It is the small amount of state the system is keeping about you, the arithmetic it runs to decide who moves and when, and the cookie it hands you so it can recognise you the next time your browser checks back. None of that is visible. All of it is simple once you see the shape. This post is about the shape.
What follows works from the general concept outward. First, why waiting rooms exist at all and why they are a different tool from a rate limiter or a bot wall. Then the admission model, which almost always reduces to a bucket and a counter, and why people argue about whether it is a token bucket or a leaky bucket when the answer is “both, they are the same thing seen from two sides.” Then the two ordering disciplines, FIFO and random, and the surprising fact that the random one exists mainly because of bots. Then the token or cookie that holds your place, with real field names from the systems that publish them. And finally the split that the whole design turns on: the difference between everyone who is waiting and the few who are active, which is the only number the protected site actually cares about. A Queue-it-specific companion to this piece lives at /blog/queue-it-architecture; here the goal is the model that all of them share.
Why a waiting room is not a rate limiter
A rate limiter answers a question about a single client: has this IP, or this token, or this user, made too many requests in the last window? If yes, it rejects the request, usually with a 429 and a Retry-After header. The client is expected to back off and try again later. There is no line, no position, no promise. The system does not remember that you were turned away, and it owes you nothing on your next attempt. Two clients hitting the same limiter at the same instant are independent events.
A waiting room answers a different question, and it answers it about the aggregate rather than the individual. The question is: how many people are currently allowed to be using the protected site, and are we under that ceiling? If the site can safely serve, say, 5,000 concurrent shoppers through a checkout flow, then the waiting room’s entire job is to make sure no more than roughly 5,000 are inside at once, and to hold everyone else in an orderly line until a slot frees up. The individual request is not rejected. It is parked. The system remembers you, gives you a place, and commits to letting you in eventually in some defined order.
That difference, between rejecting and parking, is the whole reason waiting rooms exist. High-demand on-sales (concert tickets, limited sneakers, a console restock, a government appointment portal the morning a new quota opens) produce a load shape that no amount of horizontal scaling fixes cheaply. Ten or a hundred times normal traffic arrives in the first sixty seconds and then decays. You could provision for the peak, and pay for idle capacity the other 364 days. You could let the origin fall over, and serve nobody. Or you could admit the load you can handle and queue the rest, which is what a waiting room does. DataDome’s own explainer puts the distinction plainly: a waiting room is a traffic-management layer, separate from bot mitigation, that smooths a spike into something the origin can absorb.
The confusion (and the reason this topic lives next to anti-bot writing at all) is that the same on-sales attract automated buyers, and the waiting room sits exactly where you would want a bot control. So waiting rooms grew bot-resistance features, and bot vendors grew waiting-room features, and the two categories blurred. But the core of a waiting room is a capacity-management device. The bot resistance is bolted to the side, and as we will see, the place it bolts on cleanest is the ordering discipline.
The admission model: one bucket, one counter
Strip a waiting room down to its load-bearing wall and you find an admission controller. It holds a number that represents how much room is left inside the protected zone, it lets requests in while that number is positive, and it refills the number at a controlled rate. That is a bucket. Everything else is presentation.
The cleanest published description of this comes from the engineering write-ups around SeatGeek’s waiting room and the various “build your own” guides: a leaky bucket holds the access tokens of users currently inside the protected zone, the bucket’s size is set equal to the protected zone’s capacity, and a new visitor is admitted only if the bucket is not full. When the bucket is full, the visitor is routed to the waiting room instead. As users finish and leave, their tokens drain out and the bucket gains room, which pulls the next people out of the line. The capacity of the bucket is the concurrency ceiling. The drain rate is how fast people leave. The fill is whoever is next in the queue.
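That description is small enough to sketch directly. What follows is a minimal in-memory model, not SeatGeek's implementation: the class name `AdmissionController`, its method names, and the choice to promote the next visitor the moment someone leaves are all assumptions made for illustration.

```python
from collections import deque
from typing import Optional


class AdmissionController:
    """Fixed-capacity bucket of active users plus an overflow line (illustrative)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity   # concurrency ceiling of the protected zone
        self.active = set()        # tokens of users currently inside
        self.waiting = deque()     # everyone parked in the waiting room

    def arrive(self, visitor: str) -> str:
        """Admit if the bucket has room and nobody is already queued, else park."""
        # Checking `not self.waiting` stops a newcomer jumping ahead of the line.
        if len(self.active) < self.capacity and not self.waiting:
            self.active.add(visitor)
            return "admitted"
        self.waiting.append(visitor)
        return "queued"

    def leave(self, visitor: str) -> Optional[str]:
        """A user finishes: their token drains out and the next in line moves in."""
        self.active.discard(visitor)
        if self.waiting and len(self.active) < self.capacity:
            promoted = self.waiting.popleft()
            self.active.add(promoted)
            return promoted
        return None


room = AdmissionController(capacity=2)
results = [room.arrive(v) for v in ("ana", "ben", "cy", "dee")]
promoted = room.leave("ana")   # frees one seat, which pulls "cy" out of the line
```

The capacity is the only tuning knob in this sketch. Drain rate is not a parameter at all: it is whatever rate `leave` happens to be called at.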
*The admission controller in one picture: a line on the left, a fixed-capacity bucket in the middle, a drain on the right. The waiting room only admits when the bucket has room.*
Now the argument about which algorithm this is. People reach for “token bucket” and people reach for “leaky bucket” and both are right, because the two are the same thing viewed from opposite sides. The leaky bucket, in the form Jonathan Turner described in a 1986 IEEE Communications article, is a counter: it fills as work arrives and drains at a fixed rate, and an arrival conforms only if the counter has room. The token bucket is its mirror. A bucket fills with tokens at a steady rate up to some maximum depth, and to do a unit of work you must remove a token; if the tokens are gone, you wait. The Loyola networking text states the equivalence directly, and Wikipedia’s treatment of the leaky bucket spells out that the meter form is “exactly equivalent to (a mirror image of)” the token bucket. Identical parameters, identical decisions. So when a waiting-room vendor says “leaky bucket” and an engineer reading the code sees tokens being handed out, nobody is wrong.
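The mirror-image claim can be checked by running both forms over the same arrivals. This is a sketch under stated assumptions: each arrival costs exactly one unit, both buckets start idle (the meter empty, the token bucket full), and the function names and sample numbers are invented for the demonstration.

```python
def leaky_bucket_meter(arrivals, rate, depth):
    """Counter form: fills by 1 per conforming arrival, drains at `rate` per second."""
    level, last, decisions = 0.0, 0.0, []
    for t in arrivals:
        level = max(0.0, level - rate * (t - last))  # drain since the last arrival
        last = t
        conforms = level + 1 <= depth                # is there room for one more?
        if conforms:
            level += 1
        decisions.append(conforms)
    return decisions


def token_bucket(arrivals, rate, depth):
    """Token form: refills at `rate` per second up to `depth`, each arrival takes 1."""
    tokens, last, decisions = float(depth), 0.0, []
    for t in arrivals:
        tokens = min(depth, tokens + rate * (t - last))  # refill since last arrival
        last = t
        conforms = tokens >= 1                           # is there a token to take?
        if conforms:
            tokens -= 1
        decisions.append(conforms)
    return decisions


# Arrival times in seconds. The values are chosen to be exact in binary floating
# point so the two counters stay perfect mirrors (tokens == depth - level).
arrivals = [0, 0, 0, 0, 1, 2, 2, 6, 6, 6, 6]
meter_decisions = leaky_bucket_meter(arrivals, rate=0.5, depth=3)
token_decisions = token_bucket(arrivals, rate=0.5, depth=3)
```

At every step the token count equals the depth minus the meter level, which is the whole equivalence in one invariant.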
The waiting room actually uses the bucket in a slightly unusual way, worth being precise about. In classic network rate limiting the bucket meters a stream of packets and the question is “is this packet conformant.” In a waiting room the bucket’s occupancy is the set of people currently inside, and the question is “is there room for one more person.” The token, in other words, is not a permission to send one request. It is a lease on a seat, held for the duration of a session and released when the session ends or times out. That is why every real waiting room pairs the bucket with a session lifetime: a seat you never give back is a seat lost to the next person in line, so admitted sessions carry an expiry, and the bucket reclaims their room when the lease runs out.
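A lease with an expiry is easy to model. The sketch below is illustrative only: `SeatLeases`, its method names, and the 300-second session are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
class SeatLeases:
    """Seats held as leases that expire, so an abandoned seat is reclaimed."""

    def __init__(self, capacity: int, session_seconds: float) -> None:
        self.capacity = capacity
        self.session_seconds = session_seconds
        self.expires_at = {}   # visitor -> time at which the lease runs out

    def reclaim(self, now: float) -> int:
        """Drop every lease whose expiry has passed; return how many were freed."""
        expired = [v for v, exp in self.expires_at.items() if exp <= now]
        for visitor in expired:
            del self.expires_at[visitor]
        return len(expired)

    def try_admit(self, visitor: str, now: float) -> bool:
        """Grant a seat if, after reclaiming expired leases, there is room."""
        self.reclaim(now)
        if len(self.expires_at) >= self.capacity:
            return False
        self.expires_at[visitor] = now + self.session_seconds
        return True

    def release(self, visitor: str) -> None:
        """The session ended cleanly: give the seat back straight away."""
        self.expires_at.pop(visitor, None)


seats = SeatLeases(capacity=1, session_seconds=300)
first = seats.try_admit("ana", now=0)         # takes the only seat
blocked = seats.try_admit("ben", now=100)     # seat still leased to ana
after_expiry = seats.try_admit("ben", now=300)  # ana's lease has run out
```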
There is a second admission style worth naming, because the strict bucket has a sharp edge. A hard threshold (admit while under capacity, queue the instant you hit it) produces a binary cliff, and that cliff can oscillate under bursty arrivals. Some designs soften it with a probabilistic admission curve, where the chance of being let straight through falls smoothly as occupancy climbs toward the ceiling rather than snapping from 100 percent to zero. The shape is a sigmoid rather than a step. The effect is gentler behaviour right at the boundary, where a strict bucket would flap between “open” and “queueing” several times a second. Most production waiting rooms run the strict version because it is easier to reason about and explain to an operator watching a dashboard, but the soft variant is a known refinement, not a hypothetical.
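One way to picture the soft variant is a logistic curve over occupancy. This is a sketch, not any vendor's formula: the `steepness` and `midpoint` values are made up, and the hard zero at full capacity is an added assumption so the ceiling is never exceeded.

```python
import math


def admit_probability(occupancy: int, capacity: int,
                      steepness: float = 20.0, midpoint: float = 0.9) -> float:
    """Chance of being let straight through, falling smoothly as the room fills.

    `steepness` and `midpoint` are illustrative tuning knobs, not published values.
    """
    if occupancy >= capacity:
        return 0.0                      # never admit past the ceiling
    load = occupancy / capacity         # 0.0 (empty) up to just under 1.0
    return 1.0 / (1.0 + math.exp(steepness * (load - midpoint)))


curve = [admit_probability(o, 100) for o in range(0, 101, 10)]
```

A strict bucket is the limiting case: push `steepness` high enough and the curve becomes the step it was meant to soften.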
FIFO, and why fairness is harder than it looks
Once you are holding a line, you have to decide who goes first. The obvious answer is first-in, first-out: order people by when they arrived and admit them in that order. FIFO is what a physical queue does, it is what people expect, and it is the default for the unplanned case, the traffic spike that nobody scheduled. Cloudflare’s Waiting Room documents fifo as one of its queueing methods and describes it exactly this way: visitors are ordered by when they entered the waiting room, and the earliest arrivals get priority. Queue-it uses FIFO when its waiting room fires as a safety net in response to an unexpected surge.
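FIFO ordering comes down to sorting by arrival time. A minimal sketch, assuming arrivals may be recorded slightly out of order (which is why it uses a heap rather than a plain list); `FifoLine` and its methods are hypothetical names.

```python
import heapq


class FifoLine:
    """Waiting line ordered by arrival time, earliest first."""

    def __init__(self) -> None:
        self._heap = []
        self._seq = 0   # tie-breaker so equal timestamps keep insertion order

    def join(self, visitor: str, arrived_at: float) -> None:
        heapq.heappush(self._heap, (arrived_at, self._seq, visitor))
        self._seq += 1

    def admit(self, slots: int) -> list:
        """Hand the next `slots` free seats to the earliest arrivals."""
        count = min(slots, len(self._heap))
        return [heapq.heappop(self._heap)[2] for _ in range(count)]


line = FifoLine()
line.join("cy", arrived_at=3.0)    # recorded first, but arrived last
line.join("ana", arrived_at=1.0)
line.join("ben", arrived_at=2.0)
first_wave = line.admit(2)
second_wave = line.admit(5)        # only one visitor is left
```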
FIFO has one property that is a feature in calm conditions and a liability during a hyped on-sale. It rewards arriving early. For a traffic spike that is fine, even desirable: the people who were already on the site when the surge hit should not lose their place to latecomers. But for a scheduled drop, “rewards arriving early” turns into an arms race measured in milliseconds. The prize goes to whoever can fire a request closest to the on-sale instant, and the entity best equipped to fire a request at a precise millisecond is not a human with a mouse. It is a script with a synchronised clock and a warm connection pool. Pure FIFO on a scheduled high-demand sale is close to handing the front of the line to the fastest bot.
There is also a subtler FIFO problem that has nothing to do with bots: accurate wait estimates can be self-defeating. Cloudflare’s engineers noted that a strictly ordered queue produces a wait time that grows linearly the later you arrived, and showing someone an honest “your wait is 47 minutes” is an efficient way to make them abandon. The honest number drives away exactly the patient, legitimate visitors the queue is meant to protect. That observation pushed them toward the second discipline.
Random ordering, and the raffle
The alternative to ordering by arrival time is to not order by arrival time at all. In a random queue, when a slot opens, the system picks who gets it from among everyone currently waiting, regardless of when they showed up. Cloudflare added this in October 2021 and explained it with a raffle analogy: tickets go into a container, and when a prize is available a ticket is drawn at random. The longer you have been in the raffle, the more draws you have been part of, so earlier arrival still improves your odds, but it does not guarantee you anything and it does not put you ahead of someone who arrived a minute later. Everyone in the room has a live chance on every draw. Queue-it’s scheduled-sale mode does the same thing by a different route: everyone who arrives during the pre-queue, before the countdown hits zero, is assigned a random position the moment the sale starts, and only then does the line settle into FIFO order for the rest of the event.
*FIFO admits the head of the line; random draws from the whole waiting pool. Random is the discipline that blunts the millisecond-precise bot.*
The reason random ordering exists is bot resistance, even though it is presented as a fairness feature. If position is decided by a draw rather than by arrival time, then arriving a few milliseconds before everyone else buys you nothing. The script that fires at the on-sale instant with perfect timing lands in the same lottery as the human who loaded the page two seconds late. Cloudflare’s own description says random queueing prevents fast clients from gaining an unfair advantage, and Queue-it makes the same claim about its randomised pre-queue: it stops speedy bots from jumping the line. The fairness story and the anti-bot story are the same mechanism. Take away the value of being first and you take away the thing a bot is uniquely good at.
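Both random styles can be sketched in a few lines, along with a quick simulation of what a head start is worth. Everything here is illustrative: the function names, the ten-visitor pool, and the seeded generator are assumptions, and a real system would not use Python's `random` module for a security-relevant draw.

```python
import random


def draw_random(pool, slots, rng):
    """Raffle style: pick `slots` visitors uniformly from everyone waiting."""
    return rng.sample(pool, min(slots, len(pool)))


def prequeue_positions(prequeue, rng):
    """Pre-queue style: shuffle everyone who arrived early into one random order."""
    order = list(prequeue)
    rng.shuffle(order)
    return order


rng = random.Random(7)                               # seeded for repeatability
pool = ["bot"] + [f"human{i}" for i in range(9)]     # the bot arrived first

trials = 10_000
bot_first = sum(prequeue_positions(pool, rng)[0] == "bot" for _ in range(trials))
bot_share = bot_first / trials    # close to 1 in 10, despite arriving first

winners = draw_random(pool, slots=3, rng=rng)
```

Under FIFO the same bot would take the front position in every trial. Under the shuffle it gets it about as often as any one of the ten entrants would.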
It is worth being honest about the limit of this. Random ordering removes the advantage of timing. It does nothing about the advantage of volume. A bot operator who enters the raffle a thousand times from a thousand IP addresses has a thousand tickets against your one, and the lottery happily gives them a thousand chances. That is why random ordering is paired with the rest of the anti-automation stack, the proxy detection and device checks and challenge layers covered across the rest of this blog. Ordering discipline solves the timing race. It is not a substitute for figuring out that those thousand entries are one actor. The vendors that run waiting rooms in front of the highest-demand sales (the dynamics behind designing a fair queue at scale) treat ordering and identity as two separate problems that have to be solved together.
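The volume advantage is plain arithmetic. A sketch assuming a uniform draw without replacement, with hypothetical numbers (14,000 humans, 1,000 bot entries, 5,000 seats): by linearity of expectation, each entry expects `slots / pool_size` of a seat, so seats scale with the number of entries held.

```python
from fractions import Fraction


def expected_seats(entries: int, pool_size: int, slots: int) -> Fraction:
    """Expected seats won by one actor holding `entries` tickets in a uniform draw."""
    return Fraction(slots * entries, pool_size)


pool_size = 14_000 + 1_000     # hypothetical: humans plus one operator's entries
slots = 5_000

one_human = expected_seats(1, pool_size, slots)         # one ticket
bot_operator = expected_seats(1_000, pool_size, slots)  # a thousand tickets
advantage = bot_operator / one_human                    # how many times better off
```

The draw is perfectly fair per ticket and a thousand times better for the operator, which is why it has to be paired with identity checks.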
There are two more queueing methods worth naming for completeness, both from Cloudflare’s vocabulary. passthrough sends everyone straight through without queueing, which suits an operator who wants the plumbing in place without enforcing a line. reject does the opposite, refusing all traffic with a static page, useful for a hard maintenance window or an endpoint that should only be live during an event. Most of the time a room configured for fifo or random is just as invisible as one in passthrough: it lets everyone straight in and starts queueing only when load crosses the line the operator set.
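The four methods reduce to one routing decision. In this sketch the method names are Cloudflare's, but the `route` function and its return labels are hypothetical and stand in for whatever the edge actually does.

```python
from enum import Enum


class QueueingMethod(Enum):
    FIFO = "fifo"
    RANDOM = "random"
    PASSTHROUGH = "passthrough"
    REJECT = "reject"


def route(method: QueueingMethod, active: int, capacity: int) -> str:
    """Decide where a new request goes. Return labels are illustrative."""
    if method is QueueingMethod.REJECT:
        return "static_page"        # refuse everyone
    if method is QueueingMethod.PASSTHROUGH or active < capacity:
        return "origin"             # no line is being enforced right now
    return "waiting_room"           # fifo or random, and the room is full


decisions = [
    route(QueueingMethod.REJECT, active=0, capacity=10),
    route(QueueingMethod.PASSTHROUGH, active=999, capacity=10),
    route(QueueingMethod.FIFO, active=3, capacity=10),
    route(QueueingMethod.RANDOM, active=10, capacity=10),
]
```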
The cookie that holds your place
From the browser’s point of view a waiting room holds no persistent connection. Your browser is not held open on a socket for nine minutes. It checks in, gets told “not yet,” waits a bit, and checks in again. For that to work, the system has to recognise you across check-ins, which means it has to hand you something to send back. That something is a cookie or a token, and its job is to carry your identity and your place in line so that each poll picks up where the last one left off.
Cloudflare’s implementation is the most openly documented, so it is the best one to be concrete about. On your first request into a covered host and path, you receive a cookie named __cfwaitingroom. The cookie is encrypted so you cannot read or forge its contents, and the documented fields inside include a bucketId (a timestamp rounded to the minute you arrived, which is how FIFO knows your arrival order), an acceptedAt value (set when you are admitted to the application), a refreshIntervalSeconds telling your browser how long to wait before the next check-in, and a lastCheckInTime. While you are still waiting, the cookie’s expiry is held at five minutes but refreshed every twenty seconds, so as long as your tab is open and polling, your place is kept warm; close the tab and the cookie lapses and your spot is gone. Once you are admitted, the cookie’s lifetime switches over to the configured session_duration, which is the lease on your seat inside.
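The renewal behaviour can be modelled without knowing anything about the ciphertext. This sketch mirrors the documented field names in snake_case, but the class, the helpers, rounding the arrival time down to the minute, and measuring the admitted lifetime from the latest check-in are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

WAITING_TTL_SECONDS = 5 * 60   # expiry held at five minutes while waiting


@dataclass
class PlaceCookie:
    bucket_id: int                        # arrival time, rounded to the minute
    last_check_in: float
    accepted_at: Optional[float] = None   # set once admitted


def new_cookie(arrived_at: float) -> PlaceCookie:
    # Assumption: "rounded to the minute" is modelled as rounding down.
    return PlaceCookie(bucket_id=int(arrived_at // 60) * 60,
                       last_check_in=arrived_at)


def expires_at(cookie: PlaceCookie, session_duration: float) -> float:
    """Waiting cookies live five minutes; admitted ones live a session."""
    admitted = cookie.accepted_at is not None
    ttl = session_duration if admitted else WAITING_TTL_SECONDS
    return cookie.last_check_in + ttl


def check_in(cookie: PlaceCookie, now: float, session_duration: float) -> bool:
    """Renew the cookie if it is still alive; False means the place was lost."""
    if now >= expires_at(cookie, session_duration):
        return False
    cookie.last_check_in = now
    return True


cookie = new_cookie(arrived_at=125.0)
kept = check_in(cookie, now=145.0, session_duration=600)    # polled 20 s later
lapsed = check_in(cookie, now=445.0, session_duration=600)  # tab closed for 5 min
```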
Two things make this cookie trustworthy as a place-holder. It is encrypted, so you cannot edit your bucketId to claim you arrived earlier than you did, and you cannot mint one yourself because you do not hold the key. Cloudflare’s implementation is a JWT-style signed-and-encrypted token verified at the edge, so a forged cookie fails verification at the nearest data centre before it ever reaches your origin. Other vendors carry the same information in their own formats. Queue-it issues a signed token, validated with an HMAC-SHA256 signature against the customer’s secret key, that the connector checks on the request path. The shape of the claim is the same everywhere even when the cryptography differs: who you are, when you arrived, whether you have been admitted, and a signature that makes all of that unforgeable.
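The signing half of that claim is standard-library territory. The sketch below shows a generic HMAC-SHA256 token and is emphatically not Queue-it's or Cloudflare's wire format: the `payload.signature` layout, the claim names, and the secret are invented for illustration, and it signs without encrypting, so the claims are readable though not forgeable.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def sign(claims: dict, secret: bytes) -> str:
    """Serialise the claims and append an HMAC-SHA256 signature (toy format)."""
    raw = json.dumps(claims, sort_keys=True).encode()
    payload = base64.urlsafe_b64encode(raw).decode()
    signature = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"


def verify(token: str, secret: bytes, now: int) -> Optional[dict]:
    """Return the claims if the signature is valid and unexpired, else None."""
    try:
        payload, signature = token.rsplit(".", 1)
    except ValueError:
        return None                                   # not even the right shape
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison, so timing does not leak the correct signature.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if claims["expires_at"] <= now:
        return None
    return claims


secret = b"hypothetical-customer-secret"
claims = {"visitor": "v-123", "arrived_at": 1_700_000_000,
          "admitted": True, "expires_at": 1_700_000_600}
token = sign(claims, secret)

valid = verify(token, secret, now=1_700_000_100)
expired = verify(token, secret, now=1_700_000_600)
forged = verify(sign(claims, b"attacker-guess"), secret, now=1_700_000_100)
```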
A note on certainty, because it matters in a reference. The field names above are the ones the vendors publish. The exact serialized layout, the encryption mode, the order of the bytes inside the encrypted blob, none of that is documented, and you should not trust any blog (including reverse-engineering write-ups) that claims to know the internal structure with precision. What is documented is the contract: the cookie name, the named fields, the expiry behaviour, the renewal interval. That contract is enough to understand how the place-holding works without guessing at the ciphertext. Where a detail is inferred from observed traffic rather than stated by the vendor, the honest move is to say so.
Polling, and the estimate on the screen
While you wait, your browser is not idle in any interesting sense. It is polling. On Cloudflare’s room the waiting page refreshes itself roughly every twenty seconds, carrying the cookie back so the system can re-evaluate your position and hand you an updated screen. AWS’s reference implementation does the same with an explicit set of endpoints: a client calls assign_queue_num once to get a request ID, then polls queue_num for its position and serving_num for the current serving counter, and when its position is at or below the serving counter it calls generate_token to get its JWT. The poll interval is the heartbeat of the whole system, and it is deliberately jittered. Cloudflare varies the refreshIntervalSeconds by a pseudo-random offset per check-in, specifically so that thousands of clients that all entered in the same minute do not all wake up and hammer the system on the same synchronised tick.
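The poll loop and its jitter are easy to simulate without a network. This is a sketch with assumed numbers: the 20-second base comes from the text, but the plus-or-minus five seconds of jitter, the steadily advancing serving counter, and every function name are illustrative, and no real endpoint is called.

```python
import random


def next_poll_delay(base_seconds: float, max_jitter_seconds: float, rng) -> float:
    """Base interval plus a random offset, so clients do not poll in lockstep."""
    return base_seconds + rng.uniform(-max_jitter_seconds, max_jitter_seconds)


def is_my_turn(queue_num: int, serving_num: int) -> bool:
    """Admit once our position is at or below the serving counter."""
    return queue_num <= serving_num


def simulate_wait(queue_num, serving_per_second, rng, base=20.0, jitter=5.0):
    """Poll a simulated serving counter until it reaches us.

    Returns (number_of_polls, seconds_elapsed). Time is simulated, not slept.
    """
    now, polls = 0.0, 0
    while True:
        polls += 1
        serving_num = int(now * serving_per_second)   # stand-in for serving_num
        if is_my_turn(queue_num, serving_num):
            return polls, now
        now += next_poll_delay(base, jitter, rng)


rng = random.Random(1)
polls, elapsed = simulate_wait(queue_num=1000, serving_per_second=25.0, rng=rng)
delays = [next_poll_delay(20.0, 5.0, rng) for _ in range(5)]
```

The visitor becomes eligible at the 40-second mark but only finds out on the next poll, which is the price of polling rather than holding a connection open.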
The number on your screen, the estimated wait, comes out of arithmetic simple enough to write on a napkin. Cloudflare documents it as visitorsAhead ÷ activeUsersToWebApplication: the count of people ahead of you in line, divided by the rate at which people are currently leaving the application and freeing slots. If 14,000 people are ahead and the site is draining 1,500 a minute, that is a touch over nine minutes. The estimate moves as both numbers move, which is why it jumps around. The honesty of that number is exactly the FIFO problem from earlier: a truthful linear estimate during a long wait is an efficient way to lose the visitor, which is part of why random ordering, where a precise position does not even exist, became attractive for hyped events. You cannot show someone their exact place in a raffle, and that turns out to be a feature.
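The napkin arithmetic, written out. The function and parameter names are illustrative rather than Cloudflare's; the guard for a stalled queue is an added assumption.

```python
def estimated_wait_minutes(visitors_ahead: int, admitted_per_minute: float) -> float:
    """People ahead of you divided by the rate at which slots are freeing up."""
    if admitted_per_minute <= 0:
        return float("inf")            # nothing is draining, so no estimate
    return visitors_ahead / admitted_per_minute


example = estimated_wait_minutes(14_000, 1_500)   # the figures from the text

# Under strict FIFO the estimate grows linearly with your position in line.
by_position = [estimated_wait_minutes(p, 1_500) for p in (1_500, 3_000, 70_500)]
```

The last position in that list is the 47-minute wait from earlier: honest, linear, and a good way to lose the visitor.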
The split the whole thing turns on: inbound versus active
The single number a protected site cares about is how many people are active inside it right now. Not how many are waiting. Not how many arrived in the last minute. How many are currently consuming origin capacity. Everything the waiting room does is in service of keeping that one number under a ceiling. So the cleanest way to understand any of these systems is to see them as maintaining a hard boundary between two populations: the inbound, everyone who has arrived and is waiting, which can be enormous and bursty and full of bots; and the active, the bounded set that has been admitted and holds a valid seat, which is small, steady, and the only group the origin ever sees.
*Inbound is unbounded and the origin never touches it; active is capped at what the origin can serve. The gate, and the rate it runs at, is the entire product.*
This split is visible in every published architecture once you look for it. AWS’s reference implementation separates a serving_num (a counter the operator advances) from each visitor’s queue_num (their fixed position), and you become active, eligible to call generate_token for a JWT, only when your queue number falls at or below the serving number. The operator pushes the serving counter forward through a private increment_serving_counter call, and the inlet strategies AWS ships (a fixed-concurrency MaxSize, a clock-driven Periodic, or a Custom controller) are just different policies for how fast to advance it. The active set is whatever the serving counter has reached; the inbound set is everyone with a queue number above it. SeatGeek’s design draws the same line with different words, a “Bouncer” out front that decides whether a request goes to the protected zone or the waiting room, and a token count pinned to available capacity. The token is a JWT in AWS’s case, signed with RSA and verifiable against a published public_key, which means the origin can confirm a visitor is genuinely in the active set without calling back to the waiting room on every request.
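The counter pair is the whole boundary, and it fits in a small class. The method names below echo the AWS endpoint names from the text, but this is an in-memory model, not the AWS API, and `max_size_increment` is a hypothetical stand-in for a MaxSize-style policy.

```python
class ServingCounter:
    """Two counters: positions handed out, and positions allowed inside."""

    def __init__(self) -> None:
        self.serving_num = 0       # advanced by the operator or an inlet policy
        self.next_queue_num = 0    # last position handed to a visitor

    def assign_queue_num(self) -> int:
        """Give a new arrival the next fixed position in line."""
        self.next_queue_num += 1
        return self.next_queue_num

    def increment_serving_counter(self, by: int) -> int:
        """Open the gate to `by` more positions."""
        self.serving_num += by
        return self.serving_num

    def is_active(self, queue_num: int) -> bool:
        """Active means the serving counter has reached your position."""
        return queue_num <= self.serving_num

    def inbound_count(self) -> int:
        """Everyone holding a position the serving counter has not reached."""
        return max(0, self.next_queue_num - self.serving_num)


def max_size_increment(max_active: int, currently_active: int) -> int:
    """Fixed-concurrency policy: advance only by the number of free seats."""
    return max(0, max_active - currently_active)


counter = ServingCounter()
positions = [counter.assign_queue_num() for _ in range(5)]
counter.increment_serving_counter(max_size_increment(max_active=2, currently_active=0))
active = [p for p in positions if counter.is_active(p)]
inbound = counter.inbound_count()
```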
Hold that boundary in mind and the bot story falls into place too. The inbound pool is where the bots are, in volume, because the inbound pool is cheap to flood. The active set is where they want to be, because that is where the tickets are. So the entire defensive value of a waiting room is the integrity of the gate between the two, which is why the token is signed, why the cookie is encrypted, why the random draw exists to neutralise timing, and why everything leaks the moment a token can be reused or a seat can be claimed without passing the gate (the failure modes catalogued in why waiting rooms leak). The architecture is not subtle. Almost all of its security rests on one boundary holding.
What the model tells you
Every virtual waiting room you will meet is the same five-part machine wearing different paint. A bucket that caps the active set at origin capacity. A line that holds the overflow. An ordering rule, FIFO when you want to reward early arrival and random when you need to neutralise a bot’s clock. A signed, expiring token that says you hold a seat. And a gate that moves people from the line into the seats at whatever rate the operator dials in. Cloudflare runs it at the edge as encrypted cookies and Workers state, AWS runs it as Lambda and DynamoDB and RSA-signed JWTs, Queue-it runs it as a connector and an HMAC token, SeatGeek built its own with a Bouncer and a counter, and underneath the model does not change. Once you can name the five parts, every product page reads as a labelling exercise.
The part that ages well, and the part worth carrying away, is that the fairness feature and the bot-defence feature are frequently the same line of code. Random ordering exists because a queue ordered by arrival time hands the front to whoever has the most precise clock, and the most precise clock belongs to a script. Take the value out of being first and you have simultaneously made the queue fairer for humans and worthless to time-racing automation. That equivalence is rare in security, where defences usually cost the defender something the user feels. Here the thing that protects the origin (decoupling admission from arrival speed) is also the thing that makes a stranger’s wait feel fair. The systems that get high-demand on-sales right are mostly the ones that understood the inbound-versus-active boundary is the only line that matters, and spent their engineering on keeping it from leaking rather than on the spinner you stare at while you wait.
Sources & further reading
- Cloudflare (2021), Waiting Room: Random Queueing and Custom Web/Mobile Apps — the October 2021 announcement of random queueing, with the raffle analogy and the bucketId mechanism.
- Cloudflare (2024), Cookies — Cloudflare Waiting Room docs — the __cfwaitingroom cookie, its documented fields, and the 5-minute / 20-second renewal behaviour.
- Cloudflare (2024), Queueing method — Waiting Room reference — the fifo, random, passthrough, and reject queueing methods and what each one does.
- Amazon Web Services (2022), Introducing AWS Virtual Waiting Room — the February 2022 launch post for the serverless reference architecture (since deprecated as a maintained solution).
- AWS Solutions (2024), Virtual Waiting Room on AWS — Developer Guide — the public API endpoints (assign_queue_num, queue_num, serving_num, generate_token), JWT claims, and serving-counter mechanics.
- SeatGeek / AWS Architecture Blog (2021), Build a Virtual Waiting Room with Amazon DynamoDB and AWS Lambda at SeatGeek — the Bouncer/Token Service design, FIFO-or-VIP ordering, and tokens pinned to ticket count.
- Queue-it (2024), How Does Queue-it Work? — the redirect flow, the pre-queue countdown, and FIFO-as-safety-net versus randomised scheduled sales.
- DataDome (2024), What is a Virtual Waiting Room? — a vendor-neutral explainer that separates traffic management from bot mitigation.
- J. Turner (1986), New directions in communications (or which way to the information age?), IEEE Communications Magazine — the original description of the leaky-bucket counter, summarised in Leaky bucket — Wikipedia.
- P. Dordal (2020), Token Bucket Rate Limiting — An Introduction to Computer Networks — the formal token-bucket definition and its mirror-image equivalence to the leaky bucket.
- Macrometa (2023), Improving User Experience and Reliability with Virtual Waiting Rooms — a vendor view of threshold activation and flow-control dequeuing.
Frequently asked questions
What is the difference between a virtual waiting room and a rate limiter?
A rate limiter judges a single client and rejects requests that exceed a window, usually returning a 429 with a Retry-After header, and it keeps no memory of you between attempts. A waiting room instead controls the aggregate, holding the number of active users under a ceiling the origin can serve. Rather than rejecting a request, it parks you, remembers your place, and commits to admitting you eventually in a defined order.
Why do waiting rooms use random ordering instead of strict first-in-first-out?
FIFO rewards arriving early, and on a scheduled drop that becomes a millisecond race that scripts with synchronised clocks win over humans. Random ordering decides each open slot by a draw from everyone currently waiting, so arriving a few milliseconds early buys nothing. Longer time in the pool improves your odds but guarantees nothing. The fairness story and the anti-bot story are the same mechanism, since taking away the value of being first removes what a bot is uniquely good at.
How does the Cloudflare waiting room cookie keep your place in line?
On your first request you receive an encrypted cookie named __cfwaitingroom carrying documented fields such as bucketId (a timestamp rounded to your arrival minute, used for ordering), acceptedAt, refreshIntervalSeconds, and lastCheckInTime. While waiting, its expiry stays at five minutes but is refreshed every twenty seconds as your tab polls, so your spot is kept warm; close the tab and it lapses. Once admitted, the lifetime switches to the configured session_duration. Encryption means you cannot forge or edit it.
Is a waiting room's admission model a token bucket or a leaky bucket?
Both, because they are the same mechanism seen from opposite sides. A leaky bucket is a counter that fills as work arrives and drains at a fixed rate, admitting an arrival only if there is room. A token bucket fills with tokens up to a maximum depth, and you must remove one to do work or else wait. They have identical parameters and decisions. In a waiting room the token is a lease on a seat held for a session, not permission to send one request.
How is the estimated wait time on a waiting room screen calculated?
Cloudflare documents it as the number of visitors ahead of you divided by the rate at which active users are leaving the application and freeing slots. If 14,000 people are ahead and the site drains 1,500 a minute, that is a little over nine minutes. The estimate jumps around because both numbers move. An honest linear estimate during a long wait tends to make patient visitors abandon, which is part of why random ordering, where an exact position does not exist, became attractive for hyped events.