
Why waiting rooms leak: race conditions and token reuse in queue systems

Copyright: MIT

A virtual waiting room makes one promise. When demand for a page exceeds what the origin can serve, the room holds everyone in a fair line, lets them through at a controlled rate, and hands each cleared visitor a signed proof that says: this person waited, let them in. The proof is the whole game. If the proof can be copied, replayed, minted in bulk, or won twice off a single race, the line stops being a line. It becomes a suggestion that honest people follow and motivated people walk past.

That failure is what this post sits with. The question is not how to walk past a waiting room (the field guides for that already exist, and a working bypass is not what a defensive reference should publish), but why the leak exists in the first place. The proof is a token. Tokens have a structure, a lifetime, and a validation step, and every one of those three has a way to go wrong. The interesting failures are not exotic. They are the same ones authentication systems have shipped for twenty years: a check that happens before a use, a credential bound to nothing, a counter that updates a beat too late.

What follows walks the token lifecycle from issuance to validation and stops at each place it can spill. First, what the token actually is across the major systems, with their real field names. Then token reuse and replay: the same proof spent twice. Then the race conditions at admission, where concurrency beats the single-use check. Then the multi-tab and multi-session arithmetic that quietly multiplies one fair slot into many. Then how scalpers chain these together, and what the systems do to close each gap. The through-line is that a queue is only as fair as its weakest binding, and most of the weak bindings are economic decisions, not bugs.

What the token actually is

Every waiting room issues a proof, and across the three systems covered here, the proof is some flavour of signed blob. The names and the crypto differ. The shape does not.

Cloudflare’s Waiting Room sets a cookie named __cfwaitingroom. Per Cloudflare’s own docs, that cookie is encrypted to prevent modification by users, and it carries a unique group ID corresponding to the minute the visitor entered the room plus an acceptedAt value corresponding to the minute they entered the application. While a visitor sits in the room the cookie expires after five minutes but renews every twenty seconds automatically, as long as the tab stays open. Once admitted, the cookie’s lifetime is set by the room’s session_duration. The exact internal layout beyond those documented fields is not public; Cloudflare describes the cookie as encrypted and does not publish its claim set, so anything finer than “group ID plus acceptedAt timestamp, encrypted at the edge” is inference, not documentation.

Queue-it’s token is the most openly specified of the group, because the QueueToken.V1 reference implementation is on GitHub. The token is three Base64Url-encoded parts joined by dots, in the shape [header].[payload].[hash]. The header declares typ as QT1 and enc as AES256, and carries iss and exp timestamps, a unique token identifier ti (a UUID), a customer ID c, and optional e (event ID), ip, and xff fields. The payload, AES-256 encrypted with the customer’s secret key, holds r (a relative quality score), k (a unique key for the integrating system), and cd (custom key-value data). The hash part is a SHA-256 digest over the header and payload, keyed by the customer’s secret. On the redirect back from the queue, a separate queueittoken query-string parameter rides the URL with its own short fields: q for the queue identifier, ts for a validity timestamp, and h for the hash. The connector recomputes h with HMAC-SHA256 over the customer’s 72-character secret key; if ts has passed or h does not match, the visitor goes back to the queue.

Figure: the Queue-it token, [header].[payload].[hash]. Header (clear): typ=QT1, enc=AES256, iss, exp, ti, c, e, ip, xff. Payload (AES-256): r (quality score), k (integration key), cd (custom data). Hash (SHA-256): signs header + payload, keyed by the customer secret. Tamper with the header or payload and the hash no longer verifies. The leak is never forgery.

*The token's integrity is solid: the signature is keyed by a secret the client never sees. Every interesting failure mode lives somewhere other than the crypto.*
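The redirect check described above (reject if ts has passed, reject if h does not match) can be sketched in a few lines. This is a minimal illustration, not the KnownUser implementation: the message layout `q=<q>&ts=<ts>`, the function names, and the placeholder secret are all assumptions made for the example.

```python
import hashlib
import hmac

def sign_redirect(q: str, ts: int, secret: bytes) -> str:
    """Hypothetical layout: h is HMAC-SHA256 over the string 'q=<q>&ts=<ts>'."""
    msg = f"q={q}&ts={ts}".encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()

def check_redirect(q: str, ts: int, h: str, secret: bytes, now: int) -> bool:
    """Return True to admit; False means send the visitor back to the queue."""
    if now > ts:                       # validity timestamp has passed
        return False
    expected = sign_redirect(q, ts, secret)
    # Constant-time comparison, on bytes so odd client input cannot raise.
    return hmac.compare_digest(expected.encode(), h.encode())

SECRET = b"x" * 72                     # placeholder for the 72-character customer key
good_h = sign_redirect("queue-123", 1_000, SECRET)

print(check_redirect("queue-123", 1_000, good_h, SECRET, now=900))    # inside the window
print(check_redirect("queue-123", 1_000, good_h, SECRET, now=1_001))  # window has passed
print(check_redirect("queue-999", 1_000, good_h, SECRET, now=900))    # q was altered
```

Nothing in this check remembers a previous presentation. Calling it twice with the same values inside the window succeeds twice, which is the gap the reuse section turns on.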

The AWS Virtual Waiting Room solution, the open-source serverless reference architecture, issues a plain JWT. Its developer guide lists the claims: aud is the event ID, sub is the request ID, iss is the issuer URL, plus queue_position, token_use (access, id, or refresh), and the usual iat, nbf, and exp timestamps. It signs with RS256 using a key pair generated at install time and held in Secrets Manager, exposing only the public half through a /public_key JWK endpoint. The signing is asymmetric, which matters later: the origin can verify a token without holding anything that could mint one.
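A rough sketch of the claim checks an origin might run on such a token, assuming the RS256 signature has already been verified by a JWT library against the published public key. The function name, the sample values, and the rule that only `token_use == "access"` is accepted are assumptions for illustration, not the solution's documented behaviour.

```python
def claims_ok(claims: dict, event_id: str, issuer: str, now: int) -> bool:
    """Check claims on a token whose signature was ALREADY verified elsewhere."""
    required = ("aud", "iss", "token_use", "nbf", "exp")
    if any(name not in claims for name in required):
        return False
    return (
        claims["aud"] == event_id                # token was issued for this event
        and claims["iss"] == issuer              # and by the expected issuer
        and claims["token_use"] == "access"      # assumption: access tokens only
        and claims["nbf"] <= now < claims["exp"] # inside the validity window
    )

# Hypothetical claim set using the claim names listed above.
sample = {
    "aud": "event-1", "sub": "req-42", "iss": "https://wr.example",
    "queue_position": 17, "token_use": "access",
    "iat": 100, "nbf": 100, "exp": 400,
}

print(claims_ok(sample, "event-1", "https://wr.example", now=200))
```

Every check here is about authenticity and time. None of them asks whether the token has been presented before.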

The pattern is the same in all three. A signed envelope, a body of claims, a secret the client never holds. The signature is not the weak point. You cannot forge a Queue-it h without the 72-character key, and you cannot forge an AWS token without the RS256 private half. The crypto does its job. The leaks come from everything the crypto does not promise: that a token will be spent once, by the holder it was issued to, on one request at a time. If you want the friendly version of how these tokens get issued in the first place, the companion pieces on how virtual waiting rooms work and Queue-it’s architecture lay out the issuance path. This post is about what the signature does not cover.

Reuse and replay: spending one proof twice

A signature proves a token is authentic. It says nothing about whether the token has already been used. That gap is the oldest and simplest leak.

Consider what a valid admission token is, stripped down: a bearer credential with an expiry. Bearer means whoever holds it is treated as the holder. If the token lives in a cookie or a URL parameter and the server checks only that the signature verifies and the clock has not run out, then the token is a transferable ticket. Copy it to a second client, a second machine, a second person, and the second holder is admitted as if they had waited. Within its validity window it works as many times as it is presented, because nothing on the server remembers it was already spent.

This is replay, and the defence is well understood: bind the token to something the holder cannot trivially carry across the copy, and remember that it was used. Both halves are harder than they sound.

The binding problem first. A waiting room could bind its token to an IP address, and some do as a configuration option, but IP binding fights the real world. Mobile carriers rotate addresses mid-session. Corporate and university NAT puts thousands of legitimate users behind one address. Bind too tight and you generate support tickets from honest users whose carrier shuffled their IP between the queue and the checkout. Cloudflare’s docs make the looseness explicit in a different way: a visitor viewing the room inside an iframe and outside it is counted as two separate users, and a session that lapses without an active queue is given a fresh cookie and counted as a new user again. The cookie is the identity. The cookie is also, by construction, a thing the browser hands back on every request and a thing a script can read out of document.cookie unless it is flagged HttpOnly. A credential whose entire identity is “holds this cookie” is a credential that travels.

The remembering problem is worse, because it collides with scale. To detect replay you keep a record of spent tokens and reject the second presentation. At ticket-sale volume that record is enormous and hot. Queue-it’s token carries a ti, a per-token UUID, precisely so a backend can track and constrain a session after the queue. AWS goes further by design: the reference architecture stores a copy of every issued token in DynamoDB keyed by request ID, so the same request ID always returns the same token rather than minting a fresh one, and a serving step can mark a token consumed. But “store every token and check it on every admission” is a synchronous read-and-write on the hottest path of the highest-traffic minute the site will ever see. The temptation to make that check eventually consistent, or to skip it for performance, is exactly where reuse windows open. A token that is single-use eventually is multi-use now.

Figure: one token, validity window 0 to t (exp); the spent-token record lags by δ. Clients A, B, and C each present a copy inside the window, and each is admitted. If the "already spent" write propagates slower than the copies arrive, every copy verifies.

*Replay is not a crypto failure. The signature verifies every time, which is the whole problem: validity and freshness are different questions, and only one of them is cheap to answer at scale.*
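The effect of a lagging spent-token record can be made concrete with a toy model. It assumes every presentation already passes the signature and expiry checks, and that the "spent" mark becomes readable only `lag` seconds after the first admission; the function and its parameters are hypothetical.

```python
def admitted(arrivals: list[float], lag: float) -> int:
    """Count how many presentations of ONE token get through when the spent
    mark becomes visible `lag` seconds after the first admission."""
    visible_at = None              # time at which the spent mark is readable
    count = 0
    for t in sorted(arrivals):
        if visible_at is not None and t >= visible_at:
            continue               # spent mark is visible: reject
        count += 1                 # no spent mark seen yet: admit
        if visible_at is None:
            visible_at = t + lag   # first admission starts the propagation
    return count

copies = [0.00, 0.01, 0.02, 0.50]  # three near-simultaneous copies, one late
print(admitted(copies, lag=0.1))   # 3: every copy inside the lag is admitted
print(admitted(copies, lag=0.0))   # 1: an immediately visible mark admits only the first
```

The reuse window is exactly the propagation lag. Shrinking it to zero is what a synchronous, strongly consistent write on the hot path buys.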

There is a subtler replay surface in the redirect token itself. Queue-it’s queueittoken rides the URL as a query string on the 302 back to the origin. URLs end up in browser history, in Referer headers sent to third-party assets, in CDN access logs, in analytics payloads, and in any logging middleware that records full request lines. A redirect token with a generous ts window that leaks into a log is a replayable admission for as long as that window lasts. This is why the validity timestamp on the redirect token is short and why the connector strips the token from the URL before doing anything else; the KnownUser libraries do exactly that, rewriting the URL to remove the token before matching triggers. Short windows shrink the replay surface. They do not remove it.
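Stripping the token before anything downstream sees the URL is a small operation. The sketch below uses only the Python standard library and is not the KnownUser code; the function name and the example URL are illustrative.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_param(url: str, name: str = "queueittoken") -> str:
    """Return the URL with every occurrence of the named query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(strip_param("https://shop.example/tickets?id=7&queueittoken=abc"))
# https://shop.example/tickets?id=7
```

Doing this before logging, before trigger matching, and before any redirect keeps the token out of access logs and Referer headers generated after that point. It does nothing for copies that leaked earlier in the chain.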

Race conditions at admission

Replay is about spending a token after it should be dead. The race condition is meaner: it spends a token multiple times before the system has finished deciding it was spent even once.

The pattern is time-of-check-to-time-of-use, and it is the same defect that shows up in file systems, payment flows, and token rotation everywhere. The system checks a condition (this token is unused, this request ID has no token, this user is under their purchase limit), then acts on that check (issue a token, admit the user, allow the purchase). Between the check and the use there is a window. If two requests both pass the check before either completes its use, both proceed, and the invariant the check was protecting is now violated. The counter said one. Two got through.

A token-rotation example outside the waiting-room world shows the shape cleanly. Keycloak shipped exactly this defect, tracked as CVE-2026-1035: a TOCTOU race in the TokenManager refresh-token handling where, by sending concurrent requests, a single refresh token could be exchanged for multiple valid access tokens before the usage counter updated. Refresh-token rotation is supposed to be strictly single-use when refreshTokenMaxReuse is zero. Concurrency beat the counter. One refresh token, several access tokens, all valid. Swap “refresh token” for “admission token” and “access token” for “checkout session” and you have the waiting-room version exactly.

Figure: time-of-check-to-time-of-use at admission. Request 1 and request 2 each run "check: unused", then each runs "use + mark spent". Both reads land before either write, so both are admitted.

*The fix is to make check-and-mark a single atomic step, not two. Stated that way it sounds trivial. At scale, atomic-on-the-hot-path is the expensive thing the architecture was built to avoid.*
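The interleaving can be reproduced deterministically in a toy model. The barrier below is artificial: it forces both checks to complete before either write, which is the unlucky schedule a real spike produces on its own. All names are hypothetical, and an in-process lock stands in for whatever atomic primitive a real datastore offers.

```python
import threading

# Racy version: check, then (later) use and mark spent.
spent = set()
admitted_racy = []
barrier = threading.Barrier(2)

def admit_racy(token: str, who: int) -> None:
    unused = token not in spent      # time of check
    barrier.wait()                   # force both checks to land before either write
    if unused:
        spent.add(token)             # time of use: mark spent
        admitted_racy.append(who)

# Atomic version: check and mark happen under one lock.
spent_atomic = set()
admitted_atomic = []
lock = threading.Lock()

def admit_atomic(token: str, who: int) -> None:
    with lock:
        if token in spent_atomic:
            return                   # a racing request already won
        spent_atomic.add(token)
    admitted_atomic.append(who)

for target in (admit_racy, admit_atomic):
    threads = [threading.Thread(target=target, args=("tok", i)) for i in range(2)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

print(len(admitted_racy), len(admitted_atomic))   # 2 1
```

The racy version admits both requests on one token; the atomic version admits exactly one, whichever thread takes the lock first.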

The waiting room is unusually exposed to this class because its whole job is to absorb a concurrency spike. The moment the room admits a batch is the moment thousands of clients are hammering the admission endpoint in parallel. That is the worst possible environment for a check-then-act pattern, and it is the environment the system is guaranteed to be in. An attacker does not need to engineer the concurrency. The on-sale creates it for free; the attacker just needs to fire their copies into the same millisecond.

The clean fix is to make admission a single atomic operation. Read-and-mark in one conditional write, so the database itself enforces that only the first of N concurrent presentations wins. AWS’s idempotent token generation is the architecture-level version: because the GenerateToken step keys off the request ID in DynamoDB and returns the existing token if one is found, two concurrent generations for the same request ID converge on one token instead of racing to mint two. DynamoDB’s conditional writes give you the atomic primitive; the design has to actually use it on every state transition, not just the convenient ones. The places that leak are the transitions where someone reached for a read, then a write, because it was simpler, and the load test never hit the exact interleaving that breaks it.
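A conditional write of this kind looks roughly like the following. SQLite is used here purely as a stand-in so the example runs anywhere; the table, column names, and `admit` helper are assumptions, and a DynamoDB conditional write would express the same condition in its own syntax.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE tokens (id TEXT PRIMARY KEY, spent INTEGER NOT NULL DEFAULT 0)"
)
db.execute("INSERT INTO tokens (id) VALUES ('tok-1')")
db.commit()

def admit(conn: sqlite3.Connection, token_id: str) -> bool:
    """Mark the token spent only if it is currently unspent.
    The WHERE clause is the check; the UPDATE is the use. One statement."""
    cur = conn.execute(
        "UPDATE tokens SET spent = 1 WHERE id = ? AND spent = 0", (token_id,)
    )
    conn.commit()
    return cur.rowcount == 1         # exactly one row changed: this caller won

results = [admit(db, "tok-1") for _ in range(5)]
print(results)                       # [True, False, False, False, False]
```

The application never reads the spent flag and then decides. It asks the store to make the transition and learns from the affected-row count whether it was first.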

Purchase limits are the same defect wearing different clothes. The room admits you fairly, you reach checkout, and the rule is two tickets per account. If the limit is enforced as read-count-then-increment rather than an atomic decrement of a reserved budget, then firing several checkout requests at once can let all of them read “count is zero” before any writes “count is two.” The queue did its job. The limit downstream did not, and the net effect is the same as a queue leak: one identity walking away with more than its share.
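The atomic alternative to read-count-then-increment is a guarded decrement of a reserved budget. Again SQLite is only a runnable stand-in, and the schema, account name, and `reserve` helper are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE budget (account TEXT PRIMARY KEY, remaining INTEGER NOT NULL)"
)
db.execute("INSERT INTO budget VALUES ('acct-1', 2)")   # the two-ticket limit
db.commit()

def reserve(conn: sqlite3.Connection, account: str, qty: int) -> bool:
    """Decrement the budget only if enough remains, in a single statement."""
    cur = conn.execute(
        "UPDATE budget SET remaining = remaining - ? "
        "WHERE account = ? AND remaining >= ?",
        (qty, account, qty),
    )
    conn.commit()
    return cur.rowcount == 1

outcomes = [reserve(db, "acct-1", 1) for _ in range(6)]
print(outcomes)        # [True, True, False, False, False, False]
```

Six checkout attempts produce two reservations, however they are ordered, because no request ever acts on a count it read earlier.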

Multi-tab and multi-session amplification

The first two failures are bugs in the strict sense. The third is mostly not a bug at all. It is arithmetic, and it is the one that does the most quiet damage.

A waiting room counts identities. Its idea of an identity is whatever it can cheaply observe: a cookie, a token, a request ID, sometimes an IP. None of those is a person. The gap between “identity the system counts” and “human being in the world” is the amplification factor, and a determined actor’s entire job is to widen it.

Start with the most innocent version. You open a sale in three browser tabs because you are nervous and want to maximise your odds. In Cloudflare’s model the __cfwaitingroom cookie is scoped to the host and path, so all three tabs share one cookie and one queue position, and each request refreshes the same five-minute window. Three tabs, one identity. The system counts you once. That is the design working. But notice the assumption it rests on: that all three tabs share cookie storage. Open the sale in three different browsers, or three browser profiles, or one normal and one incognito window, and the cookie jars are separate. Now you are three identities holding three queue positions, and the system counts you three times, because from where it sits you are three visitors. Nothing was tampered with. No signature was forged. The room counted exactly what it could see.

Figure: one human, N counted identities. One person with 3 tabs and a shared cookie jar holds 1 queue position; 3 profiles or incognito windows hold 3 positions; N devices plus N IPs hold N positions. The cookie is the identity. Multiply cookie jars and you multiply identities, no forgery needed.

*The honest user with three tabs and the operator with three thousand sessions are doing the same thing. The only difference is N, and the system has no clean way to tell them apart on signal alone.*
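The counting rule is simple enough to write down. In this toy model the room sees only the cookie jar, never the person; the names and jar labels are invented for the example.

```python
from collections import Counter

# (human, cookie jar) for every open session.
sessions = [
    ("alice", "jar-A"), ("alice", "jar-A"), ("alice", "jar-A"),   # 3 tabs, one jar
    ("bob", "jar-B1"), ("bob", "jar-B2"), ("bob", "jar-B3"),      # 3 profiles
]

counted_identities = {jar for _, jar in sessions}     # what the room can see
humans = {human for human, _ in sessions}             # what it cannot see

# Distinct (human, jar) pairs give queue positions held per human.
positions = Counter(human for human, _ in set(sessions))

print(len(counted_identities), len(humans))           # 4 2
print(positions["alice"], positions["bob"])           # 1 3
```

Two people, four counted visitors. The room's arithmetic is correct on its own inputs, and the inputs are the problem.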

Now scale N. A scalper does not open three tabs. They run hundreds or thousands of isolated sessions, each with its own cookie jar, ideally each behind its own residential IP so the rooms that do glance at IP reputation see a different origin per session. Each session is a fully legitimate queue entry. Each waits its turn honestly. The amplification is purely in the count: one operator occupying a thousand fair positions means a thousand honest humans are a thousand places further back. No single session does anything wrong. The unfairness is entirely emergent from the multiplicity, which is exactly why it is so hard to police. There is no malicious request to block. There is only a distribution of identities that does not match the distribution of humans.
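The arithmetic behind that displacement is a plain proportion. The sketch assumes queue order is uniformly random across sessions and that admission stops when the slots run out; the function name and the figures passed to it are illustrative, not measurements.

```python
from fractions import Fraction

def expected_operator_slots(operator_sessions: int, honest_sessions: int,
                            slots: int) -> Fraction:
    """Expected number of admitted slots held by the operator when positions
    are assigned uniformly at random and `slots` sessions are admitted."""
    total = operator_sessions + honest_sessions
    return Fraction(slots * operator_sessions, total)

# 1,000 operator sessions among 9,000 honest ones, 500 slots on offer.
print(expected_operator_slots(1_000, 9_000, 500))     # 50
# The nervous fan with three separate browser profiles, same sale.
print(expected_operator_slots(3, 9_997, 500))         # 3/20
```

No session in either case does anything the room could reject. The operator's share of the outcome simply tracks their share of the line.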

The numbers people report make the scale concrete. Queue-it and other operators have said bots can be a majority of traffic during high-demand sales, with one widely cited figure of 96 percent of requests in a high-profile on-sale coming from bots and uninvited visitors, leaving roughly 4 percent (about 138,000 of 3.3 million requests) as trusted human traffic. Treat the exact percentage as a vendor-reported headline rather than an audited measurement, but the order of magnitude is consistent across sources: at the top of the demand curve, most of the queue is not people.

This is why the better systems stopped trying to count cookies and moved the identity upstream of the room entirely. Queue-it’s bots and abuse documentation describes an enqueue token whose purpose is to carry a verified visitor-identification key (an account ID, an email, or a personally distributed code) from the application into the queue, and to enforce uniqueness so that each ID gets exactly one place in the room. Require an enqueue token to enter and visitors cannot share entry with others, because the scarce thing is no longer the cookie (free, infinite) but the verified identity (issued, rate-limited, tied to a login). The Ticketmaster Verified Fan approach is the same idea taken to its logical end: pre-register humans, hand the queue a list of vetted identities, and let the room count those instead of counting browsers. Once the unit of scarcity is a verified human rather than a cookie, the multi-session amplification collapses, because minting a thousand fresh cookies is trivial and minting a thousand vetted identities is not.
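The uniqueness rule reduces to "one position per verified ID, however many times it enters". A single-process sketch follows, with a hypothetical `Room` class; it is not Queue-it's enqueue-token API, and a real deployment would need the assignment itself to be an atomic write.

```python
class Room:
    """Toy waiting room that counts verified IDs instead of cookies."""

    def __init__(self) -> None:
        self._position_by_id: dict[str, int] = {}

    def enqueue(self, verified_id: str) -> int:
        """Return this ID's queue position, assigning one only on first entry."""
        if verified_id not in self._position_by_id:
            self._position_by_id[verified_id] = len(self._position_by_id) + 1
        return self._position_by_id[verified_id]

room = Room()
first = room.enqueue("fan-1@example.com")    # first entry gets position 1
again = room.enqueue("fan-1@example.com")    # new tab, new device: same position
other = room.enqueue("fan-2@example.com")    # a different verified ID
print(first, again, other)                   # 1 1 2
```

A thousand sessions presenting the same verified ID collapse to one position. The attack surface moves to how IDs are issued.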

How scalpers chain the gaps

No real operation relies on a single trick. The economically serious actors compose the failures above into a pipeline, and the composition is where individually small gaps become a large one.

The pipeline starts before the room. If the application exposes the protected resource through an API that does not itself check the admission token, the room is decorative. This is the oldest mistake in the catalogue: the JavaScript front end gets queued, but the backend POST that actually reserves the ticket trusts the front end and skips its own validation. Queue-it’s own abuse guidance is blunt that client-side-only integration is vulnerable to visitors manipulating the JavaScript to skip the line, which is why the server-side and edge connectors exist. The KnownUser libraries put an HMAC-SHA256 check on the request path with the customer’s 72-character secret, so the validation happens at the origin or the CDN edge rather than in script the client controls. When that check is present and on the real resource endpoint, the room holds. When it is bolted onto the page render but not the purchase API, the room is theatre. For the longer treatment of where these checks sit across the integration modes, the bypass field guide maps the territory.
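The difference between a check on the page and a check on the resource can be shown with a decorator around the handler that actually reserves the ticket. Everything here is hypothetical: the request shape, the `admission` field, the way the proof is derived from a session ID, and the handler itself. It is a sketch of where the check sits, not of any connector's API.

```python
import hashlib
import hmac
from functools import wraps

SECRET = b"k" * 72                   # placeholder for the customer secret

def sign(session_id: str) -> str:
    """Hypothetical admission proof: HMAC-SHA256 of the session ID."""
    return hmac.new(SECRET, session_id.encode(), hashlib.sha256).hexdigest()

def requires_admission(handler):
    """Reject any request that does not carry a valid admission proof."""
    @wraps(handler)
    def wrapped(request: dict) -> dict:
        session_id = request.get("session_id", "")
        proof = request.get("admission", "")
        if not hmac.compare_digest(sign(session_id).encode(), proof.encode()):
            return {"status": 302, "location": "/queue"}   # back to the line
        return handler(request)
    return wrapped

@requires_admission                  # the check lives on the purchase endpoint
def reserve_ticket(request: dict) -> dict:
    return {"status": 200, "reserved": request["ticket"]}

ok = reserve_ticket({"session_id": "s1", "admission": sign("s1"), "ticket": "A12"})
skipped_queue = reserve_ticket({"session_id": "s1", "ticket": "A12"})
print(ok["status"], skipped_queue["status"])              # 200 302
```

If the decorator were applied only to the handler that renders the page, a direct POST to the reservation endpoint would never meet it.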

Assume the check is present. The operator then attacks the count, not the crypto. They pre-build a fleet of sessions, each with its own cookie jar and its own IP, and enter all of them into the queue early. They are not bypassing anything. They are buying a thousand lottery tickets. The room admits at a fair rate, but a disproportionate share of the admitted are theirs, because a disproportionate share of the line was theirs. This is the amplification failure, industrialised, and it is the workhorse of modern scalping precisely because it does not require defeating any signature. The defences that bite here are the ones that raise the cost of an identity: proof-of-work to tax cheap session creation (Queue-it’s Traffic Access Rules can answer a segment with a proof-of-work challenge or a CAPTCHA), residential-proxy and ASN reputation to make per-session IPs expensive (covered in the residential proxy detection piece), and the enqueue-token identity binding that makes the scarce unit a verified human.
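A proof-of-work tax on session creation can be sketched in the hashcash style. This is a generic illustration, not Queue-it's challenge format: the challenge string, the `bits` difficulty parameter, and both function names are assumptions.

```python
import hashlib
from itertools import count

def pow_ok(challenge: str, nonce: int, bits: int) -> bool:
    """True if SHA-256(challenge:nonce) starts with `bits` zero bits."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - bits) == 0

def solve(challenge: str, bits: int) -> int:
    """Client side: search for the first nonce that satisfies the challenge."""
    return next(n for n in count() if pow_ok(challenge, n, bits))

challenge = "session-abc"                 # server-issued, per session
nonce = solve(challenge, bits=12)         # about 2**12 hashes on average
print(pow_ok(challenge, nonce, bits=12))  # True: the server verifies with one hash
```

The asymmetry is the point. Verifying costs the room one hash; solving costs the client thousands, which is negligible for one fan and a real bill for a fleet of a thousand sessions. Each extra bit of difficulty doubles the expected work.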

The race condition is the finisher. Once admitted, the operator’s sessions converge on the checkout, and if the purchase-limit enforcement is a check-then-increment rather than an atomic reservation, firing the reservations concurrently can punch past the per-account cap before the counter catches up. This is the same TOCTOU defect from the admission discussion, now applied to the limit rather than the token. It is also the moment that pays best, because it is where a fair admission turns into an unfair quantity. The operator waited honestly, was admitted honestly, and then took twelve where the rule said two, because the rule was enforced with a read and a write instead of one atomic decrement.

What ties the pipeline together is that each stage exploits a different broken promise. The missing API check breaks “the proof is required.” The multi-session fleet breaks “the count reflects humans.” The checkout race breaks “the limit holds under concurrency.” A system can be solid on two of the three and still leak through the third, which is why partial hardening so often disappoints. You close the replay window and the operators shrug and lean harder on multiplicity. You add IP reputation and they buy cleaner proxies. You fix the checkout race and they go back to simply occupying more of the line. The queue’s fairness is a conjunction, and a conjunction is only as true as its falsest term.

Closing the gaps, and what stays open

The defensive moves are not mysterious, and the honest summary is that the strong ones all push in the same direction: make the unit of scarcity expensive, and make every state transition atomic.

Atomicity is the cleaner half. Admission and limit checks should be single conditional writes, so the datastore is the thing that enforces single-use, not application logic that reads then writes with a window in between. The pattern that survives a concurrency spike is the one where the first of N racing requests wins at the storage layer and the rest get a clean rejection, with no interleaving that lets two pass the same check. AWS’s request-ID-keyed idempotency and DynamoDB conditional writes are a worked example; the same discipline applies to a purchase counter or a per-identity admission flag. It costs latency on the hottest path, which is exactly why it gets cut, and exactly why cutting it leaks. The deeper material on building a queue that holds under this load is in the fair-queue-at-scale piece.
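Request-ID-keyed idempotency can be sketched as "insert if absent, then read back". SQLite again stands in for the real store, and the table, the `generate_token` helper, and the use of a random hex string as the token are assumptions for illustration rather than the AWS solution's code.

```python
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE issued (request_id TEXT PRIMARY KEY, token TEXT NOT NULL)"
)

def generate_token(conn: sqlite3.Connection, request_id: str) -> str:
    """Return the one token for this request ID, minting it only if absent."""
    candidate = uuid.uuid4().hex     # stand-in for a freshly signed token
    # The primary key makes the insert a no-op if a token already exists.
    conn.execute(
        "INSERT OR IGNORE INTO issued (request_id, token) VALUES (?, ?)",
        (request_id, candidate),
    )
    conn.commit()
    row = conn.execute(
        "SELECT token FROM issued WHERE request_id = ?", (request_id,)
    ).fetchone()
    return row[0]

first = generate_token(db, "req-1")
second = generate_token(db, "req-1")     # a retry or a racing duplicate
print(first == second)                   # True
```

The losing caller's candidate is discarded, so repeated or concurrent generations for one request ID converge on a single stored token instead of minting several.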

Identity is the harder half, because it is a product decision dressed as a security one. As long as the unit a room counts is a cookie, the room is counting something free and infinite, and multiplicity will always win. Moving the scarce unit upstream, to a verified email, a logged-in account, a pre-registered fan, a one-per-person enqueue token, is the only move that actually changes the arithmetic, and it is the move that costs conversion, annoys legitimate users who do not want to register, and shifts the fight to identity fraud instead of session fraud. Vendors know this. It is why the enqueue-token and verified-fan mechanisms exist and why they are not turned on for every sale: the friction is real, and most operators only pay it for the sales where the scalping is bad enough to justify the lost honest conversions.

So the gap does not fully close. The crypto on these tokens is sound, and it was never the problem. The problem is that a signature certifies authenticity and says nothing about freshness, holder, or count, and supplying those three properties cheaply, at the exact moment of peak concurrency, is genuinely hard. The systems that hold best in 2026 are the ones that decided the cookie is not an identity and that the database, not the application, owns single-use. The ones that leak are usually not running weak crypto. They are trusting a bearer token to behave like a person, and then acting surprised when one motivated person behaves like a thousand of them.

