
Cloudflare Waiting Room internals: the JWT, the estimated wait, and edge coordination

Copyright: MIT

A waiting room is a strange thing to build at a CDN edge. The whole point of a CDN is that every point of presence acts independently, terminating TLS and serving cache without phoning home. A queue wants the opposite. To decide whether the next visitor goes through or waits, you need a global count of how many people are already inside, and that count lives nowhere in particular. Hundreds of data centers, each admitting users on its own, somehow have to converge on one number: are we over the limit or under it?

Cloudflare ships exactly this, and it runs without a single line of code on the origin. A visitor who hits a protected path during a surge gets a holding page and an estimated wait. A visitor who arrives when there is room sails through and never knows the room exists. The mechanism that decides which of those two things happens to you is the interesting part, and most of it is observable: the cookie name, the configuration knobs, the JSON the edge will hand back if you ask politely. This post walks through what is documented, what Cloudflare’s own engineers have written about the design, and where the public record stops and inference begins.

We start with the product surface, the two numbers you configure and what they actually mean. Then the cookie: its name, what the docs say it holds, and the careful distinction between “encrypted token” and “JWT” that matters more than it looks. Then the estimated-wait computation, which is simpler arithmetic than you would guess. Then the hard part, edge coordination, where Durable Objects do the counting that anycast routing makes difficult. And finally the failure modes, because a queue that admits the wrong number of people is worse than no queue at all.

What you actually configure

Waiting Room exposes a small set of dials, and two of them do almost all the work. Total active users is a target threshold for how many simultaneous sessions you want on the protected pages. New users per minute is a target for the rate at which fresh visitors are admitted. Cloudflare’s configuration reference describes both as targets rather than hard ceilings, and that word is load-bearing. The system aims to keep you near the limit, not exactly at or below it, because exact global enforcement at the edge is not something you get for free.

The two limits answer different questions. Total active users protects against a slow boil: a steady crowd that, summed up, would exceed what your origin can hold. New users per minute protects against a spike: a flash sale or an on-sale where the rate of arrival, not the standing population, is what melts your servers. A session counts as active for the configured session_duration, which ranges from 1 to 30 minutes and defaults to 5. Once admitted, you stay counted as active for that window even if you go idle, and the clock renews on activity.

There is a floor on these numbers. The documented minimum for total active users is 200, and new users per minute has to be at least 200 and no greater than total active users. The guidance Cloudflare gives is to set new users per minute at roughly 100 percent of your expected peak, so that the queue only engages when real traffic crosses the line rather than throttling a normal day. Set it too low and you queue people who did not need queueing. Set both too high and the room never activates. If that is what you want during a quiet window, the passthrough queueing method is the explicit way to get it.
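The constraints described so far fit in a small checker. This is a minimal sketch, not Cloudflare's validation logic: the field names mirror the settings named above, the bounds are the ones this post quotes (with the floor taken as 200), and `RoomConfig` and `validate_room` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class RoomConfig:
    total_active_users: int
    new_users_per_minute: int
    session_duration: int = 5  # minutes; the default quoted above


def validate_room(cfg: RoomConfig) -> list[str]:
    """Return a list of problems; an empty list means the config passes.

    Bounds are the ones quoted in this post (floor assumed to be 200),
    not values read from the live API.
    """
    problems = []
    if cfg.total_active_users < 200:
        problems.append("total_active_users must be at least 200")
    if cfg.new_users_per_minute < 200:
        problems.append("new_users_per_minute must be at least 200")
    if cfg.new_users_per_minute > cfg.total_active_users:
        problems.append("new_users_per_minute must not exceed total_active_users")
    if not 1 <= cfg.session_duration <= 30:
        problems.append("session_duration must be between 1 and 30 minutes")
    return problems
```

Running a config through a check like this before pushing it catches the common mistake of setting the per-minute rate above the population cap.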

Queueing method itself is the third dial. The default, fifo, orders visitors by when they first hit an actively queueing room. The queueing-methods reference lists three more: random, which picks waiting visitors at random when capacity opens (early arrivals get better odds but no guarantee); passthrough, which lets everyone through and is useful for an event endpoint outside its event window; and reject, which serves a static error and lets nobody in at all. Only some plans can use the methods beyond FIFO. The rest of this post assumes FIFO, because that is the case where ordering, fairness, and the estimated wait all actually matter.

Figure: Two limits, two failure shapes. Against incoming traffic, new_users_per_minute caps the admission rate and guards against the spike; total_active_users caps the standing population and guards against the slow boil.

*The two thresholds answer different questions: one bounds how fast people enter, the other bounds how many are inside at once. Both are targets, not hard ceilings.*

If you have read the Crawlex piece on how virtual waiting rooms work, these are the same token-bucket ideas applied at the CDN layer. New users per minute is a refill rate. Total active users is a population cap. The novelty here is where the buckets live.
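To make the analogy concrete, here is a toy, single-process gate that combines a refill-rate bucket with a population cap. It is a sketch of the general token-bucket idea only. The class name, the one-minute burst size, and the explicit `release` call are assumptions for illustration, and nothing here reflects how Cloudflare distributes the state.

```python
class AdmissionGate:
    """Toy single-process gate: a refill-rate bucket plus a population cap."""

    def __init__(self, total_active_users: int, new_users_per_minute: int):
        self.cap = total_active_users
        self.rate_per_sec = new_users_per_minute / 60.0
        self.burst = float(new_users_per_minute)  # assumed: one minute's worth
        self.tokens = self.burst                  # start with a full bucket
        self.active = 0
        self.last = 0.0

    def try_admit(self, now: float) -> bool:
        # Refill for the elapsed time, never beyond the burst size.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_sec)
        self.last = now
        # Refuse if the room is full or the rate budget is spent.
        if self.active >= self.cap or self.tokens < 1.0:
            return False
        self.tokens -= 1.0
        self.active += 1
        return True

    def release(self) -> None:
        # A session expired, freeing a seat in the population count.
        self.active = max(0, self.active - 1)
```

The two refusal conditions are independent, which is the point of having two dials: a full room blocks admission even with rate budget left, and an exhausted rate budget blocks admission even with empty seats.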

The cookie

When a visitor hits a host-and-path combination with an active waiting room, the edge sets a cookie named __cfwaitingroom. That name is the canonical one in Cloudflare’s cookie reference. If a single room covers multiple routes through additional_routes, you must give it a cookie_suffix, and the cookie name becomes __cfwaitingroom_<suffix>, scoping the queue state per route group so two rooms on the same domain do not collide.

Here is where I want to be precise, because the slug of this post says “the JWT” and the documentation does not. Cloudflare’s docs describe the cookie as encrypted to prevent users from modifying it. They do not, anywhere I could find, call it a JWT. Those are not the same claim. A standard signed JWT is readable by anyone: base64url payload in the clear, a signature that only stops tampering, not reading. If Waiting Room used a plain signed JWT, you could paste the cookie into jwt.io and watch the queue’s internal fields scroll by. The fact that the docs stress encrypted and cannot be modified, and that the cookie does not decode as a readable JWT in practice, points to an encrypted token, something closer to JWE (JSON Web Encryption) or a Cloudflare-internal authenticated-encryption format rather than a transparent JWT. The exact wire format is not publicly documented. What follows about the contents is drawn from Cloudflare’s own engineering blog and the cookie docs, not from decoding a live token, and I flag where that line sits.
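The "readable by anyone" point is easy to demonstrate with the standard library. The sketch below builds an ordinary HS256-signed JWT and then reads its payload without the key. The claims and the secret are made up for illustration, and none of this is the Waiting Room token format, which is not public.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_jwt(claims: dict, key: bytes) -> str:
    """Build a plain HS256-signed JWT (signed, not encrypted)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    sig = hmac.new(key, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(sig)}"


def read_payload_without_key(token: str) -> dict:
    """Decode the middle segment; no secret is needed to read it."""
    payload = token.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


# Hypothetical claims; the key never leaves the "server".
token = sign_jwt({"arrived": "10:04", "admitted": False}, b"server-secret")
print(read_payload_without_key(token))
```

The signature stops a client from changing `admitted` to true, but it hides nothing. An encrypted token closes that second gap, which is the distinction the docs' wording points at.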

So what is inside? The documentation names two fields directly. There is a group ID corresponding to the minute the visitor entered the waiting room, and an acceptedAt value corresponding to the minute the visitor entered the application. Cloudflare’s 2021 engineering post on building the system, written by the team that built it, fills in a little more: it describes the encrypted cookie carrying a bucketId (the timestamp-based cluster used for position), an acceptedAt admission timestamp, and a lastCheckInTime for the most recent activity. The “group ID” of the docs and the bucketId of the engineering post are the same idea under two names, a one-minute bucket stamp that says when you arrived. The exact JSON key names in the current production token are not something the public docs commit to, so treat bucketId versus group ID as a naming question, not a contradiction.

Figure: the __cfwaitingroom cookie (encrypted; not a readable JWT). Field names below are documented or vendor-stated.

  • bucketId / group ID: the one-minute bucket you arrived in; drives FIFO position
  • acceptedAt: the minute you were admitted to the application
  • lastCheckInTime: most recent activity; renews the 5-minute expiry
  • Expiry: held at 5 minutes, refreshed every 20 seconds while the tab is open

*The documented and vendor-stated fields. The exact production key names and the encryption scheme are not in the public docs; this is the shape, not a decode.*

The lifecycle is the cleverer part. While you sit in the room, the cookie’s expiration is always set to five minutes, but the browser refreshes every twenty seconds and each refresh renews that five-minute window. Close the tab and the cookie quietly dies inside five minutes, which frees your slot. Once you are admitted to the application, the cookie’s lifetime switches to session_duration (that 1-to-30-minute setting), and your seat is held for that long after your last activity. The whole admission-and-hold dance is encoded in a cookie that the client carries and the edge re-reads on every request, which is what lets any data center make the decision locally. No central session store to look up. The token is the state.
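The two lifetimes can be written as one small function. This is a sketch of the rule as described above, not of Cloudflare's code: `cookie_expiry`, `is_live`, and the `last_seen` parameter are hypothetical names, and the only inputs are the five-minute waiting window and the configured session_duration.

```python
from datetime import datetime, timedelta
from typing import Optional

WAITING_TTL = timedelta(minutes=5)  # cookie lifetime while queued


def cookie_expiry(last_seen: datetime, accepted_at: Optional[datetime],
                  session_duration_min: int = 5) -> datetime:
    """When the cookie lapses, given the visitor's most recent request."""
    if accepted_at is None:
        # Still queued: each 20-second refresh pushes expiry 5 minutes out.
        return last_seen + WAITING_TTL
    # Admitted: the seat is held for session_duration after last activity.
    return last_seen + timedelta(minutes=session_duration_min)


def is_live(now: datetime, last_seen: datetime, accepted_at: Optional[datetime],
            session_duration_min: int = 5) -> bool:
    """True while the cookie would still be honored."""
    return now < cookie_expiry(last_seen, accepted_at, session_duration_min)
```

Because expiry depends only on values the token itself carries, any data center can evaluate it from the request alone.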

Cookie attributes are configurable. The Waiting Room cookie supports samesite and secure settings, with auto defaults that let Cloudflare pick sensible values based on the zone. If you have looked at the cf_clearance cookie, this is the same posture: an opaque, integrity-protected token the visitor must present, set with conservative cross-site attributes so it survives the redirect dance without leaking across contexts.

I will say the obvious defensive thing once and not belabor it. Because the token is the state and it is encrypted by Cloudflare, you cannot forge a position or fast-forward your acceptedAt by editing the cookie. Tampering invalidates it and you land back at the room. The interesting attacks on systems like this are never about editing the token; they are about the gap between the global limit and what each edge node can actually enforce, which is the subject of the coordination section below and of why waiting rooms leak.

How the estimated wait is computed

The number that scrolls on the holding page looks like it should require a model. It does not. The About page gives the formula in one line: visitors ahead of you divided by the rate at which the application is admitting users. If 10,000 people are in front of you and the application is letting 1,000 in per minute, your estimate is ten minutes. That is the entire calculation.

The two inputs both come from the cookie scheme. The bucket stamp, the minute you arrived, lets the system count how many earlier buckets are still waiting, which gives “visitors ahead.” The acceptedAt stamps on the people leaving the room tell the system how many are graduating into the application per minute, which gives the admission rate. Divide one by the other. Because everyone in your one-minute bucket shares a stamp, the system treats a bucket as a unit, which is why the estimate moves in steps rather than ticking down smoothly. The page refreshes the figure every twenty seconds; you watch the wait drop a bucket at a time as the buckets ahead of you drain.
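The arithmetic is short enough to write out. A minimal sketch, assuming buckets are identified by an integer minute index and that the count of waiting visitors per bucket is available as a plain mapping; both are illustrative choices, not the system's internal representation.

```python
from typing import Optional


def visitors_ahead(waiting_by_bucket: dict[int, int], my_bucket: int) -> int:
    """Sum everyone stamped with an earlier one-minute bucket than mine."""
    return sum(n for bucket, n in waiting_by_bucket.items() if bucket < my_bucket)


def estimated_wait_minutes(ahead: int, admitted_per_minute: float) -> Optional[float]:
    """Visitors ahead divided by admission rate; None when the rate is unknown."""
    if admitted_per_minute <= 0:
        return None  # no established outflow yet, so no honest estimate
    return ahead / admitted_per_minute


# The worked example from the text: 10,000 ahead at 1,000 per minute.
waiting = {100: 4000, 101: 6000, 102: 2500}
print(estimated_wait_minutes(visitors_ahead(waiting, 102), 1000))
```

Everyone in bucket 102 gets the same answer, since only strictly earlier buckets are counted. That is the stepwise behavior described above.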

You do not have to render the holding page to see this machinery. Set json_response_enabled and send a request with the header Accept: application/json (and it must match exactly; Accept: application/json, text/html will not trigger it), and the edge returns a small object instead of HTML. The JSON-response guide documents the shape. The top-level key is cfWaitingRoom, and inside it the fields that matter are inWaitingRoom (true while you wait), waitTime (the estimate in minutes), waitTimeKnown (whether that estimate is trustworthy yet), waitTimeFormatted (the human string), queueIsFull, queueAll, lastUpdated (an ISO 8601 timestamp), and refreshIntervalSeconds, which is the 20 you would otherwise infer from watching the page reload.

Figure: a GET / request with Accept: application/json, which the edge answers with JSON, not HTML:

    {
      "cfWaitingRoom": {
        "inWaitingRoom": true,
        "waitTime": 10,
        "waitTimeKnown": true,
        "waitTimeFormatted": "10 minutes",
        "queueIsFull": false,
        "queueAll": false,
        "lastUpdated": "2020-08-03T23:46:00.000Z",
        "refreshIntervalSeconds": 20
      }
    }

*The documented cfWaitingRoom object. This is the same data the holding page renders, exposed for mobile apps and non-browser clients.*
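A non-browser client mostly needs three things from that object: whether it is still queued, the estimate if one is known, and how long to sleep before asking again. The sketch below parses only fields named in the documented shape. `QueueStatus` and `parse_status` are hypothetical names, and the fallback of 20 seconds is taken from the refresh interval discussed above.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueStatus:
    in_waiting_room: bool
    wait_minutes: Optional[int]  # None when the estimate is not yet known
    refresh_seconds: int


def parse_status(body: str) -> QueueStatus:
    """Parse the cfWaitingRoom object from a JSON response body."""
    room = json.loads(body)["cfWaitingRoom"]
    known = room.get("waitTimeKnown", False)
    return QueueStatus(
        in_waiting_room=room["inWaitingRoom"],
        # Only surface the number when the edge says it is trustworthy.
        wait_minutes=room["waitTime"] if known else None,
        refresh_seconds=room.get("refreshIntervalSeconds", 20),
    )


# The request itself needs the exact Accept header, for example:
#   urllib.request.Request(url, headers={"Accept": "application/json"})
# where `url` is your own protected path.
```

A polling client would sleep for `refresh_seconds` between requests and send the waiting room cookie back each time, since the cookie is what carries its place in line.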

Two of those fields deserve a note. waitTimeKnown exists because the estimate is not always meaningful. Early in an event, before enough people have graduated to establish an admission rate, the denominator is noise and the system says so rather than printing a fake number. And queueAll reflects the queue_all event behavior, where every visitor is held regardless of capacity. That suits a scheduled on-sale where you want a clean starting gun rather than a trickle of early arrivals who happened to load the page first.

Edge coordination: counting without a center

Now the genuinely hard problem. Anycast routing sends each visitor to whichever Cloudflare data center is closest, and there are well over 300 of them. Each one is admitting users independently against a global limit. If every data center naively let in up to total_active_users, the real population would be that number times 300, and the origin would die exactly as it would have without a queue. So the global limit has to be split, and split well, across locations whose individual traffic shares change minute to minute.

Cloudflare’s design uses Durable Objects, which are single-instance, strongly-consistent stateful actors on the Workers platform. The 2021 engineering post lays out a hierarchy. At the top is a global Durable Object that aggregates state from everywhere and holds the authoritative picture of how many users are in the room worldwide, backed up to the Cache API so a restart does not lose the count. Below it, the 2023 follow-up by George Thomas describes per-location counter Durable Objects, “durable object instances that do counting for a set of workers in the data center.” Workers at the edge do not each carry a pre-divided slice of the limit; they ask the local counter for a number, and admission happens when the number a visitor draws is below the slots available to that data center.

The allocation across data centers is the clever bit, and it is explicitly historical. Slots are handed to each location in proportion to its share of recent traffic. The 2021 post describes using the distribution from two minutes prior; the 2023 post frames it as the previous interval’s distribution. A location that was carrying 8 percent of traffic gets roughly 8 percent of the global slots. The lag is deliberate: you cannot allocate on this exact instant’s traffic because you do not know it yet, so you allocate on a recent, settled picture and accept that it will be slightly stale. Traffic patterns at minute granularity are sticky enough that a two-minute-old distribution is a good predictor of the next minute.
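Proportional allocation on a lagged snapshot looks roughly like this. It is a sketch under stated assumptions. The location labels are made up, `allocate_slots` is a hypothetical helper, and the largest-remainder rounding is my own choice to make the parts sum exactly; the engineering posts do not specify how fractions are handled.

```python
def allocate_slots(global_slots: int, past_traffic: dict[str, int]) -> dict[str, int]:
    """Split slots in proportion to each location's share of earlier traffic.

    Uses integer largest-remainder rounding so the parts sum to global_slots.
    """
    total = sum(past_traffic.values())
    if total == 0:
        return {dc: 0 for dc in past_traffic}
    # Whole slots each location is owed, plus the fractional part left over.
    base = {dc: global_slots * n // total for dc, n in past_traffic.items()}
    remainder = {dc: global_slots * n % total for dc, n in past_traffic.items()}
    leftover = global_slots - sum(base.values())
    # Hand the remaining slots to the largest fractional parts first.
    for dc in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        base[dc] += 1
    return base


# Request counts per location from the earlier interval (hypothetical numbers).
print(allocate_slots(1000, {"dc-a": 80, "dc-b": 520, "dc-c": 400}))
```

A location that carried 8 percent of the earlier traffic ends up with 80 of 1,000 slots, matching the example in the text. The input is deliberately the old distribution, not the current one.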

Figure: Counting a global population from independent edges. A global Durable Object holds the authoritative count, backed to Cache; data center counters sit beneath it; workers (isolates) ask their counter for a number. Slots are split per data center on traffic from ~2 minutes prior; "anywhere" slots absorb the unexpected.

*The hierarchy that turns hundreds of independent edges into one count. Workers consult a local counter; counters roll up to a global object; allocation tracks recent traffic share.*

What about traffic from a location that the historical distribution did not predict, a region that was quiet two minutes ago and suddenly is not? The 2023 post calls these anywhere slots: a reserve held back from the per-location allocation to absorb arrivals from unexpected places. In the worked example from that post, the reserve is described as 75 percent of a remaining pool of 150 slots, set aside precisely so a geographic surprise does not get rejected just because it was not in the forecast. It is slack in the system, the cost of running a queue on a network where you cannot perfectly predict where the next request lands.

Underneath all of this is a write-rate problem. A Durable Object is strongly consistent, which means it is a serialization point, which means hammering it from thousands of workers on every request would make it the bottleneck. Cloudflare’s answer is to keep workers reading from the much faster Cache API for recent state and only periodically reconciling with the data center counter, adjusting how often they write based on load. Workers hold an in-memory view, update it locally as they admit users, and flush to the counter at an adaptive rate rather than on every decision. The consequence is that the count is eventually consistent in the small and strongly consistent in the large. Any single worker’s view is slightly behind, but the system as a whole converges, and the limit is a target hit on average rather than a hard wall hit exactly. That is the price of doing this at the edge instead of at a central database, and the 2021 post is candid that they considered and rejected both Workers KV (too write-heavy, too eventually consistent for spike response) and a central Redis-style store (operational overhead and a single point of failure).
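The batching pattern, and the staleness it creates, can be shown in a few lines. Everything specific here is an assumption for illustration: the class name, the dictionary standing in for the data center counter, and above all the adaptation policy, since the posts say the write rate adapts to load without giving a formula or, as far as I can tell, a direction.

```python
class WorkerView:
    """In-memory admission tally that reports to a shared counter in batches."""

    def __init__(self, counter: dict, min_interval: float = 0.5,
                 max_interval: float = 5.0):
        self.counter = counter      # stand-in for the data center counter
        self.pending = 0            # admissions not yet reported
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.last_flush = 0.0

    def flush_interval(self) -> float:
        # Invented policy: shrink the interval as unreported admissions pile up.
        return max(self.min_interval, self.max_interval / (1 + self.pending / 10))

    def admit(self, now: float) -> None:
        # Record locally first; only write upstream when the interval has passed.
        self.pending += 1
        if now - self.last_flush >= self.flush_interval():
            self.flush(now)

    def flush(self, now: float) -> None:
        self.counter["admitted"] = self.counter.get("admitted", 0) + self.pending
        self.pending = 0
        self.last_flush = now
```

Between flushes the shared counter is short by exactly `pending`, and that shortfall, summed over every worker, is the overshoot the limits have to tolerate.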

If you want the contrasting architecture, Queue-it runs the queue as a separate hosted service the visitor is redirected to, and AWS’s reference architecture builds it from your own serverless components. Cloudflare’s bet is that putting the queue inside the same network that already terminates the request removes a hop, removes origin code, and removes a piece of infrastructure you would otherwise operate. Akamai’s edge approach lands in a similar place from a different starting point.

Why “target” is the most important word

A queue exists to enforce a limit, so the honest reckoning is how close to that limit the system actually keeps you. The answer the architecture forces is: close, on average, with bounded overshoot. Every design choice above trades exactness for the ability to run at the edge. Historical slot allocation means each data center is admitting against a slightly stale share. Adaptive write rates mean each worker is deciding against a slightly stale local count. Anywhere slots mean there is deliberate headroom for surprises. None of these is a bug; together they are why the documentation calls the limits targets and means it.

That word also explains the shape of the failure modes. Overshoot, admitting more than total_active_users, comes from the lag between when a worker admits someone and when that admission propagates to the global count. During a sharp spike, many workers can each be a little behind, and a little times a lot of workers is a real number of extra sessions. Undershoot, queueing people when there was room, comes from the same staleness pointing the other way, plus the conservatism of the reserve. The system errs toward overshoot being small and recoverable because the alternative, hard global enforcement, would mean every admission round-trips to a central authority and the latency would defeat the point. A queue that adds 200 milliseconds to every request to be exactly correct is worse, for a real on-sale, than one that runs at the edge and is correct within a few percent. The classic ways these systems leak (token reuse, races between admit and count, replaying a captured cookie before it expires) all live in exactly this gap, and the field-wide write-up on queue leaks catalogs them across vendors.

The piece I keep coming back to is that the entire user-facing contract — your position, your estimated wait, your admitted session — rides in one encrypted cookie that the client carries and any of 300-plus data centers can read and trust without a lookup. That is what makes the whole thing edge-native. The cookie is not a pointer to server state; it is the state, sealed so you cannot rewrite it, stamped with the minute you arrived so the math of who-goes-next is just arithmetic on timestamps. The hard distributed-systems work is hidden behind that simplicity, in the Durable Objects quietly agreeing on a number that has no single home. Cloudflare published enough about the design to follow the shape of it. The exact bytes of the token, they kept to themselves, and on a system whose only job is to be unforgeable, that is the right call.


Sources & further reading

  • Batraski, B. (2021), Cloudflare Waiting Room — the launch announcement, motivation (vaccine sign-up surges, Project Fair Shot), and the edge-native, no-origin-code design goal.
  • Semeria, F., Thomas, G., and Jacob, M. (2021), Building Waiting Room on Workers and Durable Objects — the architecture post: the global and data-center Durable Objects, the encrypted cookie fields, and why KV and a central database were rejected.
  • Thomas, G. (2023), How Waiting Room makes queueing decisions on Cloudflare’s highly distributed network — per-data-center counter objects, historical slot allocation, and the “anywhere slots” reserve.
  • Cloudflare (2026), Configuration settings — total active users, new users per minute, session duration, and their documented ranges and minimums.
  • Cloudflare (2026), Cookies — the __cfwaitingroom name, the cookie_suffix rule, encryption, the group-ID and acceptedAt fields, and the 5-minute / 20-second expiry behavior.
  • Cloudflare (2026), About — the estimated-wait formula (visitors ahead divided by admission rate) and the FIFO outflow model.
  • Cloudflare (2026), Queueing method — the fifo, random, passthrough, and reject methods and what each does.
  • Cloudflare (2026), Get JSON response for mobile and other non-browser traffic — the cfWaitingRoom object and its fields, plus the exact-match Accept: application/json requirement.
  • Cloudflare (2026), Waiting Room Analytics — how the configured limits map to observed queue behavior and what to adjust when wait times run long.
  • Cloudflare (2022), Durable Objects — now Generally Available — the strongly-consistent stateful primitive Waiting Room’s coordination is built on.
