
AWS Virtual Waiting Room: the serverless reference architecture deconstructed

· 23 min read
Copyright: MIT

Most virtual waiting rooms are black boxes. You hit a ticket on-sale, get bounced to a holding page, watch a number tick down, and eventually land on the site with a token in your cookies that you never see. Queue-it, Cloudflare, Akamai, and the rest ship the queue as a managed service and keep the internals to themselves. So when you want to understand how one of these things is actually built, you mostly have to infer from the outside.

AWS did something unusual: it published one. The Virtual Waiting Room on AWS solution shipped in February 2022 as a complete, open-source, deploy-it-yourself reference architecture, source and CloudFormation templates and all. It is the one waiting room whose token format, queue-counter logic, and API surface you can read line by line instead of reverse-engineering from traffic. That makes it the best teaching artifact in the category, even now that AWS has deprecated it. This post walks the architecture and pulls out what it teaches about building a queue.

The route through this post: what the solution is and where it stands in 2026, the two-API split that defines the design, how a visitor gets a queue position through an SQS buffer, how the serving counter advances and who advances it, the structure of the JWT the system hands you, how the Lambda authorizer checks it, the EventBridge seam that makes the whole thing extensible, and finally the lighter CloudFront Functions pattern AWS now points people to instead.

What the solution is, and its status in 2026

Virtual Waiting Room on AWS is a “site wrapper” you deploy into your own account to absorb and control traffic during a large-scale event, a ticket on-sale, a product drop, a registration window. It sits in front of a smaller origin and meters visitors into it at a rate the origin can survive. The authors of the launch post, Justin Pirtle, Joan Morgan, and Jim Thario, framed it as three cooperating pieces: a set of core APIs, a waiting-room front-end that shows your place in line, and a Lambda authorizer that gates the protected origin until you hold a valid token.

The honest part first. The solution is deprecated. The GitHub repository, aws-solutions/virtual-waiting-room-on-aws, was archived on 4 November 2025, with the final release tagged v1.1.15 on 17 September 2025. The AWS Solutions landing page now reads “This AWS Solution is no longer available,” and AWS points customers to enterprise alternatives in AWS Marketplace or to a lighter do-it-yourself pattern built on CloudFront Functions. The code is Apache-2.0 and remains readable in the archived repository, which is why it is still worth deconstructing. As a reference architecture it did not stop being instructive when AWS stopped maintaining it.

What you get when you deploy it is a stack of nested CloudFormation templates, the main one named virtual-waiting-room-on-aws.json, plus templates for the authorizers, an OpenID adapter, and sample inlet strategies. The source tree breaks into modules you can map directly onto the architecture: core-api, token-authorizer, core-api-authorizers-sample, openid-waitingroom, sample-inlet-strategies, control-panel, sample-waiting-room-site, and a shared module. The whole thing is serverless in the AWS-marketing sense, no servers you patch, but it is not stateless. State lives in DynamoDB and in an ElastiCache for Redis cluster inside a VPC, and that split is the first design decision worth understanding.

If you want the conceptual background on what a queue position and a fairness guarantee even mean before getting into the AWS-specific plumbing, how virtual waiting rooms work covers token buckets and fair ordering at the pattern level. This post assumes you already know why you’d want one.

The two-API split

The core of the system is two separate API Gateway deployments. One is public, fronted by CloudFront and open to the visitor’s browser. The other is private, authorized with IAM, and meant only for your own backend, your control panel, your automation. Keeping these on different deployments is the single most important structural choice in the whole design, because the public API and the private API have completely different threat models. Anyone on the internet can call the public API. Only your AWS principals can call the private one.

[Figure: Two API deployments, one shared core. Public API (open, reached by the visitor's browser through CloudFront): assign_queue_num, queue_num, serving_num, generate_token, public_key, queue_pos_expiry. Private API (IAM, called by your backend or control panel): increment_serving_counter, num_active_tokens, update_session, reset_initial_state. Shared core state: ElastiCache (Redis), DynamoDB, Secrets Manager.]

*The public API takes anonymous queue traffic from the browser through CloudFront; the private API is IAM-only and used by your own automation to advance the serving counter and inspect token state. Both read and write the same Redis counters and DynamoDB tables.*

The public API exposes the endpoints a visitor’s browser actually touches. The named operations are /assign_queue_num to get into line, /queue_num to read back your assigned position, /serving_num to read the current serving position, /waiting_num for the count of people still queued, /generate_token to exchange a served position for a JWT, /public_key to fetch the RSA public key for verifying that JWT, and /queue_pos_expiry to ask how many seconds remain before your position lapses. Everything a client needs to walk the queue is in that list, and nothing else.

The private API holds the operations that would be dangerous in anonymous hands. /increment_serving_counter moves the serving position forward (or backward, it accepts a signed increment_by). /num_active_tokens counts tokens whose exp is still in the future. /expired_tokens lists the request IDs whose tokens have lapsed. /update_session marks a session completed or abandoned. /reset_initial_state tears down and recreates the DynamoDB table for a fresh event. There is also a private /generate_token variant that lets your backend override claims, which the public one does not. The asymmetry is the point: the rate at which people leave the queue is a privileged decision, never something a visitor gets to make for themselves.

Getting a place in line

Here is where the serverless design earns its keep. When a visitor asks to enter, the browser POSTs to /assign_queue_num with the event ID. The naive implementation would invoke a Lambda per request, increment a counter, and return a position. Under the traffic these systems exist to handle, that is precisely the failure mode you are trying to avoid: a thundering herd of cold-start Lambda invocations all contending on the same counter at the same millisecond.

So the solution puts an SQS queue between the request and the work. The /assign_queue_num call drops a message onto SQS and returns a request ID immediately. The visitor does not get their numeric position back synchronously. A separate Lambda drains SQS in batches and assigns the actual queue numbers, incrementing the queue counter in Redis once per batch rather than once per request. The browser then polls /queue_num with its request ID until its position is ready. AWS’s own description is blunt about why: the SQS queue “batches the incoming bursts of requests” instead of invoking the Lambda function for each one. You trade synchronous response for the ability to absorb a spike without melting the counter.
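A minimal sketch of the batching idea, assuming a Redis-style atomic INCRBY. `FakeRedis` is a hypothetical in-memory stand-in (a real redis-py client exposes `incrby(name, amount)` with the same return-the-new-value behavior), and the key name `queue_counter:<event_id>` is invented for illustration; the solution's actual key names and SQS message format are not reproduced here.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in for a Redis client's atomic INCRBY."""

    def __init__(self) -> None:
        self._store: dict[str, int] = {}

    def incrby(self, key: str, amount: int) -> int:
        # Like Redis INCRBY: add `amount` to the key and return the new value.
        self._store[key] = self._store.get(key, 0) + amount
        return self._store[key]


def assign_batch(redis: FakeRedis, event_id: str, request_ids: list[str]) -> dict[str, int]:
    """Give one SQS batch contiguous queue positions with a single counter write."""
    if not request_ids:
        return {}
    # One INCRBY per batch instead of one INCR per request: the hot key sees
    # len(batch) times fewer writes, and the returned value bounds the range.
    last = redis.incrby(f"queue_counter:{event_id}", len(request_ids))
    first = last - len(request_ids) + 1
    # Positions follow message order within the batch (an assumption;
    # SQS standard queues do not guarantee delivery order).
    return {rid: first + offset for offset, rid in enumerate(request_ids)}


client = FakeRedis()
print(assign_batch(client, "evt-1", ["req-a", "req-b", "req-c"]))  # {'req-a': 1, 'req-b': 2, 'req-c': 3}
print(assign_batch(client, "evt-1", ["req-d", "req-e"]))           # {'req-d': 4, 'req-e': 5}
```

Because the increment returns the top of the range, every request in the batch gets a unique number without any further coordination, which is what lets one drain Lambda stand in for thousands of per-request invocations.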

[Figure: The enqueue path. Browser POST /assign_queue_num → API + Lambda (enqueue only) → SQS (buffers the burst) → queue Lambda (batch → Redis INCR); the browser then polls GET /queue_num.]

*The enqueue path returns a request ID synchronously and does the real position assignment asynchronously off SQS, so a traffic spike lands in a buffer instead of as N concurrent counter writes. The browser learns its number by polling, not from the original response.*

The state itself is split deliberately. The hot, high-throughput counters live in Redis: the queue counter (how many have entered), the serving counter (how far the line has been let through), and a token counter. Redis handles atomic increments at a rate DynamoDB on-demand would throttle or charge heavily for. DynamoDB holds the durable records, the per-request queue position with its entry time and status, the serving-counter history, and the issued-token metadata keyed by request ID. The exact table schemas in the deployed stack are not laid out field-by-field in the public docs, but the developer guide and community reconstructions describe roughly four tables covering queue-position-by-entry-time, serving-counter-issued-at, the token table, and an optional table tracking expired queue positions. Treat those names as descriptive of the data, not as guaranteed attribute names; the durable record of “who is where in line” is in DynamoDB and the fast-moving “where is the line now” is in Redis.

That two-store split is the most transferable lesson in the whole architecture. If you build a queue yourself, you will reach for the same shape: an atomic in-memory counter for the position math, a durable store for the audit trail and the token records. Designing a fair queue at scale digs into the ordering guarantees that decision implies.

It is worth being precise about why the counter cannot live in DynamoDB. A single queue counter is, by definition, a hot key: every entering visitor needs to increment the same item. DynamoDB partitions throughput by key, so a single item with millions of writes per minute is the textbook anti-pattern, the one the service’s own guidance warns against. You can mitigate it with write sharding, splitting the counter across N items and summing them, but then reading the true position means a scatter-gather across the shards and the position is only eventually consistent. Redis sidesteps the whole problem because a single-threaded in-memory INCR is atomic and fast, and the only durability you give up is the part you did not need for a counter that resets per event anyway. The trade is deliberate: you accept that the Redis cluster is a stateful component inside a VPC, with the operational weight that implies, in exchange for a counter that does not buckle. The cost of that choice shows up on your bill as an always-on ElastiCache cluster, which is one reason the lighter replacement AWS now recommends avoids a counter entirely.
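To make the sharding trade-off concrete, here is a small in-memory model of a write-sharded counter. It does not touch DynamoDB; the shard count and random shard placement are illustrative assumptions. What it shows is the shape of the problem: writes spread out, but a read must sum every shard, and no single write can hand back a unique position.

```python
import random
from typing import Optional


class ShardedCounter:
    """In-memory model of a write-sharded counter (N items instead of one hot key)."""

    def __init__(self, shards: int, rng: Optional[random.Random] = None) -> None:
        if shards < 1:
            raise ValueError("need at least one shard")
        self.items = [0] * shards
        self._rng = rng or random.Random()

    def increment(self) -> None:
        # Each write lands on a random shard, so no single item takes every write.
        # The write returns nothing useful: it cannot tell the caller
        # "you are position N", which is exactly what a queue counter must do.
        self.items[self._rng.randrange(len(self.items))] += 1

    def read(self) -> int:
        # The true total needs a scatter-gather read across all shards; under
        # concurrent writes the sum is only a point-in-time estimate.
        return sum(self.items)


counter = ShardedCounter(shards=8, rng=random.Random(42))
for _ in range(1_000):
    counter.increment()
print(counter.read())  # 1000
print(counter.items)   # the 1,000 writes spread across the 8 items
```

Compare this with the single `INCRBY` in the Redis sketch, which both absorbs the write and returns the position in one atomic step.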

The DynamoDB side carries the records you actually need to keep. When /num_active_tokens counts tokens whose exp is in the future, or /expired_tokens enumerates the request IDs that lapsed, those queries run against the durable token records, not against Redis. The serving-counter history in DynamoDB is what lets the system answer “what position was being served at time T,” which matters for reconciliation after an event. The two stores are not redundant copies of the same data; they hold different things with different access patterns, and the design assigns each job to the store that does it cheaply.
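The two token queries reduce to comparisons against `exp`. A minimal sketch, assuming a hypothetical `TokenRecord` shape and a plain list in place of the DynamoDB table; in the deployed stack these would be DynamoDB queries or scans, and the real attribute names may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRecord:
    """Hypothetical shape of a durable token record keyed by request ID."""
    request_id: str
    event_id: str
    exp: int  # Unix seconds, mirroring the JWT exp claim


def num_active_tokens(records: list[TokenRecord], event_id: str, now: int) -> int:
    # "Active" means the token's exp is still in the future.
    return sum(1 for r in records if r.event_id == event_id and r.exp > now)


def expired_tokens(records: list[TokenRecord], event_id: str, now: int) -> list[str]:
    # A JWT must not be accepted at or after exp, so exp <= now counts as lapsed.
    return sorted(r.request_id for r in records if r.event_id == event_id and r.exp <= now)


records = [
    TokenRecord("req-1", "evt-1", exp=1_000),
    TokenRecord("req-2", "evt-1", exp=2_000),
    TokenRecord("req-3", "evt-1", exp=1_500),
    TokenRecord("req-9", "evt-2", exp=500),
]
print(num_active_tokens(records, "evt-1", now=1_500))  # 1
print(expired_tokens(records, "evt-1", now=1_500))     # ['req-1', 'req-3']
```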

How the line moves

A queue position by itself does nothing. What lets you through is the serving counter catching up to your number. When the serving counter is greater than or equal to your assigned queue position, you are eligible to ask for a token. So the entire admission rate of the system reduces to one question: who increments the serving counter, and how fast?
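From the browser's side, the whole walk is two polls and one comparison. A sketch under stated assumptions: `get_queue_num` and `get_serving_num` are hypothetical wrappers around the public /queue_num and /serving_num endpoints (the HTTP and JSON details are not modeled), and the poll interval and cap are illustrative.

```python
import time
from typing import Callable, Optional


def wait_until_served(
    request_id: str,
    event_id: str,
    get_queue_num: Callable[[str], Optional[int]],
    get_serving_num: Callable[[str], int],
    poll_seconds: float = 1.0,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the serving counter reaches our position; return that position."""
    position: Optional[int] = None
    for _ in range(max_polls):
        if position is None:
            # /queue_num: None until the SQS consumer has assigned a number.
            position = get_queue_num(request_id)
        # Eligible once serving_num >= queue position; the next step is /generate_token.
        if position is not None and get_serving_num(event_id) >= position:
            return position
        sleep(poll_seconds)
    raise TimeoutError(f"{request_id} not served after {max_polls} polls")


# Usage with fakes: the number appears on the third poll, serving climbs by 1 per poll.
state = {"queue_calls": 0, "serving": 0}


def fake_queue_num(_request_id: str) -> Optional[int]:
    state["queue_calls"] += 1
    return None if state["queue_calls"] < 3 else 5


def fake_serving_num(_event_id: str) -> int:
    state["serving"] += 1
    return state["serving"]


print(wait_until_served("req-1", "evt-1", fake_queue_num, fake_serving_num, sleep=lambda s: None))  # 5
```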

The base mechanism is the private /increment_serving_counter endpoint. It takes an event ID and a signed increment_by, and it advances (or rewinds) the serving position. In the simplest deployment a human does this from the control panel, clicking to “allow 100 users to move on from the waiting room to the target site,” as the launch post puts it. Manual control is fine for a scheduled, supervised on-sale. It does not scale to an unattended event, so the solution ships two automated “inlet strategies” as samples you can adopt or replace.

The periodic strategy is the blunt one. A CloudWatch Events rule fires a Lambda every minute, and that Lambda bumps the serving counter by a fixed amount. It is parameterized with an event start and end time and the increment size, and it can optionally check a CloudWatch alarm before each bump, so you can wire it to back off when, say, your origin’s latency alarm goes red. This is open-loop control with a safety interlock. You decide in advance that the origin can take N new users per minute and you let people in at that rate regardless of what they do once admitted.
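The periodic decision fits in one function. This is a sketch, not the sample's code: in the real strategy a scheduled Lambda computes the bump and calls /increment_serving_counter, and the `alarm_firing` flag here is a hypothetical stand-in for querying a CloudWatch alarm's state.

```python
from datetime import datetime, timedelta, timezone


def periodic_increment(
    now: datetime,
    start: datetime,
    end: datetime,
    increment_by: int,
    alarm_firing: bool = False,
) -> int:
    """Amount to add to the serving counter on one scheduled tick (open-loop sketch)."""
    if not (start <= now <= end):
        return 0  # outside the event window: do nothing
    if alarm_firing:
        return 0  # safety interlock: the origin's alarm is red, so hold the line
    return increment_by


# Simulate a 10-minute window ticking once a minute, with the alarm firing at minutes 4 and 5.
start = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
end = start + timedelta(minutes=10)
serving = 0
for minute in range(12):
    now = start + timedelta(minutes=minute)
    serving += periodic_increment(now, start, end, increment_by=100, alarm_firing=minute in (4, 5))
print(serving)  # 900
```

Nothing in the function looks at how many admitted users are still on the site; the only feedback is the alarm veto, which is what makes it open loop.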

The MaxSize strategy is the closed-loop version, and it is more interesting. Here you tell the inlet Lambda the maximum number of concurrent users your site can hold. The strategy then advances the serving counter only as fast as people leave. It is driven by SNS notifications carrying counts of users who have exited, plus the specific request IDs to mark completed or abandoned. As sessions finish, capacity frees up, and the counter advances to admit exactly that many replacements. The site stays pinned near its concurrency ceiling instead of guessing at a safe per-minute rate. This is the same control idea a token-bucket admission system uses, expressed in AWS primitives.
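The closed-loop rule is "admit exactly the capacity that just freed up." A minimal sketch of that arithmetic, assuming a simplified model where everyone at or below the serving counter who has not exited counts as active; SNS message parsing and the call to /increment_serving_counter are omitted, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MaxSizeInlet:
    """Closed-loop admission sketch: hold active users near max_size.

    Simplifying assumption: everyone at or below the serving counter who has
    not exited counts as active on the protected site.
    """
    max_size: int
    serving_counter: int = 0
    exited: int = 0  # completed + abandoned, as reported by exit notifications

    @property
    def active(self) -> int:
        return self.serving_counter - self.exited

    def fill(self) -> int:
        """Advance the serving counter up to the ceiling; return the increment used."""
        increment = max(0, self.max_size - self.active)
        self.serving_counter += increment
        return increment

    def on_exit_notification(self, exited_count: int) -> int:
        """Handle one 'users exited' notification: freed capacity becomes admissions."""
        self.exited += exited_count
        return self.fill()


inlet = MaxSizeInlet(max_size=500)
print(inlet.fill())                    # 500: open the doors up to the ceiling
print(inlet.on_exit_notification(37))  # 37: exactly the freed capacity
print(inlet.active)                    # 500
```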

[Figure: Two ways to advance the serving counter. Periodic (open loop): a CloudWatch rule fires every minute and does counter += fixed N, a rate set in advance with an optional alarm gate. MaxSize (closed loop): SNS reports users exited or done and the counter advances by the freed capacity, tracking live concurrency to pin the site near its ceiling.]

*Periodic admission sets a rate in advance and lets people in regardless of downstream load; MaxSize admits exactly as fast as sessions complete, holding the protected site near a configured concurrency limit. They are open-loop and closed-loop control of the same counter.*

There is a subtlety the system handles that is easy to miss when you first build a queue. Some people who get a position never come back to claim a token; they close the tab. If the serving counter only advances when people complete, those abandoned positions would stall the line forever. The solution addresses this with queue-position expiry. A position can lapse after a configured time, exposed to the client through /queue_pos_expiry, which returns the remaining seconds and answers with HTTP 410 once the position has expired. There is an opt-in behavior to automatically advance the serving counter to account for positions that lapsed without ever generating a token, so abandonment does not permanently consume a slot. The general failure mode of “issued a slot, never reclaimed” is exactly the kind of leak that breaks fairness in naive queues; why waiting rooms leak covers it across several systems.
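A sketch of both halves of the expiry story, under stated assumptions. Which moment starts a position's clock is not specified above, so `clock_start` is left as a parameter. The `positions` map and the `reclaimed` set are hypothetical bookkeeping that keeps a lapsed slot from being compensated twice; they are not the solution's data model.

```python
from http import HTTPStatus


def queue_pos_expiry(clock_start: int, now: int, validity_seconds: int) -> tuple[HTTPStatus, int]:
    """Model of /queue_pos_expiry: remaining seconds, or 410 Gone once lapsed.

    Which moment starts the clock (clock_start) is an assumption here.
    """
    remaining = clock_start + validity_seconds - now
    if remaining <= 0:
        return HTTPStatus.GONE, 0
    return HTTPStatus.OK, remaining


def reclaim_lapsed(
    positions: dict[int, tuple[int, bool]],  # position -> (clock_start, token_generated)
    serving_counter: int,
    now: int,
    validity_seconds: int,
    reclaimed: set[int],
) -> int:
    """Advance the serving counter once per served position that lapsed without a token."""
    lapsed = [
        pos
        for pos, (clock_start, token_generated) in positions.items()
        if pos <= serving_counter          # it was actually let through
        and not token_generated            # but never claimed a token
        and pos not in reclaimed           # and we have not compensated for it yet
        and queue_pos_expiry(clock_start, now, validity_seconds)[0] == HTTPStatus.GONE
    ]
    reclaimed.update(lapsed)
    return serving_counter + len(lapsed)


status, remaining = queue_pos_expiry(clock_start=50, now=100, validity_seconds=300)
print(int(status), remaining)  # 200 250

positions = {1: (0, True), 2: (0, False), 3: (50, False), 4: (0, False)}
reclaimed: set[int] = set()
# Position 2 was served, never took a token, and has lapsed, so the line moves up by one.
print(reclaim_lapsed(positions, serving_counter=3, now=320, validity_seconds=300, reclaimed=reclaimed))  # 4
```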

The token

Once your queue position has been served, you call /generate_token. The system mints a set of JSON Web Tokens, an access token, an ID token, and a refresh token, signed with RSA. The private key lives in Secrets Manager alongside the Redis password; the matching public key is served from the /public_key endpoint so anything downstream can verify a token without holding the secret. This is the standard asymmetric arrangement: only the waiting room can sign, anyone can check.

The interesting part is what is inside the access token, because the claims are not a generic OAuth payload. They encode the queue. The documented claim set is:

Access token claims (RSA-signed; verifiable against /public_key):

- queue_position: your integer place in line, the whole point
- aud: EVENT_ID, which on-sale this is for
- sub: REQUEST_ID, your queue assignment
- iss: ISSUER_URL, the waiting room API
- token_use: access | id | refresh
- iat, nbf, exp: issued-at, not-before, and expiry timestamps

The position rides inside the signed token, so the authorizer never has to ask the queue where you were.

*The access token carries the queue position as a first-class claim alongside standard JWT registered claims. Because it is signed, the downstream authorizer trusts the position without a round trip back to the counter.*

The claim that makes this a waiting-room token rather than a generic session token is queue_position. Your numeric place in line travels inside the signed payload. The aud claim is the event ID, scoping the token to one on-sale. The sub is the request ID, your specific queue assignment. The token_use claim distinguishes access, ID, and refresh tokens. And the time claims, iat, nbf, and exp, bound its validity. The token is time-limited by design; the launch post describes the authorizer’s job as ensuring every protected invocation carries “a validated time-limited token issued by the waiting room core API.”
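Here is what assembling that claim set and preparing it for signing might look like. A sketch only: it builds the documented claims and the JWS signing input (base64url header, a dot, base64url payload, as defined in RFC 7515) but stops before the RSA signature itself. The `RS256` header value, the one-hour default lifetime, and the issuer URL are assumptions, not the solution's settings.

```python
import base64
import json
import time
from typing import Optional


def b64url(data: bytes) -> str:
    # JWS segments are base64url without "=" padding (RFC 7515).
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def compact_json(obj: dict) -> bytes:
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


def build_access_claims(
    event_id: str,
    request_id: str,
    queue_position: int,
    issuer: str,
    validity_seconds: int = 3600,  # illustrative lifetime, not the solution's default
    now: Optional[int] = None,
) -> dict:
    """Assemble the documented access-token claim set."""
    iat = int(time.time()) if now is None else now
    return {
        "aud": event_id,                   # scopes the token to one event
        "sub": request_id,                 # the specific queue assignment
        "queue_position": queue_position,  # the waiting-room-specific claim
        "token_use": "access",
        "iss": issuer,
        "iat": iat,
        "nbf": iat,
        "exp": iat + validity_seconds,
    }


def signing_input(claims: dict) -> bytes:
    """The bytes an RSA signer would sign: b64url(header) + "." + b64url(payload)."""
    header = {"alg": "RS256", "typ": "JWT"}  # RS256 is an assumption for "signed with RSA"
    return f"{b64url(compact_json(header))}.{b64url(compact_json(claims))}".encode("ascii")


claims = build_access_claims("evt-1", "req-42", 4182, "https://waitingroom.example.invalid", now=1_700_000_000)
print(claims["exp"] - claims["iat"])      # 3600
print(signing_input(claims).count(b"."))  # 1: header and payload, signature not yet attached
```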

Carrying the position inside the token is the design move worth copying. The alternative, where the downstream gate calls back to the queue service on every request to ask “was this person served?”, couples your protected origin’s throughput to your queue service’s availability, which is backwards. The whole reason you deployed a waiting room is that your origin is fragile under load. Putting the proof in a signed, self-contained token means the origin verifies a signature locally and never depends on the queue being up. Cloudflare’s waiting room makes the same call with its own signed JWT, covered in Cloudflare Waiting Room internals, and the comparison is instructive: same cryptographic shape, different issuing infrastructure.

The authorizer, where the proof gets checked

The token only means something if something checks it. That something is the Lambda authorizer, shipped as the token-authorizer module and wired into API Gateway. Before a request reaches your protected API, the authorizer runs. It pulls the JWT from the request, verifies the RSA signature against the public key, and checks the claims, that the token has not expired, that the audience matches the event, that it was issued by the expected issuer. If the token is valid the authorizer returns an IAM policy permitting the call. If not, the request is denied before it ever touches your origin.
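The authorizer's decision can be sketched as claim checks plus an IAM policy in the response shape API Gateway expects from a Lambda authorizer. The signature step is deliberately left as an injected `verify_signature` callable, a hypothetical placeholder for a real JWT library call such as PyJWT's `jwt.decode(token, key, algorithms=["RS256"], audience=..., issuer=...)` against the key from /public_key. The `token_use == "access"` check is an added assumption beyond the checks listed above.

```python
from typing import Callable, Optional


def claims_ok(claims: dict, event_id: str, issuer: str, now: int) -> bool:
    """The claim checks described in the text, plus a token_use check (an assumption)."""
    exp = claims.get("exp")
    if not isinstance(exp, int) or now >= exp:
        return False  # expired, or no expiry at all
    if now < claims.get("nbf", 0):
        return False  # not valid yet
    if claims.get("aud") != event_id:
        return False  # token minted for a different event
    if claims.get("iss") != issuer:
        return False  # not issued by our waiting room
    return claims.get("token_use") == "access"


def authorize(
    token: str,
    method_arn: str,
    event_id: str,
    issuer: str,
    verify_signature: Callable[[str], Optional[dict]],
    now: int,
) -> dict:
    """Return a Lambda-authorizer response: an IAM policy allowing or denying the call."""
    # Placeholder: should verify the RSA signature and return the decoded claims,
    # or None when the signature does not check out.
    claims = verify_signature(token)
    allowed = claims is not None and claims_ok(claims, event_id, issuer, now)
    return {
        "principalId": claims.get("sub", "anonymous") if claims else "anonymous",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": "Allow" if allowed else "Deny",
                    "Resource": method_arn,
                }
            ],
        },
    }
```

For a TOKEN-type authorizer, API Gateway hands the function the raw header in `event["authorizationToken"]` and the target in `event["methodArn"]`. Returning a Deny policy yields a 403; an authorizer that prefers a 401 would raise an `Unauthorized` error instead.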

This is the enforcement boundary. Everything upstream, the queue number, the serving counter, the polling, is orchestration that decides whether you get a token. The authorizer is the one place where holding a token actually buys you something. Get the authorizer wrong and the rest of the queue is theater: if the protected endpoints can be reached without a valid token, an HTTP client can skip the line by requesting the origin directly, which is the most common way real-world waiting rooms leak. The solution gets the boundary placement right by putting the check in API Gateway’s own authorizer slot, so there is no code path to the protected resource that bypasses it.

There is a second-order benefit to verifying the signature locally rather than calling back to the queue. The token is bearer-style: whoever holds it can present it. That is the same property a session cookie has, and it carries the same caveat. A valid token, captured and replayed before it expires, is valid no matter who replays it. The system’s defenses against that are the ones any bearer-token design leans on, a short exp window and the aud scoping that pins a token to one event, plus the private API’s ability to mark a session completed or abandoned through /update_session. The architecture does not try to bind the token to a device fingerprint or a TLS session the way a dedicated anti-bot vendor would; that is simply not what this solution is for. It meters load. It is not a bot-defense product, and reading it as one would set the wrong expectations. If the queue sits in front of a high-value drop where token resale or automated reuse is the threat, the position proof here is the load-shedding layer, and you would want an actual detection layer in front of it.

The repository also ships a core-api-authorizers-sample module showing how to protect the core API’s own resources, and an openid-waitingroom adapter that re-expresses the whole flow in OIDC terms for systems that already speak OpenID Connect. The adapter maps the waiting room’s concepts onto the standard: the event ID becomes the OIDC audience and client ID, the request ID becomes the authorization code and subject, and queue_position rides along as a custom claim next to the standard ones. If you have an application that already does OIDC login behind an Application Load Balancer, the adapter lets the waiting room slot in as just another identity provider, which is a clean piece of design for retrofitting a queue onto an existing auth stack.

The EventBridge seam

One more piece makes this architecture worth studying as a template rather than a one-off: it is built to be extended without modification. The main template installs an EventBridge bus named STACK-WaitingRoomEventBus, and the solution emits custom events onto it as things happen. Three are documented. A token_generated event fires when a token is minted, carrying the event ID and request ID. A session_updated event fires when a session’s status changes, carrying event ID, request ID, and the new status. And an automatic_serving_counter_incr event fires when the counter moves, carrying the previous position, the increment, and the new position.
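The three events map naturally onto EventBridge PutEvents entries, whose `Source`, `DetailType`, `Detail`, and `EventBusName` keys are the ones boto3's `put_events(Entries=[...])` accepts. Everything else here is assumed for illustration: the source string, using the event name as the detail type, the detail field names, and reading `STACK` as the CloudFormation stack name.

```python
import json


def bus_name(stack_name: str) -> str:
    # Reading "STACK" in STACK-WaitingRoomEventBus as the stack name (assumption).
    return f"{stack_name}-WaitingRoomEventBus"


def _entry(stack_name: str, detail_type: str, detail: dict) -> dict:
    """One entry in the shape EventBridge PutEvents expects."""
    return {
        "Source": "custom.waitingroom",  # hypothetical source string
        "DetailType": detail_type,
        "Detail": json.dumps(detail),
        "EventBusName": bus_name(stack_name),
    }


def token_generated(stack_name: str, event_id: str, request_id: str) -> dict:
    return _entry(stack_name, "token_generated", {"event_id": event_id, "request_id": request_id})


def session_updated(stack_name: str, event_id: str, request_id: str, status: str) -> dict:
    return _entry(
        stack_name,
        "session_updated",
        {"event_id": event_id, "request_id": request_id, "status": status},
    )


def serving_counter_incr(stack_name: str, previous: int, increment_by: int) -> dict:
    return _entry(stack_name, "automatic_serving_counter_incr", {
        "previous_serving_counter_position": previous,
        "increment_by": increment_by,
        "current_serving_counter_position": previous + increment_by,
    })


# With boto3, entries like this would be published via events_client.put_events(Entries=[...]).
print(serving_counter_incr("vwr", previous=1_000, increment_by=250)["Detail"])
```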

[Figure: EventBridge as the extension seam. The WaitingRoomEventBus carries one-way notifications (token_generated with event_id and request_id; session_updated, which adds status; automatic_serving_counter_incr with previous, increment, and current positions) to consumers such as inlet strategies, analytics, and your own automation. Extend by subscribing, not by editing the core.]

*The core emits events; consumers subscribe. The sample MaxSize inlet strategy is itself just an EventBridge consumer, which is how you can swap admission logic without touching the queue core.*

The design intent is explicit: AWS describes the solution as extensible through two mechanisms, EventBridge for one-way event notification and the REST APIs for two-way interaction. That is a deliberately decoupled shape. You add behavior by subscribing to events, not by forking the core. Want custom analytics on how fast people abandon? Subscribe to session_updated. Want a bespoke admission policy? Subscribe to the counter and call the private API. The sample inlet strategies are themselves nothing more than event consumers plus calls back into /increment_serving_counter, which is why you can replace them wholesale. For a reference architecture, this is the most quietly important property: the example admission logic is a plug-in, not load-bearing structure.
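The abandonment-analytics example shows how little a subscriber needs. A sketch, assuming events arrive in the envelope EventBridge delivers to targets (a `detail-type` string and a parsed `detail` object). The detail-type string and the `completed`/`abandoned` status values are assumptions; in a deployment this logic would sit in a Lambda target behind a rule matching session_updated.

```python
from collections import Counter


def abandonment_rate(events: list[dict]) -> float:
    """Share of finished sessions that were abandoned, from session_updated events.

    The "completed"/"abandoned" status strings are assumptions about the payload.
    """
    statuses: Counter = Counter()
    for event in events:
        if event.get("detail-type") != "session_updated":
            continue  # this consumer only cares about session changes
        statuses[event.get("detail", {}).get("status")] += 1
    finished = statuses["completed"] + statuses["abandoned"]
    return statuses["abandoned"] / finished if finished else 0.0


sample = [
    {"detail-type": "session_updated", "detail": {"status": "completed"}},
    {"detail-type": "session_updated", "detail": {"status": "completed"}},
    {"detail-type": "session_updated", "detail": {"status": "completed"}},
    {"detail-type": "session_updated", "detail": {"status": "abandoned"}},
    {"detail-type": "token_generated", "detail": {"request_id": "req-7"}},
]
print(abandonment_rate(sample))  # 0.25
```

Nothing in the queue core changes to support this; the consumer only reads what the bus already emits.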

What replaced it, and why the shape changed

When AWS deprecated the solution, it did not just walk away. It pointed people at a different pattern, and the contrast tells you something about where the cost lives. The recommended do-it-yourself replacement is described in a March 2023 networking blog by Gabin Lee, Akira Mori, and Yoshihisa Nakatani: visitor prioritization with CloudFront Functions.

That design is far smaller. A CloudFront Function runs at the viewer-request phase on every request and makes a probabilistic decision. It reads an originHitRate between 0 and 1, and for non-premium users it routes a fraction of them straight to the origin and the rest to a static waitingroom.html, using Math.random() >= originHitRate as the test that sends a request to the waiting page, so on average an originHitRate share of non-premium requests reaches the origin. Users who present a known cookie, in the example a cookie literally named premium-user-cookie with a configured secret value, skip the throttle entirely. The authors prefer CloudFront Functions over Lambda@Edge for this because they are cheaper and handle far higher concurrency, at the cost of running only at the viewer-request phase with no external network access.
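The whole decision fits in a few lines. The real function is JavaScript running in CloudFront; this is a Python model of the same logic, with the cookie map simplified to a plain mapping (CloudFront Functions expose cookies as `event.request.cookies[name].value`) and the secret value hypothetical.

```python
import random
from typing import Callable, Mapping

PREMIUM_COOKIE = "premium-user-cookie"
WAITING_PAGE = "/waitingroom.html"


def route(
    cookies: Mapping[str, str],
    origin_hit_rate: float,
    premium_secret: str,
    rand: Callable[[], float] = random.random,
) -> str:
    """Python model of the viewer-request decision (the real thing is a JS CloudFront Function)."""
    if not 0.0 <= origin_hit_rate <= 1.0:
        raise ValueError("origin_hit_rate must be between 0 and 1")
    if cookies.get(PREMIUM_COOKIE) == premium_secret:
        return "origin"  # known premium cookie: skip the throttle entirely
    # Same gate as Math.random() >= originHitRate: these requests go to the waiting page.
    if rand() >= origin_hit_rate:
        return WAITING_PAGE
    return "origin"


rng = random.Random(7)
hits = sum(route({}, 0.3, "s3cret", rand=rng.random) == "origin" for _ in range(10_000))
print(hits / 10_000)  # close to 0.3, with no notion of who arrived first
```

Every request is an independent draw, so a visitor who arrived first has no better odds than one who arrived last; that independence is exactly the missing ordering the next paragraph describes.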

The difference between the two is not subtle. The full solution gives you a true ordered queue: a real position, a number that ticks down, a fairness story, a signed proof that survives downstream. The CloudFront Functions pattern gives you a probabilistic admission door with no ordering at all; it is a coin flip weighted by originHitRate, not a line. For many “absorb a spike and shed load gracefully” cases that is genuinely enough, and it costs a rounding error to run. But it is a different product. If your event needs to tell a customer “you are number 4,182 and the line is moving,” random shedding will not do it, and you are back to building something with the structure this post just walked through. The choice between ordered fairness and cheap probabilistic shedding is the real fork, and AWS’s deprecation quietly pushed the default toward the cheap side.

What the open box teaches

The thing the AWS solution gives you that no commercial waiting room does is the ability to read the whole mechanism. Strip away the AWS-specific service names and the pattern that remains is small and durable. You buffer the entry spike so the act of getting a number cannot itself overload you. You keep the fast counter in memory and the durable record in a real database, because those two jobs have different cost curves. You separate the privileged act of advancing the line from the anonymous act of joining it, onto different APIs with different auth. You put the proof of your place inside a signed, self-contained token so the gate that checks it never has to phone home. And you make admission policy a plug-in, so the slow, supervised on-sale and the fully automated drop run the same core with a different consumer bolted on.

None of those ideas are AWS’s invention, and none of them are obsolete because one CloudFormation template got archived. Queue-it’s token-and-connector design, examined in Queue-it’s architecture, lands on the same primitives from a completely different starting point: a signed token, an enforcement point on the request path, a controlled drip into the origin. When two systems built by different companies for different customers converge on the same shape, that shape is the actual reference architecture. AWS just happened to publish theirs with the source attached, and even deprecated, an open box you can still read beats a closed one you have to guess at.


Sources & further reading

Further reading