How HTTP caching headers really work: Cache-Control, Vary, and revalidation

Almost every slow page you have ever debugged has a caching story underneath it, and almost every caching bug comes down to the same root cause: someone assumed a header meant one thing when the specification says another. no-cache does not mean “do not cache.” Vary: * does not mean “vary on everything.” An ETag does not guarantee a 304. A response with max-age=600 can go stale within seconds of reaching your cache if an intermediary along the way already held it for most of those ten minutes and reported that in its Age header. The rules are written down, they are precise, and they are routinely misread.

This is a reference for the rules. Not the marketing version where caching is a checkbox you turn on, but the actual state machine that a browser, a CDN, and a reverse proxy each run on every response. The authority is RFC 9111, published June 2022, which replaced RFC 7234 and is now the single document that defines HTTP caching for HTTP/1.1, HTTP/2, and HTTP/3 alike. Where 9111 leaves something to vendor discretion, that gets called out, because the gap between “what the RFC permits” and “what Cloudflare actually does” is where most of the surprises live.

The sections below walk the lifecycle of a cached response in roughly the order a cache processes it. First, what storability even means, and the private-versus-shared split that governs it. Then freshness: how max-age, s-maxage, and Expires set a lifetime, and how Age and the current-age formula erode it. Then revalidation with ETag and Last-Modified, the 304 handshake, and the conditional-request headers that drive it. Then Vary, the single most misused header in the protocol, and the cache-key problems it creates. Then the stale-serving extensions from RFC 5861, immutable, and finally the security edge where cache keys and unkeyed inputs collide.

What a cache may store, and the private/shared split

The first decision a cache makes is whether it is allowed to keep a response at all. RFC 9111 lists the conditions under section 3, and they are permissive by design: a cache MAY store a response as long as nothing forbids it. The response method has to be cacheable (GET is, POST generally is not), the status code has to be one the cache understands as cacheable or the response has to carry an explicit directive that makes it so, and crucially the response must not carry no-store.

That last one is the hard stop. Cache-Control: no-store tells every cache on the path, private or shared, that it MUST NOT store any part of the request or the response. Nothing on disk, nothing in memory that survives the transaction. This is the directive for a bank statement or a password-reset page. It is also the directive people reach for when they mean something gentler, which is the start of a long line of confusion.

The split that governs almost everything downstream is private versus shared. A private cache belongs to one user: the browser’s own HTTP cache, the cache inside a single user’s client. A shared cache sits in front of many users: a CDN edge node, a reverse proxy like Varnish or nginx, a forward proxy at a corporate gateway. The distinction matters because the two have different threat models. A private cache can safely hold a personalized page because only its owner will ever read it back. A shared cache holding the same personalized page is a data leak waiting to happen, because the next user through that edge node gets served the previous user’s account dashboard.

Cache-Control: private is how a response says “private caches only.” Per RFC 9111, the unqualified form means a shared cache MUST NOT store the response, because it is intended for a single user. The browser keeps it; the CDN drops it. There is also a qualified form, private="Set-Cookie", where only the named header fields are restricted to a single user and the rest of the response may still be shared. That qualified form is rare in the wild but it is in the spec, and it exists precisely so an origin can mark one sensitive header without making the whole response unshareable.

*Where a response is allowed to live depends on the directive. `private` keeps it out of the shared edge; `s-maxage` targets the shared edge specifically.*

public is the inverse signal, and it is more subtle than it looks. It does not mean “please cache this.” It means a cache may store the response even in cases where the default rules would otherwise forbid it, for example a shared cache holding a response to a request that carried an Authorization header. Most responses do not need public at all, because the default storability rules already permit caching. Reaching for public to “enable caching” is usually a sign the real problem is somewhere else, like an Authorization header that restricts shared storage or a Set-Cookie that many CDNs treat as a reason not to cache.
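The storability rules above can be sketched as a small predicate. This is a minimal illustration, not a complete implementation: the function names are hypothetical, the parser splits naively on commas (so it would mishandle a quoted argument that itself contains a comma), and the method and status-code checks are left out.

```python
def parse_cache_control(value):
    """Parse a Cache-Control value into {directive: argument or None}.

    Simplification: splits on every comma, so quoted arguments that
    contain commas are not handled.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if arg else None
    return directives


def may_store(cache_control, shared_cache, request_has_authorization=False):
    """Return True if a cache of the given kind may store the response."""
    d = parse_cache_control(cache_control)
    if "no-store" in d:
        return False  # hard stop for every cache, private or shared
    if shared_cache:
        # Unqualified `private` keeps the whole response out of shared caches.
        # The qualified form (private="Set-Cookie") restricts only the named
        # fields; a real cache would strip those fields before storing.
        if "private" in d and d["private"] is None:
            return False
        # A response to an authenticated request needs explicit permission.
        if request_has_authorization and not (
            {"public", "s-maxage", "must-revalidate"} & d.keys()
        ):
            return False
    return True


print(may_store("private, max-age=60", shared_cache=True))   # False
print(may_store("private, max-age=60", shared_cache=False))  # True
```

Note that `public` shows up here only in the Authorization branch, which matches the point above: it lifts a restriction rather than switching caching on.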

Freshness: max-age, s-maxage, Expires, and the age you did not account for

Once a response is stored, the cache needs to know how long it can serve that copy without checking back. This is freshness, and the core test in RFC 9111 is one line:

response_is_fresh = (freshness_lifetime > current_age)

Two quantities. The freshness lifetime is how long the response is allowed to be considered fresh from the moment it left the origin. The current age is how long it has actually been since then. As long as the lifetime exceeds the age, the cache serves the stored copy with no network round trip. The instant age catches up to lifetime, the response is stale and the cache has to do something about it.

The lifetime comes from a strict precedence, first match wins. If the cache is shared and the response carries s-maxage, that value wins. Otherwise, if max-age is present, that wins. Otherwise, if there is an Expires header, the lifetime is Expires minus the Date header. Otherwise there is no explicit expiration and the cache may fall back to a heuristic, which we will get to. The ordering is why s-maxage is the right tool for tuning CDN behavior without touching browser behavior: it is invisible to private caches, which skip it entirely and fall through to max-age.

Expires is the HTTP/1.0 mechanism and it survives mostly for compatibility. It carries an absolute timestamp, Expires: Thu, 01 Dec 1994 16:00:00 GMT. RFC 9111 is explicit that if max-age is present a recipient MUST ignore Expires, and if s-maxage is present a shared cache MUST ignore it too. There is one detail worth keeping: a cache MUST interpret an invalid date, and the value 0 specifically, as a time already in the past. So Expires: 0 is not a well-formed date, but it is a reliable, if blunt, way to say “already stale.” It is also a frequent source of confusion when someone sets Expires: 0 and a max-age in the same response and wonders why the max-age wins. The RFC told it to.
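The precedence can be written down directly. The sketch below assumes the directives are already parsed into a dict of strings and that header names are spelled exactly `Expires` and `Date`; `freshness_lifetime` is a hypothetical name. Treating a missing or unparseable Date as "already expired" is a conservative simplification of this sketch, not something the RFC prescribes in that form.

```python
from email.utils import parsedate_to_datetime


def _http_date(value):
    """Return a datetime, or None for an invalid date such as '0'."""
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None


def freshness_lifetime(directives, headers, shared_cache):
    """Return the lifetime in seconds, or None if no explicit expiration."""
    if shared_cache and "s-maxage" in directives:
        return int(directives["s-maxage"])      # shared caches only
    if "max-age" in directives:
        return int(directives["max-age"])       # Expires is ignored from here on
    if "Expires" in headers:
        expires = _http_date(headers["Expires"])
        date = _http_date(headers.get("Date", ""))
        if expires is None or date is None:
            return 0                            # invalid date: already expired
        return max(0, int((expires - date).total_seconds()))
    return None                                 # heuristic freshness territory


both = {"max-age": "600", "s-maxage": "3600"}
print(freshness_lifetime(both, {}, shared_cache=True))    # 3600
print(freshness_lifetime(both, {}, shared_cache=False))   # 600
print(freshness_lifetime({"max-age": "600"}, {"Expires": "0"}, False))  # 600
```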

The age side of the equation is where most people stop reading, and it is exactly where the surprises hide. Age is not just “now minus when my cache stored it.” A response may have already aged in some upstream cache before it ever reached you. The Age header carries that accumulated time: it is the sender’s estimate of seconds since the response was generated or last validated at the origin. RFC 9111 gives the full reconstruction:

apparent_age          = max(0, response_time - date_value)
response_delay        = response_time - request_time
corrected_age_value   = age_value + response_delay
corrected_initial_age = max(apparent_age, corrected_age_value)
resident_time         = now - response_time
current_age           = corrected_initial_age + resident_time

Read it slowly. The cache takes the larger of two estimates of how old the response already was on arrival, the clock-difference estimate (apparent_age) and the Age-header estimate corrected for network delay, then adds the time the response has been sitting locally. The max() is a defense against under-counting age when clocks disagree or when an upstream Age was suspiciously low. The practical consequence: a response with max-age=60 that arrives already carrying Age: 55 is fresh for five more seconds in your cache, not sixty. Multi-tier cache hierarchies, of the kind a CDN runs internally between edge and shield, depend on this arithmetic being correct, and they advertise it back to clients by rewriting Age on the way out. When a cache serves a stored response without validating, it MUST emit an Age header equal to the stored response’s current age. That is how a browser, two hops down, can still compute a faithful lifetime.
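As a worked example, the age reconstruction translates line for line into a function. All arguments are plain seconds on one clock, which is an assumption made to keep the sketch readable; a real cache would parse Date and track timestamps itself.

```python
def current_age(age_value, date_value, request_time, response_time, now):
    """Compute current_age from the RFC 9111 reconstruction (all in seconds)."""
    apparent_age = max(0, response_time - date_value)
    response_delay = response_time - request_time
    corrected_age_value = age_value + response_delay
    corrected_initial_age = max(apparent_age, corrected_age_value)
    resident_time = now - response_time
    return corrected_initial_age + resident_time


# A max-age=60 response that arrives carrying Age: 55.
# It was generated at t=945 and received instantly at t=1000.
max_age = 60
age_on_arrival = current_age(
    age_value=55, date_value=945, request_time=1000, response_time=1000, now=1000
)
print(max_age - age_on_arrival)  # 5 seconds of freshness left, not 60

# Three seconds later the same entry has aged further.
print(current_age(55, 945, 1000, 1000, now=1003))  # 58
```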

*The `Age` header is the part of the freshness equation that bites. A long `max-age` does not help if the response already aged upstream.*

When no explicit lifetime is present, a cache MAY guess. This is heuristic freshness, and RFC 9111 allows it only when there is no explicit expiration. The canonical heuristic uses Last-Modified: take some fraction of the interval since the resource last changed, and the spec names 10 percent as a typical setting. A document last modified ten days ago, served with no Cache-Control and no Expires, might be treated as fresh for a day. This is reasonable for static files and a menace for anything dynamic, which is why dynamic endpoints should always send an explicit lifetime even if it is max-age=0. Leaving it to the heuristic means handing a CDN permission to invent a lifetime you never sanctioned.
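The ten-percent heuristic is simple arithmetic. The sketch below uses integer seconds and a hypothetical `heuristic_lifetime` helper; the percentage is the typical value mentioned above, and real caches choose their own.

```python
def heuristic_lifetime(date_value, last_modified, percent=10):
    """Guess a lifetime as a percentage of the time since last modification."""
    interval = max(0, date_value - last_modified)
    return interval * percent // 100


TEN_DAYS = 10 * 86400
# Last modified ten days before the response was generated.
print(heuristic_lifetime(date_value=TEN_DAYS, last_modified=0))  # 86400 (one day)
```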

Revalidation: ETag, Last-Modified, and the 304 handshake

Freshness eventually runs out. When it does, a cache does not have to throw the response away and fetch the whole thing again. It can ask the origin a cheaper question: has this changed? If the answer is no, the origin replies 304 Not Modified with no body, and the cache reuses what it already has, resetting the freshness clock. This is revalidation, and it is the difference between re-downloading a 2 MB image and confirming in a few hundred bytes that the copy you hold is still good.

Revalidation needs a validator, a token that identifies a specific version of the resource. HTTP has two. Last-Modified is a timestamp of when the resource last changed. ETag, originally defined in RFC 7232 and now part of RFC 9110, is an opaque string the origin assigns to a representation, often a hash or a version counter, with no meaning to anyone but the origin. The two travel on responses; the client echoes them back on the next conditional request to ask its question.

The conditional headers are the mirror image of the validators. If-None-Match carries one or more ETags the client already holds: “send me the body only if none of these still match.” If-Modified-Since carries a date: “send me the body only if it changed after this time.” A cache holding a stale response with an ETag SHOULD issue a GET with If-None-Match; the origin compares, and if the current ETag is in the list it returns 304. When both If-None-Match and If-Modified-Since are present, the ETag wins. RFC 7232 says a recipient MUST ignore If-Modified-Since if If-None-Match is also present, because the ETag is the more precise validator. Date comparison has a one-second resolution and is vulnerable to clock skew between origin servers in a fleet; an ETag tied to exact content has neither weakness.
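On the origin side, the precedence between the two conditional headers might look like the sketch below. `evaluate_conditional` is a hypothetical helper for GET requests only; it splits the If-None-Match list on commas (a simplification) and uses weak comparison, so a `W/` prefix is ignored when matching.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _opaque(etag):
    """Drop the W/ weakness prefix so tags can be compared weakly."""
    etag = etag.strip()
    return etag[2:] if etag.startswith("W/") else etag


def evaluate_conditional(request_headers, current_etag, last_modified):
    """Return 304 or 200 for a conditional GET."""
    inm = request_headers.get("If-None-Match")
    if inm is not None:
        # If-None-Match is present, so If-Modified-Since is ignored entirely.
        if inm.strip() == "*":
            return 304
        held = {_opaque(tag) for tag in inm.split(",")}
        return 304 if _opaque(current_etag) in held else 200
    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            return 200  # unparseable date: ignore the condition
        return 304 if last_modified <= since else 200
    return 200


modified = datetime(2024, 1, 1, tzinfo=timezone.utc)
stale_tag_fresh_date = {
    "If-None-Match": '"old"',
    "If-Modified-Since": "Mon, 01 Jan 2024 00:00:00 GMT",
}
print(evaluate_conditional(stale_tag_fresh_date, '"new"', modified))  # 200
```

In the example the date alone would have produced a 304, but the ETag mismatch decides the outcome.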

*The `304` handshake: the body never moves. The validator confirms the cached copy is current, and the freshness window starts over.*

ETags come in two strengths. A strong ETag promises byte-for-byte identity: same ETag, same bytes. A weak ETag, written W/"...", promises only semantic equivalence, that the representations are good enough to swap even if a byte or two differs. The distinction matters for range requests and for the If-Match header used in update operations, where a strong validator is required because you cannot safely patch a resource you only semantically recognize. For plain GET revalidation, a weak ETag is fine and often cheaper to generate, because the origin does not have to guarantee exact-byte stability across, say, a gzip-level change.
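The two strengths correspond to two comparison functions. A minimal sketch with hypothetical names, assuming well-formed entity tags:

```python
def _split(etag):
    """Return (is_weak, opaque_tag) for an entity tag string."""
    weak = etag.startswith("W/")
    return weak, (etag[2:] if weak else etag)


def weak_match(a, b):
    """Weak comparison: opaque tags equal, weakness ignored."""
    return _split(a)[1] == _split(b)[1]


def strong_match(a, b):
    """Strong comparison: both tags strong and opaque tags equal."""
    weak_a, tag_a = _split(a)
    weak_b, tag_b = _split(b)
    return not weak_a and not weak_b and tag_a == tag_b


print(weak_match('W/"1"', '"1"'))    # True: good enough for GET revalidation
print(strong_match('W/"1"', '"1"'))  # False: not good enough for If-Match or ranges
```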

There is a recurring operational failure with ETags behind a load balancer. If every backend computes the ETag differently, an inode number, a per-process hash seed, a timestamp with sub-second jitter, then the same file served from two servers gets two different ETags, and the client’s If-None-Match never matches whichever server it lands on next. The result is a revalidation that always returns 200 with a full body, defeating the entire mechanism. Apache historically generated ETags from inode, size, and mtime, which broke across a server farm until operators learned to set FileETag MTime Size or drop the inode component. The fix is to make the validator a deterministic function of content, not of which machine answered. The same discipline carries straight into incremental recrawl for a crawler, where stable ETags are what let a fetcher skip unchanged pages instead of re-downloading the web every cycle.
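One way to get a validator that is a deterministic function of content is to hash the response body. This is a sketch of that idea under the assumption that the body is available in memory; `content_etag` is a hypothetical helper, and truncating the SHA-256 digest to 32 hex characters is an arbitrary choice for brevity.

```python
import hashlib


def content_etag(body: bytes) -> str:
    """Build a strong ETag from the bytes alone, independent of the machine."""
    digest = hashlib.sha256(body).hexdigest()[:32]
    return f'"{digest}"'


page = b"<h1>hello</h1>"
# Any backend serving the same bytes produces the same validator.
print(content_etag(page) == content_etag(bytes(page)))  # True
```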

no-cache lives in this section, because it is fundamentally a revalidation directive and not a storage directive. The single most common misreading in all of HTTP caching is that no-cache means “do not store this.” It does not. MDN states it plainly: no-cache allows a cache to store the response but requires it to revalidate before every reuse. The response goes on disk; it just cannot be served without a fresh 304/200 check first. If you actually want “never store,” the directive is no-store. The two get swapped constantly, and the consequences differ wildly: no-cache still gives you a fast 304 path and offline reuse semantics, while no-store forces a full download every single time. must-revalidate is the stricter cousin: it permits serving fresh responses from cache without checking, but once a response goes stale it MUST be validated before reuse, closing the door on the stale-serving allowances we are about to discuss.
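The three directives differ in when a stored copy may be reused, which a small decision function makes concrete. The action names are invented for this sketch, `directives` is assumed to be a set of lowercase directive names, and the qualified `no-cache="field"` form is not modeled.

```python
def reuse_action(directives, is_fresh):
    """Describe what a cache does when a matching request arrives."""
    if "no-store" in directives:
        return "not-stored"              # nothing was kept: full fetch every time
    if "no-cache" in directives:
        return "revalidate"              # stored, but checked before every reuse
    if is_fresh:
        return "serve-from-cache"
    if "must-revalidate" in directives:
        return "revalidate"              # stale: no stale-serving allowed
    return "revalidate-or-serve-stale-if-permitted"


print(reuse_action({"no-cache"}, is_fresh=True))                    # revalidate
print(reuse_action({"max-age", "must-revalidate"}, is_fresh=True))  # serve-from-cache
```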

Vary: the cache key, content negotiation, and how it goes wrong

Everything so far assumed one URL maps to one stored response. Vary breaks that assumption. It tells the cache that the response depends not only on the URL but on the value of one or more request headers, so the cache must key its stored entry on those header values too. Vary: Accept-Encoding means “the gzipped version and the identity version are different responses for the same URL; store them separately and serve each to the request that asked for it.” Without Vary, a cache that stored the gzip variant would happily hand it to a client that cannot decompress, and the page would arrive as garbage.

The matching rule in RFC 9111 section 4.1 is exact and worth stating precisely. A cache MUST NOT reuse a stored response that carries Vary unless every request header nominated by the Vary value matches between the new request and the original. Matching is not byte-equality; the cache may normalize values in ways the header’s own syntax declares semantically identical, like collapsing whitespace, combining repeated field lines, and reordering where order does not matter. And a header absent from one request can only match a request where it is also absent. So a request with no Accept-Encoding is a distinct cache variant from one that sends Accept-Encoding: gzip.

Then there is Vary: *. The spec is blunt: a stored response whose Vary value contains * always fails to match, on every subsequent request, forever. That makes the response effectively uncacheable for reuse by shared caches. People reach for Vary: * thinking it means “this varies on something I cannot enumerate.” What it actually means is “never reuse this.” Fastly’s guidance on the header is three words for this case: don’t use it. If the intent is “do not let a shared cache reuse this,” the correct, readable directive is Cache-Control: private, which says so directly and does not leave the next engineer guessing.
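The matching rule, including the `*` case, fits in a few lines. The sketch assumes request headers are held in dicts with lowercase names and applies only one normalization, trimming whitespace around list members; `vary_matches` is a hypothetical name.

```python
def _normalize(value):
    """Collapse whitespace around comma-separated members; keep None as None."""
    if value is None:
        return None
    return ",".join(part.strip() for part in value.split(","))


def vary_matches(vary, stored_request, new_request):
    """Return True if a stored response may be reused for new_request."""
    names = [n.strip().lower() for n in vary.split(",") if n.strip()]
    if "*" in names:
        return False  # Vary: * never matches
    for name in names:
        # An absent header (None) only matches another absent header.
        if _normalize(stored_request.get(name)) != _normalize(new_request.get(name)):
            return False
    return True


print(vary_matches("Accept-Encoding", {"accept-encoding": "gzip, br"},
                   {"accept-encoding": "gzip,br"}))                     # True
print(vary_matches("Accept-Encoding", {"accept-encoding": "gzip"}, {}))  # False
```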

The real damage Vary does is to hit rates, through cache fragmentation. Every distinct value of a varied header produces a separate stored entry. Vary on a header with thousands of distinct values and you get thousands of near-identical cache entries, each one cold, each one a fresh trip to origin. The two classic offenders are User-Agent and Cookie. Fastly measured roughly 8,000 distinct User-Agent strings in a 100,000-request sample; varying on it shatters the cache into thousands of fragments and drives origin load up linearly with the number of variants. Cookie is worse, because cookies are nearly unique per user, so Vary: Cookie is close to “cache nothing” while looking like a caching configuration.

*Varying on a high-cardinality header fragments the cache into near-uniqueness. The fix is to normalize the header to a small set of buckets before it reaches the cache key.*

The way CDNs survive Vary is normalization. Before the header ever touches the cache key, the edge rewrites it to a small set of canonical values. Fastly collapses Accept-Encoding to either gzip or nothing, turning dozens of browser-emitted variants into two cache buckets. Cloudflare classifies User-Agent into a handful of device classes rather than keying on the raw string. The general technique is to map a high-cardinality header down to the few buckets that actually change the response, do that mapping at the edge, and only then vary. This is the same fan-in problem that consistent hashing solves for request distribution: you want many inputs to land on few stable keys. Get the normalization wrong and you either fragment the cache or serve the wrong variant, and the second failure mode is a correctness bug, not just a performance one.
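A two-bucket Accept-Encoding normalization in the spirit of what is described above could look like this. It is an illustration of the technique, not any vendor's actual code, and `normalize_accept_encoding` is a hypothetical name.

```python
def normalize_accept_encoding(value):
    """Map any Accept-Encoding value to one of two cache buckets: 'gzip' or ''."""
    for item in (value or "").split(","):
        coding, _, params = item.partition(";")
        if coding.strip().lower() != "gzip":
            continue
        params = params.replace(" ", "").lower()
        if params.startswith("q="):
            try:
                if float(params[2:]) == 0:
                    return ""  # gzip explicitly refused with q=0
            except ValueError:
                return ""      # malformed weight: fall back to identity
        return "gzip"
    return ""


print(repr(normalize_accept_encoding("gzip, deflate, br")))             # 'gzip'
print(repr(normalize_accept_encoding("br;q=1.0, gzip;q=0.8, *;q=0.1")))  # 'gzip'
print(repr(normalize_accept_encoding("identity")))                       # ''
```

The edge would overwrite the request header with the returned value before computing the cache key, so the origin's `Vary: Accept-Encoding` yields at most two variants.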

One sharp edge that the RFC itself flags: when content negotiation is in play, the Accept family of headers carry quality weights (Accept-Language: en-US,en;q=0.9,fr;q=0.8), and two requests with differently ordered or differently weighted Accept values may be semantically compatible yet not byte-identical. The spec permits normalization but does not mandate a specific algorithm, so whether a cache treats those as the same variant is implementation-defined. This is an active discussion in the HTTP working group, and it is why varying on Accept-Language in practice means picking your own normalization (down to a supported language set) rather than trusting raw header equality.
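Picking your own normalization for Accept-Language can be as small as the sketch below. The supported set, the default, and the `pick_language` name are all assumptions for illustration; the matching is deliberately crude (primary subtag only) and is not a full implementation of language-range matching.

```python
def pick_language(accept_language, supported=("en", "fr", "de"), default="en"):
    """Reduce an Accept-Language value to one supported language bucket."""
    ranked = []
    for position, item in enumerate((accept_language or "").split(",")):
        tag, _, params = item.partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        q = 1.0
        params = params.replace(" ", "").lower()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        ranked.append((-q, position, tag))  # highest weight first, then order
    for neg_q, _, tag in sorted(ranked):
        if neg_q == 0:
            continue  # q=0 means "not acceptable"
        primary = tag.split("-")[0]
        if primary in supported:
            return primary
    return default


# Differently ordered but semantically compatible values land in one bucket.
print(pick_language("en-US,en;q=0.9,fr;q=0.8"))  # en
print(pick_language("fr;q=0.8,en-US,en;q=0.9"))  # en
```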

Serving stale on purpose: stale-while-revalidate and stale-if-error

A strict cache faces a hard choice when a response goes stale: block the user while it revalidates, or risk serving something out of date. RFC 5861, authored by Mark Nottingham and published May 2010, adds two Cache-Control extensions that turn that binary into something smoother. Both are now listed as widely available across browsers and CDNs.

stale-while-revalidate=<seconds> lets a cache serve the stale response immediately, for up to the named number of seconds past expiry, while it revalidates in the background. The user gets an instant response from cache; the cache quietly fetches a fresh copy for the next request. A response sent as Cache-Control: max-age=600, stale-while-revalidate=30 is fresh for ten minutes, and for thirty seconds after that the cache keeps serving the old copy while a background fetch refreshes it. The visible latency of revalidation drops to zero for everyone except the unlucky request that triggers the background refresh, and even that one is served from stale cache. Browsers implement this for navigations and subresources, and every major CDN exposes it.

The companion is stale-if-error=<seconds>. It says that if revalidation hits an error, a 5xx from origin, a connection failure, a DNS failure, the cache MAY serve the stale copy anyway rather than propagating the error to the user. This is an availability mechanism. When the origin falls over, an edge holding stale content with stale-if-error keeps serving the last known-good response for the configured window instead of returning a wall of 502s. It buys time to recover without the users noticing, as long as stale-but-working beats fresh-but-broken, which for most content it does.
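The two windows can be modeled as a single decision over how stale the response is. This is a sketch with invented action names; `revalidation_failed` stands for the outcome of a synchronous revalidation attempt, and all values are seconds.

```python
def stale_action(age, max_age, swr=0, sie=0, revalidation_failed=False):
    """Decide how to answer a request given the stored response's age."""
    if age < max_age:
        return "serve-fresh"
    staleness = age - max_age
    if staleness <= swr:
        return "serve-stale-revalidate-in-background"
    if revalidation_failed:
        return "serve-stale-on-error" if staleness <= sie else "propagate-error"
    return "revalidate-then-serve"


# Cache-Control: max-age=600, stale-while-revalidate=30
print(stale_action(age=599, max_age=600, swr=30))  # serve-fresh
print(stale_action(age=615, max_age=600, swr=30))  # serve-stale-revalidate-in-background
print(stale_action(age=631, max_age=600, swr=30))  # revalidate-then-serve
```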

RFC 5861 raises one security point worth keeping in mind. The background revalidation that stale-while-revalidate triggers should be predicated on an actual incoming request, not on an automatic timer, because a cache that revalidates on a schedule independent of demand can be turned into an amplification vector: a small trickle of triggering requests fanning out into a flood of origin fetches. Tying revalidation to real requests keeps origin load proportional to real traffic. This is the same proportionality concern that governs sane rate limiting and backoff on the client side, viewed from the cache’s end of the pipe.

immutable: the directive that fixed the reload problem

There is a category of resource that genuinely never changes: a fingerprinted asset like app.4f9a2c.js, where the content hash is in the filename, so a new version gets a new URL and the old URL’s bytes are frozen for all time. For these, even revalidation is wasted work, and browsers used to do it anyway. When a user pressed reload, browsers historically revalidated even fresh resources, firing a conditional request for every asset to confirm it was still current, despite a max-age that said it would be fresh for a year.

Cache-Control: immutable exists to stop that. It tells the browser that while the response is fresh it will never change, so a reload should not bother revalidating it. Firefox shipped it in version 49, and Patrick McManus’s January 2017 write-up at Mozilla put numbers on the win: a Facebook feed reload dropped from 150 resources to 25 network requests, and a BBC trial saw reload times improve by up to 50 percent with around 90 percent of requests optimized away. Facebook was among the first adopters precisely because their users reload constantly and the revalidation traffic was real cost. The directive draws a line the older model blurred, between a response being fresh, meaning usable from cache without a check, and being current, meaning the latest version. For a content-hashed asset those are the same thing, and immutable lets the browser act like it.

The practical recipe pairs immutable with a long max-age on hashed assets only: Cache-Control: public, max-age=31536000, immutable. Never put it on a resource that can change at a stable URL, like index.html or an un-fingerprinted stylesheet, because then a reload genuinely needs to check for updates and immutable tells it not to. The directive is safe exactly to the degree that the URL is a true content address. That property, a stable URL meaning stable bytes, is also what makes a CDN’s cache hierarchy cheap to operate: immutable assets propagate through edge and shield once and never need coherence traffic again.
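A build or server step might apply that recipe by filename. The fingerprint pattern (at least six hex characters between two dots), the extension list, and the `no-cache` fallback for everything else are assumptions of this sketch, not rules from the spec.

```python
import re

# Hypothetical convention: name.<hex hash>.<ext>, e.g. app.4f9a2c.js
_FINGERPRINTED = re.compile(r"\.[0-9a-f]{6,}\.(?:js|css|woff2|png|svg)$")


def cache_control_for(path):
    """Pick a Cache-Control value based on whether the URL is content-addressed."""
    if _FINGERPRINTED.search(path):
        return "public, max-age=31536000, immutable"
    # Stable URLs can change underneath, so keep them revalidating.
    return "no-cache"


print(cache_control_for("/static/app.4f9a2c.js"))  # public, max-age=31536000, immutable
print(cache_control_for("/index.html"))            # no-cache
```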

Where caching meets security: keys, unkeyed inputs, and Vary

Caching is a correctness problem before it is a performance one, and the place it goes wrong most dangerously is the cache key. A cache decides whether two requests are “the same” by hashing a chosen subset of the request: typically the method, the host, and the path, plus whatever headers Vary nominates. Everything in that subset is keyed. Everything else is unkeyed. The unkeyed parts still reach the origin and can still change the response, but the cache ignores them when deciding what to store and serve. That gap is the attack surface.

Web cache poisoning, the class of attack James Kettle’s PortSwigger research mapped out in detail, lives in exactly that gap. If an unkeyed request header influences the response, say an X-Forwarded-Host that the application reflects into an absolute URL, an attacker can send a request with a malicious value, the origin bakes that value into the response, and the cache stores it under the normal, clean cache key. Every subsequent user who requests that key gets served the poisoned response. The cache faithfully does its job; the bug is that the response varied on an input the cache was not keying on. Vary is the protocol’s intended fix, by nominating the influential header for keying, but in practice it is used sparingly and some CDNs do not honor arbitrary Vary values at all, so defense leans on stripping or normalizing dangerous headers at the edge instead. The X-Forwarded-For and forwarding-header chain is a frequent source of exactly these unkeyed-but-influential inputs.
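A toy model makes the keyed/unkeyed gap visible. Everything here is hypothetical: the `origin` function stands in for an application that reflects X-Forwarded-Host, `TinyCache` is a dict keyed on the path plus any explicitly keyed headers, and the hostnames are placeholders.

```python
def origin(path, headers):
    """An application that reflects an unvalidated header into its output."""
    host = headers.get("x-forwarded-host", "example.com")
    return f'<script src="https://{host}/app.js"></script>'


class TinyCache:
    def __init__(self, keyed_headers=()):
        self.keyed_headers = tuple(keyed_headers)
        self.store = {}

    def key(self, path, headers):
        # Only the path and the nominated headers take part in the key.
        return (path,) + tuple(headers.get(h) for h in self.keyed_headers)

    def fetch(self, path, headers):
        k = self.key(path, headers)
        if k not in self.store:
            self.store[k] = origin(path, headers)
        return self.store[k]


naive = TinyCache()
naive.fetch("/", {"x-forwarded-host": "evil.example"})  # attacker primes the cache
print(naive.fetch("/", {}))    # victim receives the evil.example script tag

keyed = TinyCache(keyed_headers=["x-forwarded-host"])
keyed.fetch("/", {"x-forwarded-host": "evil.example"})
print(keyed.fetch("/", {}))    # victim receives the example.com script tag
```

Keying the header contains the damage to requests that send the same value; stripping it at the edge removes the input altogether.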

The mirror-image attack is web cache deception, and it does not need any header trickery. It exploits the cache’s path-and-extension rules. If a cache is configured to store anything ending in .css or .js regardless of the response’s own caching headers, and the application routes /account/profile.css to the same handler that serves /account/profile (ignoring the made-up extension), then an attacker tricks a victim into loading /account/profile.css. The application returns the victim’s private profile; the cache, seeing a .css suffix, stores it as a static asset; the attacker then requests the same URL and reads the victim’s data straight out of the cache. The fix is to make the cache respect the response’s actual Cache-Control rather than guessing from the path, and to never let a static-extension rule override a private or no-store on a dynamic response. This is also why a CDN’s cache key design is a security boundary, not just a performance tuning knob.
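The difference between the vulnerable rule and the fix can be sketched as two predicates. Both function names and the extension list are hypothetical, and the hardened version is deliberately conservative: it stores only when the response itself grants permission.

```python
STATIC_EXTENSIONS = (".css", ".js", ".png")


def naive_edge_may_store(path):
    """Vulnerable rule: trust the URL suffix and ignore the response headers."""
    return path.endswith(STATIC_EXTENSIONS)


def edge_may_store(path, cache_control):
    """Hardened rule: the response's own Cache-Control decides, not the path."""
    directives = {
        part.strip().split("=")[0].lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    if directives & {"private", "no-store"}:
        return False  # never overridden by a static-extension rule
    if directives & {"public", "max-age", "s-maxage"}:
        return True
    return False      # no explicit permission: the suffix alone is not enough


print(naive_edge_may_store("/account/profile.css"))               # True (the bug)
print(edge_may_store("/account/profile.css", "private, no-store"))  # False
print(edge_may_store("/static/app.css", "public, max-age=3600"))    # True
```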

The defensive posture that falls out of all this is consistent. Personalized or authenticated responses get Cache-Control: private or no-store, explicitly, never left to a heuristic or a path rule. Any request header that legitimately changes a response gets either keyed via Vary or stripped at the edge so it cannot. And the cache’s notion of “same request” gets audited against the application’s notion of “same response,” because every divergence between those two is a poisoning or deception bug waiting for someone to find it.

Closing: the spec is the contract, and almost nobody reads it

The throughline across every section here is that HTTP caching is a contract written in headers, and the failures come from one side or the other not reading the contract the same way. The origin thinks Cache-Control: no-cache means “don’t store this,” the CDN reads it as “store but always revalidate,” and the bug report says “why is my private page in the edge cache.” The origin sets Vary: User-Agent to serve a mobile variant and watches its hit rate collapse to nothing. Someone sets a long max-age and cannot understand why a response goes stale in seconds, because they never accounted for the Age it arrived with. None of these are exotic. They are the same handful of misreadings, repeated across teams that all assumed the headers were simpler than they are.

RFC 9111 made the contract better by consolidating it. One document now covers what RFC 7234 and a scattering of errata used to, the formulas are spelled out, the precedence rules are explicit, and the stale-serving extensions from RFC 5861 plug in cleanly. The browser support caught up: immutable, stale-while-revalidate, and stale-if-error are all baseline-available now, which was not true a few years ago. What has not changed is that the hard part is operations, not the specification. The spec tells you exactly what s-maxage does; it cannot tell you that your CDN normalizes Accept-Encoding to two buckets while your origin emits a Vary that assumes raw values. That mismatch is yours to find.

If you take one habit from this, make it this: before trusting a caching behavior, look at the actual response headers on the wire and the actual cache key your CDN computes, not the configuration you think you wrote. The Age header on a response tells you how long it really sat upstream. The Cache-Control on a 304 tells you what the cache will do next. The presence or absence of Vary tells you whether the variant you are looking at is the one the next user gets. The headers do not lie, and they are the only part of this system that is actually under contract.

Sources & further reading

IETF (2022), RFC 9111: HTTP Caching — the current normative specification for HTTP caching; obsoletes RFC 7234, defines Cache-Control, freshness, Age, and Vary matching.
IETF (2014), RFC 7232: HTTP/1.1 Conditional Requests — defines ETag, Last-Modified, If-None-Match, If-Modified-Since, and the 304 response semantics; obsoleted in 2022 by RFC 9110, which carries the same rules forward.
Mark Nottingham / IETF (2010), RFC 5861: HTTP Cache-Control Extensions for Stale Content — defines stale-while-revalidate and stale-if-error, including the amplification-attack security note.
MDN Web Docs (2026), Cache-Control header reference — per-directive support status and the explicit no-cache-is-not-no-store clarification.
Patrick McManus / Mozilla Hacks (2017), Using Immutable Caching To Speed Up The Web — the immutable directive, the reload-revalidation problem, and the Facebook and BBC measurements.
Fastly (2017), Best practices for using the Vary header — why varying on User-Agent and Cookie destroys hit rates, and how edge normalization fixes it.
Fastly Documentation (2024), Lifetime and revalidation — how a production CDN implements stale-while-revalidate and stale-if-error.
James Kettle / PortSwigger Research (2018), Practical Web Cache Poisoning — the foundational write-up on keyed vs unkeyed inputs and cache poisoning via reflected headers.
PortSwigger Web Security Academy, Web cache poisoning — structured reference on cache keys, unkeyed inputs, and the relationship to Vary.
Dan Cătălin Burzo (2023), HTTP caching, a refresher — a careful, RFC-grounded walkthrough of the freshness and revalidation model.