How crawlers avoid re-fetching unchanged pages: conditional requests with ETag and Last-Modified, 304 handling, content hashing for change detection, and recrawl scheduling driven by per-page change rate.
Traces what a CDN actually puts in its cache key, how unkeyed headers and parser discrepancies turn a shared cache into an exploit delivery system, and the defenses that hold up against poisoning and deception.
A primary-source reference for HTTP caching: how Cache-Control directives, Expires, ETag and Last-Modified revalidation, Vary, and the stale-* extensions actually behave in private and shared caches under RFC 9111.
Traces web cache deception from Omer Gil's 2017 PayPal disclosure through the 2020 and 2022 measurement studies to the 2024 delimiter research, then covers the defenses that actually close the cache-versus-origin gap.