Traces the distributed-systems problem behind a virtual waiting room: admission control under a thundering herd, the fairness-versus-throughput tradeoff, clock skew in queue ordering, the signed token design, and the failure modes that leak slots.
Traces the architecture of a web-scale crawler from Mercator and the early Googlebot through IRLbot to today: the URL frontier, duplicate elimination, politeness scheduling, and how servers push back.
How the URL frontier orders a crawl: the Mercator front-queue/back-queue split, per-host politeness, freshness versus coverage, and the disk-backed and gRPC designs that run at web scale today.
A primary-source walk through the URL-seen problem in large crawlers: why naive dedup fails at scale, how Bloom filters answer it, the false-positive math, and the counting, scalable, blocked, and cuckoo variants that followed.
Traces consistent hashing from Karger's 1997 ring to virtual nodes, jump hash, Maglev tables, and the bounded-load variant that Vimeo shipped in HAProxy, with the minimal-remapping math that ties them together.