Tagged: infrastructure

DataDome's server-side scoring pipeline: from edge to decision in milliseconds

Traces how DataDome turns an HTTP request into an allow, challenge, or block verdict at the edge: the module-to-API split, the form fields it ships, the regional inference layer, and the latency budget that keeps it synchronous.

datadome anti-bot bot-detection infrastructure

Sun, June 7, 2026 · 22 min read

F5 Distributed Cloud Bot Defense: the architecture after the Shape acquisition

Traces how Shape Security's bot-detection stack became F5 Distributed Cloud Bot Defense: the client-side JavaScript and mobile SDK, the connector model, the telemetry path to the inference engines, and where the system sits in 2026.

f5-shape anti-bot bot-detection infrastructure

Wed, May 13, 2026 · 19 min read

How virtual waiting rooms work: token buckets, queue position, and fair ordering

A vendor-neutral reference on virtual waiting rooms: the admission model behind the token bucket, FIFO versus random ordering, the cookie that holds your place, and the split between inbound and active users.

waiting-room anti-bot infrastructure

Thu, April 30, 2026 · 22 min read

Cloudflare Waiting Room internals: the JWT, the estimated wait, and edge coordination

Traces how Cloudflare Waiting Room queues traffic from the edge: the encrypted __cfwaitingroom cookie, the total-active-users and new-users-per-minute limits, the estimated-wait math, and the Durable Object hierarchy that counts users across 300-plus data centers.

cloudflare waiting-room infrastructure

Tue, April 28, 2026 · 18 min read

Akamai's queueing and rate control: waiting rooms at the CDN edge

Traces how Akamai runs visitor queueing at the CDN edge, from the percentage-based Visitor Prioritization cloudlet to EdgeWorkers connectors that validate queue tokens locally, and how that compares to a dedicated queue vendor.

akamai waiting-room infrastructure

Mon, April 27, 2026 · 23 min read

Designing a fair queue at scale: lessons from high-demand ticket on-sales

Traces the distributed-systems problem behind a virtual waiting room: admission control under a thundering herd, the fairness-versus-throughput tradeoff, clock skew in queue ordering, the signed token design, and the failure modes that leak slots.

waiting-room infrastructure distributed-systems

Sat, April 25, 2026 · 24 min read

AWS Virtual Waiting Room: the serverless reference architecture deconstructed

A read-through of the Virtual Waiting Room on AWS solution: the public and private REST APIs, the SQS-buffered queue assignment, the Redis serving counter, and the RSA-signed JWT that proves you cleared the line.

waiting-room infrastructure aws

Fri, April 24, 2026 · 23 min read

Detecting virtualized and containerized browsers: GPU, screen, and timing artifacts

How detectors spot a browser running in a VM or container: software WebGL renderers like SwiftShader and llvmpipe, default 800x600 screens, quantized device memory, and timing artifacts under virtualization.

browser-automation anti-bot fingerprinting infrastructure

Wed, April 1, 2026 · 23 min read

Designing a distributed crawler: frontier, dedup, politeness, and backpressure

Traces the architecture of a web-scale crawler from Mercator and the early Googlebot through IRLbot to today: the URL frontier, duplicate elimination, politeness scheduling, and how servers push back.

crawling distributed-systems infrastructure

Sun, March 29, 2026 · 21 min read

URL frontier design: from Mercator to modern priority-queue crawlers

How the URL frontier orders a crawl: the Mercator front-queue/back-queue split, per-host politeness, freshness versus coverage, and the disk-backed and gRPC designs that run at web scale today.

crawling distributed-systems infrastructure

Sat, March 28, 2026 · 22 min read

Proxy pool management: rotation, health checks, and burn-rate economics

Traces how a working proxy pool is operated: rotation strategies, the difference between a banned IP and a dead one, health-check state machines, sticky versus rotating sessions, and the per-GB cost model that decides whether a crawl is profitable.

proxies crawling infrastructure

Wed, March 25, 2026 · 22 min read

Rate limiting yourself: token buckets, adaptive throttling, and 429 backoff

Traces client-side rate control for crawlers: token and leaky buckets applied to your own requests, per-host concurrency, adaptive throttling on 429 and Retry-After, and exponential backoff with jitter.

crawling rate-limiting infrastructure

Fri, March 20, 2026 · 23 min read

Parsing at scale: when to use a real browser vs an HTTP client

A decision framework for choosing between a headless browser and a plain HTTP client at extraction scale: JS-dependence, per-page cost, fingerprint surface, brittleness, and the hybrid path most large crawlers actually take.

crawling browser-automation infrastructure

Tue, March 17, 2026 · 18 min read

The headless-browser tax: memory, CPU, and why HTTP clients win when they can

Traces the real resource cost of driving headless Chrome at scale: per-instance RAM, the multi-process tax, container failure modes, concurrency math, and the cost gap that pushes teams back to HTTP clients.

crawling browser-automation infrastructure

Mon, March 16, 2026 · 22 min read

Scraping observability: success metrics, block-rate dashboards, and silent failures

Traces how to instrument a scraping system end to end: the metrics that matter, why HTTP 200 is a lie, how to detect soft blocks and empty-payload garbage, and how to build dashboards and alerts that catch silent failures before they surface as bad data.

crawling infrastructure observability

Sat, March 14, 2026 · 26 min read

Server-side TLS fingerprinting libraries: how the edge captures the handshake

Where TLS fingerprints are actually computed in a server stack: the OpenSSL and BoringSSL callbacks that hand you the raw ClientHello, the nginx, HAProxy, and Envoy modules built on them, and the constraints that decide whether you get the bytes at all.

tls fingerprinting infrastructure

Sat, February 28, 2026 · 21 min read

How a CDN actually works: anycast, POPs, and the cache hierarchy

Traces what a CDN really does on a request: how anycast and BGP pick a point of presence, how the edge/shield/origin cache tiers fit together, how cache keys decide what is a hit, and where TLS terminates.

cdn infrastructure networking

Sun, January 11, 2026 · 22 min read

Anycast routing: how one IP serves the whole planet

Traces how the same IP prefix advertised from hundreds of locations lets BGP route every user to a nearby instance, how DNS roots and CDNs use it, how failover works, and where TCP state breaks the model.

cdn networking infrastructure

Sat, January 10, 2026 · 21 min read

DNS resolution end to end: from stub resolver to authoritative answer

Traces a single DNS lookup from the stub resolver in your OS through the recursive resolver, root, TLD and authoritative servers, then explains caching, TTLs, negative answers, and the record types that make it work.

dns networking infrastructure

Fri, January 9, 2026 · 23 min read

How DNS load balancing and GeoDNS steer traffic

A reference on steering traffic through DNS answers: round-robin, weighted, latency and geo-based responses, health checks, EDNS Client Subnet, and the TTL and caching limits that make DNS an approximate load balancer.

dns cdn infrastructure

Wed, January 7, 2026 · 22 min read

BGP explained: how the internet's routing table actually converges

Traces how BGP carries reachability between autonomous systems: prefixes, AS_PATH, eBGP versus iBGP, the route-selection algorithm, and why convergence after a failure can take seconds to minutes.

bgp networking infrastructure

Mon, January 5, 2026 · 22 min read

Edge compute compared: Cloudflare Workers, Lambda@Edge, and Fastly Compute

A primary-source reference tracing how Cloudflare Workers, AWS Lambda@Edge and CloudFront Functions, and Fastly Compute isolate tenants, what their cold-start numbers actually mean, and which workloads each runtime can run.

edge-compute cdn infrastructure

Fri, January 2, 2026 · 21 min read

The TLS terminating proxy: where your handshake really ends

Traces what happens when a CDN or load balancer terminates TLS at the edge: which certificate the client validates, what fingerprint the origin actually sees, how traffic is re-encrypted to origin, and who you are trusting with the cleartext.

tls infrastructure networking

Thu, January 1, 2026 · 22 min read

Load balancing algorithms: round-robin, least-connections, and consistent hashing

A reference on the core load-balancing algorithms: round-robin and weighted variants, least-connections, least-response-time, power-of-two-choices, and IP/consistent hashing, with the math and production tradeoffs of each.

infrastructure networking load-balancing

Wed, December 31, 2025 · 20 min read