How CDNs absorb volumetric DDoS: scrubbing centers and anycast dispersion

Copyright: MIT

In May 2025 a single UDP flood delivered 37.4 terabytes of garbage in 45 seconds, peaking at 7.3 terabits per second, aimed at one hosting provider. Five months later a botnet of infected Android TVs pushed a 31.4 Tbps attack that lasted all of 35 seconds. Neither attack took its target down. Neither one triggered a human page. The traffic was dropped, in pieces, by hundreds of separate machines on five continents that each saw only a sliver of the whole, and most of those machines never knew they were part of a record.

That is the thing worth understanding about modern volumetric defense. A 31 Tbps attack is far larger than any single data center can ingest, let alone filter. No router on earth forwards that. The trick is that it never has to. The attack arrives already split into hundreds of streams, because the network it hit advertises one address from hundreds of places, and the internet’s own routing handed each piece of the botnet to whichever location was topologically nearest. The defender’s job stops being “filter a terabit firehose” and becomes “drop a few tens of gigabits, in parallel, in a few hundred rooms.” This post is about how that split happens, what fills the gap when it cannot, and the unglamorous economics that decide whether a network can afford to soak an attack at all.

The sections below start with the two architectures that volumetric defense splits into, then anycast catchment and why dispersion is the whole game, then the scrubbing-center model and how BGP diverts traffic into it, then the coarse network-layer tools (RTBH and flowspec) that drop traffic upstream, then the autonomous edge pipeline that turns a sample into a dropped packet, and finally the capacity math that decides who can play. If you have not read the earlier posts on how a CDN actually works and on anycast routing, they set up the routing model this post leans on.

Two shapes of volumetric defense

There are two ways to stand in front of a flood, and they differ in where the filtering happens.

The older model is the scrubbing center. A handful of large filtering sites, fed by big transit pipes, sit out of the normal traffic path. When an attack starts, you divert your traffic to them, they strip the bad packets, and they hand the clean remainder back to your origin over a tunnel. The filtering is centralized. Traffic is hauled to the scrubber and hauled back. This is how the first generation of DDoS providers worked, and it is still how a lot of network-layer protection works today, especially for organizations that keep their servers on their own IP space and only want help during an incident.

The newer model is the distributed edge. Every server in every location runs the full detection-and-mitigation stack, and there is no separate place that traffic gets sent to be cleaned. Cloudflare states this plainly: there are no out-of-path scrubbing centers or scrubbing devices in its design, and each server runs the detection and mitigation component itself. The filtering happens wherever the packet lands, which (because of anycast) is wherever it landed nearest. Akamai, Cloudflare, and Fastly all run variations of this for their proxied web traffic.

The two models are not mutually exclusive, and the same vendor often sells both. The split that matters is whether the customer’s traffic terminates on the provider’s anycast address (proxied web, where the edge model applies cleanly) or stays on the customer’s own IP space (network protection, where you need BGP diversion to pull traffic into the defense at all). The rest of this post treats them as two halves of the same problem: get the attack split up, then drop the pieces.

Figure: Scrubbing center (out of path) versus distributed edge (in path).
*Left: traffic is diverted to one scrubbing site, cleaned, and returned to the origin over a tunnel. Right: each anycast POP filters the slice it received and forwards only clean traffic onward.*

Anycast catchment is the whole game

Volumetric defense at the edge works because of one property: the attack is pre-divided by the routing system before anyone has to filter it.

Anycast means the same IP prefix is announced into BGP from many locations at once. A user’s packets reach whichever announcing location their network’s routing rates best. The set of clients that land on a given location is that location’s catchment. Catchment is decided by BGP path selection, not by geography or latency, so it is lumpy and a little arbitrary. A measurement study of the B-Root nameserver found that adding a second site split load 82.6 percent to one and 17.4 percent to the other rather than anything close to even. That lumpiness is a planning headache for steady-state traffic. For DDoS it is mostly a gift, because the same mechanism that scatters legitimate users scatters attackers too.

A botnet is, by definition, distributed. Its nodes sit in thousands of networks across hundreds of countries. The 7.3 Tbps May 2025 flood came from 122,145 source addresses spread across 5,433 autonomous systems in 161 countries. When each of those sources sends a packet to an anycast address, BGP routes that packet to the source’s own nearest POP. A bot in Brazil hits the São Paulo site. A bot in Vietnam hits a site in Asia. No single location sees the 7.3 Tbps total. Each one sees only the bots that happen to route to it, and the global edge footprint determines how finely the total gets sliced. That May attack was handled across 477 data centers in 293 locations. Divide even 7.3 Tbps by a few hundred sites and the per-site load drops to something a normal POP ingests without flinching.

Figure: 7.3 Tbps total, split by catchment (122k sources, 161 countries) across 477 data centers in 293 locations, with example per-POP loads of ~24, ~18, and ~31 Gbps. No single POP sees the full 7.3 Tbps.
*Per-POP figures are illustrative, not measured: catchment is lumpy, so the real split is uneven. The point is that hundreds of ingress points turn one terabit firehose into hundreds of manageable streams.*
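The division above is simple enough to sketch. The following is a rough illustration, not a measurement: the 7.3 Tbps total and the 293 locations come from the text, while the weights (ten heavy sites with eight times the catchment of the rest) and the helper name `split_by_catchment` are invented to show how an uneven catchment moves the hottest site well away from the even-split average.

```python
def split_by_catchment(total_gbps, weights):
    """Divide an attack's total rate across sites in proportion to catchment weight."""
    total_weight = sum(weights.values())
    return {site: total_gbps * w / total_weight for site, w in weights.items()}


TOTAL_GBPS = 7300  # 7.3 Tbps, the May 2025 flood
SITES = 293        # locations that handled it, per the text

# Best case: a perfectly even split, which real catchments never deliver.
even_share = TOTAL_GBPS / SITES

# Hypothetical lumpy catchment: ten sites attract 8x the bots of every other site.
weights = {f"pop{i}": (8.0 if i < 10 else 1.0) for i in range(SITES)}
loads = split_by_catchment(TOTAL_GBPS, weights)
hottest = max(loads.values())

print(f"even share:   {even_share:.1f} Gbps per site")
print(f"hottest site: {hottest:.1f} Gbps")
```

The even share lands at roughly 25 Gbps per site, in line with the illustrative per-POP figures, while the assumed heavy sites carry several times that. Capacity planning has to target the hottest site, not the average.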

There is a darker side to the same property, and the research community has been blunt about it. Catchment is uneven, so some sites pull disproportionate attack load. If the bots near your biggest POP are dense, that POP can be overwhelmed while others sit idle, and anycast gives the operator little direct control over which site catches which attacker. The work on anycast agility against DDoS treats catchment as something you actively tune under attack, shifting announcements to spread load or to deliberately concentrate an attack onto a site you are willing to sacrifice. The naive view that anycast magically balances load is wrong. It disperses load, which is different, and the dispersion is only as good as the footprint and the announcement policy behind it.

For the steady-state mechanics of how that catchment forms in the first place, the anycast routing post covers BGP path selection in detail; here the only thing that matters is that the split happens for free, before any packet is inspected.

The scrubbing center and how traffic gets there

Plenty of organizations do not put their services behind a reverse proxy. They run their own IP space, their own routers, their own origin, and they want it to keep working under attack without re-architecting around a CDN. For them the model is the scrubbing center, and the interesting part is the plumbing that pulls traffic into it.

The diversion is done with BGP. The scrubbing provider announces the customer’s IP prefix from its own autonomous system, usually as a more specific route than the customer normally advertises. Because routers forward on the longest matching prefix, the more-specific announcement wins wherever it is accepted, and traffic destined for the customer starts arriving at the scrubbing network instead of the origin. That is the on-ramp.

Inside the scrubbing center the traffic runs through filtering: signature matching, rate limits, protocol validation, challenge-response for some vectors. What survives is clean.

The off-ramp returns that clean traffic to the origin, and because the public path now points at the scrubber, you cannot just forward it normally without looping. The standard answer is a GRE tunnel: clean packets are encapsulated and sent through a tunnel to the customer’s edge, where they are decapsulated and delivered. Basic GRE over IPv4 adds 24 bytes per packet (a 20-byte outer IP header plus a 4-byte GRE header), which shrinks the usable MTU inside the tunnel, and a small latency penalty proportional to the distance between scrubber and origin. Some providers use private interconnects or layer-2 handoff instead of GRE where the customer is physically close, which avoids the encapsulation tax.

Figure: Attack and user traffic arrive at the scrubbing center (filter / rate-limit) because the prefix is announced by the scrubber; clean traffic returns to the origin over a GRE tunnel (+24 bytes/packet, ~1-3 ms).
*The customer's prefix is advertised from the scrubber, so all traffic lands there first. Clean packets ride a GRE tunnel back to the origin, which still believes it owns the address.*
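The on-ramp depends on nothing more than longest-prefix matching, which a few lines can demonstrate. This is a toy routing table, not a BGP implementation: the prefixes are private placeholder addresses, the AS labels are made up, and `best_route` is a hypothetical helper that ignores every BGP tie-breaker except prefix length.

```python
import ipaddress


def best_route(routes, destination):
    """Return the (network, origin) pair with the longest prefix covering destination."""
    dst = ipaddress.ip_address(destination)
    matches = [(net, origin) for net, origin in routes if dst in net]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)


# Steady state: only the customer's covering prefix is in the table.
routes = [(ipaddress.ip_network("10.20.0.0/23"), "customer (hypothetical AS64496)")]
before = best_route(routes, "10.20.0.10")

# Diversion: the scrubber announces a more-specific /24 for the same space.
routes.append((ipaddress.ip_network("10.20.0.0/24"), "scrubber (hypothetical AS64500)"))
after = best_route(routes, "10.20.0.10")

print("before diversion:", before[1])
print("after diversion: ", after[1])
```

The customer's /23 never goes away. It simply stops being the best match for the diverted half, which is also why withdrawing the more-specific route is enough to restore the normal path.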

The choice that defines a scrubbing deployment is always-on versus on-demand. Always-on means the more-specific announcement stays up permanently, so every packet runs through the scrubber whether or not there is an attack. There is no activation delay and no detection gap, at the cost of routing all normal traffic through extra hops and paying for that capacity continuously. On-demand keeps traffic on its normal path until an attack is detected, then triggers the BGP diversion. It is cheaper and adds no steady-state latency, but it pays a switchover cost: the diversion only takes effect once the more-specific route propagates, and global BGP convergence typically runs a few minutes. During that window the attack hits the undefended origin. For an attack that lasts 35 seconds, that window is the entire attack, which is one reason hyper-volumetric short bursts have pushed serious targets toward always-on or toward the proxied edge model, where there is nothing to switch on.
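The switchover cost can be put in numbers. The sketch below assumes hypothetical values for detection time (10 seconds) and route propagation (120 seconds, standing in for "a few minutes"); only the 35-second attack duration comes from the text, and the function names are illustrative.

```python
def unprotected_seconds(attack_s, detect_s, converge_s):
    """Seconds of attack traffic that reach the origin before diversion takes effect."""
    return min(attack_s, detect_s + converge_s)


def unprotected_fraction(attack_s, detect_s, converge_s):
    """Share of the attack's duration that the origin faces with no scrubbing."""
    return unprotected_seconds(attack_s, detect_s, converge_s) / attack_s


DETECT_S = 10     # assumed detection delay
CONVERGE_S = 120  # assumed BGP propagation time

burst = unprotected_fraction(35, DETECT_S, CONVERGE_S)       # 35-second burst
sustained = unprotected_fraction(3600, DETECT_S, CONVERGE_S)  # one-hour attack
always_on = unprotected_fraction(35, 0, 0)                    # nothing to switch on

print(f"35 s burst, on-demand:  {burst:.0%} unprotected")
print(f"1 h attack, on-demand:  {sustained:.1%} unprotected")
print(f"35 s burst, always-on:  {always_on:.0%} unprotected")
```

Under these assumptions on-demand diversion covers almost all of an hour-long attack and none of a 35-second one, which is the whole argument for always-on in a world of short bursts.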

A five-year measurement study of BGP-based scrubbing adoption, looking at Akamai Prolexic, Cloudflare, Vercara, Imperva, and Radware, found the practice growing but still niche. Between 2020 and 2024 the share of autonomous systems using these services rose from 0.7 to 2 percent, from 464 to 1,730 ASes, and protected prefixes grew from 3,154 to 12,362. Financial institutions led: by December 2024 just over 7 percent of financial-sector ASes used a scrubber, the highest of any sector. The study inferred who was protected by spotting the scrubber’s ASN in the BGP AS-PATH, which is exactly the signature the diversion mechanism leaves behind.
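The AS-PATH inference can be illustrated with a simplified check. This is not the study's actual method, which the text does not detail. The ASNs come from the range reserved for documentation, the scrubber names are invented, and `find_scrubber` only asks whether a known scrubber ASN appears anywhere in the path.

```python
# Hypothetical scrubber ASNs, drawn from the documentation range 64496-64511.
SCRUBBER_ASNS = {64500: "scrubber-a", 64501: "scrubber-b"}


def find_scrubber(as_path):
    """Return the name of the first known scrubber ASN in an AS path, or None.

    as_path lists ASNs as a route collector sees them, with the origin last.
    """
    for asn in as_path:
        if asn in SCRUBBER_ASNS:
            return SCRUBBER_ASNS[asn]
    return None


def protected_share(paths_by_prefix):
    """Fraction of prefixes whose observed path contains a scrubber ASN."""
    if not paths_by_prefix:
        return 0.0
    protected = sum(1 for path in paths_by_prefix.values() if find_scrubber(path))
    return protected / len(paths_by_prefix)


observed = {
    "198.51.100.0/24": [64496, 64500, 64510],  # scrubber sits upstream of the customer
    "203.0.113.0/24": [64496, 64497, 64511],   # ordinary path, no scrubber
    "192.0.2.0/24": [64498, 64501],            # scrubber originates the prefix itself
}

for prefix, path in observed.items():
    print(prefix, "->", find_scrubber(path) or "unprotected")
print(f"protected share: {protected_share(observed):.0%}")
```

A real measurement would also have to separate scrubbing from ordinary transit sold by the same ASN, which this sketch makes no attempt to do.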

RTBH and flowspec: dropping it upstream

Scrubbing filters traffic. Sometimes you do not want to filter, you want to drop, and you want to drop it before it reaches you at all, on someone else’s router. Two BGP mechanisms do this, and both predate the modern scrubbing market.

Remotely triggered black hole (RTBH) filtering is the blunt one. It is described in RFC 5635, published August 2009, by Warren Kumari and Danny McPherson. The idea: every edge router is pre-configured with a static route for some discard prefix, typically pulled from the TEST-NET block 192.0.2.0/24, pointing at the null interface. When a destination comes under attack, you announce that destination into BGP tagged with a special community, and routers receiving the tagged route rewrite its next-hop to the discard route. From that moment, all traffic to the victim address is forwarded straight to null and dropped. The drop happens at line rate in hardware, which is the whole appeal, because forwarding to null is something a router does far faster than packet-by-packet filtering.

The cost is obvious and brutal. Destination-based RTBH drops everything to the victim, attack and legitimate traffic alike. You have finished the attacker’s job for them: the target is now unreachable, by your own hand. RTBH is a tool for protecting the rest of the network from collateral damage when one host is being used to congest shared links. It saves the neighborhood by sacrificing the house. RFC 5635 also describes a source-based variant that pairs the blackhole with unicast reverse-path forwarding, so that a uRPF check against a discard-routed source address fails and those packets get dropped. Source-based RTBH can drop by attacker instead of by victim, but it depends on attackers using a small, stable set of source addresses, which volumetric spoofed floods rarely oblige.
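A toy model of destination-based RTBH shows both the mechanism and the cost. Everything here is illustrative: the community value `64496:666`, the next-hop addresses, and the `EdgeRouter` class are placeholders, and real routers implement this in routing policy and forwarding hardware, not in a Python list.

```python
import ipaddress
from dataclasses import dataclass

BLACKHOLE_COMMUNITY = "64496:666"  # placeholder community agreed between peers
DISCARD_NEXT_HOP = "192.0.2.1"     # TEST-NET address with a static route to null


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    communities: frozenset = frozenset()


class EdgeRouter:
    def __init__(self):
        self.null_routed = {DISCARD_NEXT_HOP}  # the pre-configured discard route
        self.rib = []

    def receive(self, route):
        # Policy: a route tagged with the blackhole community gets its next hop rewritten.
        if BLACKHOLE_COMMUNITY in route.communities:
            route = Route(route.prefix, DISCARD_NEXT_HOP, route.communities)
        self.rib.append(route)

    def forward(self, destination):
        dst = ipaddress.ip_address(destination)
        matches = [r for r in self.rib if dst in ipaddress.ip_network(r.prefix)]
        if not matches:
            return "no route"
        best = max(matches, key=lambda r: ipaddress.ip_network(r.prefix).prefixlen)
        if best.next_hop in self.null_routed:
            return "drop"
        return f"forward via {best.next_hop}"


router = EdgeRouter()
router.receive(Route("198.51.100.0/24", "10.0.0.1"))
# The victim's /32 is announced with the blackhole community attached.
router.receive(Route("198.51.100.7/32", "10.0.0.1", frozenset({BLACKHOLE_COMMUNITY})))

print(router.forward("198.51.100.7"))  # victim: every packet is dropped
print(router.forward("198.51.100.8"))  # neighbor in the same /24 is unaffected
```

The router never inspects the packet. It cannot tell an attack packet from a customer's, which is why the victim goes dark while the rest of the /24 stays up.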

Flowspec is the surgical successor. The current specification is RFC 8955, “Dissemination of Flow Specification Rules,” published December 2020, which obsoletes the original RFC 5575. Instead of blackholing a whole destination, flowspec lets you distribute a multi-field match across the network through BGP itself. The standard defines twelve component types you can match on, including source and destination prefix, IP protocol, source and destination ports, ICMP type and code, TCP flags, packet length, DSCP, and fragmentation status. A rule can combine several of these. Match UDP, source port 123, packet length in the amplification range, destined for the victim, and act on exactly that. The actions are richer than null-route-or-nothing: the extended communities defined in the spec include traffic-rate-bytes (community 0x8006) and traffic-rate-packets (0x800c) to rate-limit rather than drop, traffic-action (0x8007) for sampling, and rt-redirect (0x8008) to shunt matching traffic into a separate routing instance, which is one way to feed a scrubber. Setting a traffic rate of zero is the discard action.

Figure: RTBH (RFC 5635) versus flowspec (RFC 8955).
RTBH: match dst = 192.0.2.7; action next-hop -> null; drops ALL traffic to the host, attack and users alike; line-rate, hardware drop.
Flowspec: match proto=UDP, sport=123, len=468, dst=192.0.2.7; action rate-limit / drop / redirect; drops the NTP reflection, leaves the rest reachable.
*RTBH trades the victim's reachability to save the link. Flowspec carries a real match expression in BGP, so it can drop a single amplification vector while normal traffic to the same host keeps flowing.*
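The contrast with RTBH is easiest to see as a matching function. The sketch below models a flowspec-style rule as a plain data structure. It is not the BGP NLRI encoding, the class and field names are invented, and only four of the match components are represented. The match values are the ones from the figure.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    dst: str
    proto: str
    sport: int
    length: int


@dataclass(frozen=True)
class FlowRule:
    dst_prefix: str
    proto: Optional[str] = None   # None means "do not match on this field"
    sport: Optional[int] = None
    length: Optional[int] = None
    rate_bytes: float = 0.0       # models traffic-rate-bytes; zero means discard

    def matches(self, pkt):
        if ipaddress.ip_address(pkt.dst) not in ipaddress.ip_network(self.dst_prefix):
            return False
        if self.proto is not None and pkt.proto != self.proto:
            return False
        if self.sport is not None and pkt.sport != self.sport:
            return False
        if self.length is not None and pkt.length != self.length:
            return False
        return True


def apply_rule(rule, pkt):
    """Return the action a router holding this rule would take on the packet."""
    if not rule.matches(pkt):
        return "forward"
    return "discard" if rule.rate_bytes == 0 else f"rate-limit to {rule.rate_bytes} B/s"


rule = FlowRule(dst_prefix="192.0.2.7/32", proto="udp", sport=123, length=468)

ntp_reflection = Packet(dst="192.0.2.7", proto="udp", sport=123, length=468)
https_client = Packet(dst="192.0.2.7", proto="tcp", sport=51514, length=1200)

print(apply_rule(rule, ntp_reflection))  # the amplification vector is discarded
print(apply_rule(rule, https_client))    # ordinary traffic to the same host passes
```

The same destination that RTBH would take offline stays reachable here, because the rule names the vector and not the victim.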

Flowspec is more powerful and correspondingly more dangerous: a fat-fingered rule propagates across your network in seconds and can blackhole legitimate traffic just as fast, which is why many operators limit how many flowspec rules they will accept and from whom. Both tools share a structural limit. They run on routers, with router-sized rule tables and router-grade match logic. They are excellent at killing a clean, well-characterized vector (an NTP reflection, a single spoofed UDP port) and poor at anything that looks like real traffic. For the messy cases you still need something that can do stateful work on the packet, which is where the edge pipeline comes in. The broader history of these network-layer attacks lives in the Layer 7 DDoS and DNS amplification posts.

The autonomous edge pipeline

Once anycast has split the attack into per-POP streams, something on each machine has to decide which packets to drop. At the scale and speed of these attacks, that decision cannot involve a human, and it cannot involve copying every packet up into userspace. The design that has converged across the industry samples in the kernel, fingerprints in userspace, and enforces back down in the kernel.

Cloudflare has documented its version in enough detail to use as the worked example, though the same shape appears in other large networks. Traffic is sampled using eBPF programs attached at XDP, the eXpress Data Path, which is the earliest point in the Linux network stack where you can run code on a packet, before the kernel allocates its socket buffer. The sampler does not look at every packet; it pulls a statistical sample and records attributes like source and destination IP, ports, protocol, TCP flags, and observed packet rates. Those samples flow to a userspace daemon Cloudflare calls dosd, the denial-of-service daemon, which holds the heuristics that decide when traffic is an attack. As it ingests samples, dosd generates many candidate fingerprints, permutations of the attributes that might characterize the malicious flow, and uses a streaming algorithm to pick the fingerprints that best isolate attack traffic from everything else. The chosen fingerprint is compiled into an eBPF program and pushed back inline at the NIC, where it drops matching packets at XDP before they cost the host any real work.

Figure: The XDP sampler (in kernel, pre-skb) sends samples to dosd (fingerprinting, userspace), which pushes a compiled rule to the eBPF drop program (at the NIC, line rate). Fingerprints are gossiped within and between data centers.
*Sample in the kernel, decide in userspace, enforce back in the kernel. The packet that gets dropped never reaches a socket, which is what lets one host shed tens of gigabits without melting.*
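The fingerprinting step can be sketched in batch form. This is emphatically not dosd's algorithm, which the text describes only as a streaming one. It is a simplified stand-in that enumerates attribute combinations, rejects any that match too much baseline traffic, and keeps the one covering the most suspect samples. The field list, the 1 percent collateral threshold, and all sample data are assumptions.

```python
from collections import Counter
from itertools import combinations

FIELDS = ("proto", "sport", "dport", "length")


def candidates(sample):
    """Yield every non-empty combination of (field, value) pairs in a sample."""
    items = [(field, sample[field]) for field in FIELDS]
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


def pick_fingerprint(suspect, baseline, max_collateral=0.01):
    """Pick the candidate covering the most suspect samples within the collateral limit."""
    suspect_hits = Counter(fp for s in suspect for fp in candidates(s))
    baseline_hits = Counter(fp for s in baseline for fp in candidates(s))
    best, best_score = None, (0.0, 0)
    for fp, hits in suspect_hits.items():
        if baseline_hits[fp] / len(baseline) > max_collateral:
            continue  # would drop too much known-good traffic
        # Rank by coverage first; at equal coverage prefer the more specific match.
        score = (hits / len(suspect), len(fp))
        if score > best_score:
            best, best_score = fp, score
    return best


def matches(fingerprint, packet):
    """True if the packet carries every (field, value) pair in the fingerprint."""
    return all(packet[field] == value for field, value in fingerprint)


# Synthetic samples: an NTP reflection mixed with some ordinary HTTPS traffic.
attack = [{"proto": "udp", "sport": 123, "dport": 40000 + i, "length": 468} for i in range(90)]
web = [{"proto": "tcp", "sport": 50000 + i, "dport": 443, "length": 1200} for i in range(80)]
dns = [{"proto": "udp", "sport": 53, "dport": 33000 + i, "length": 120} for i in range(20)]

fingerprint = pick_fingerprint(suspect=attack + web[:10], baseline=web + dns)
print(dict(fingerprint))
```

On this data "proto is udp" is rejected on its own because it would also drop the DNS responses in the baseline, so the chosen rule pins protocol, source port, and length together. A production system would compile that rule into an eBPF program, which this sketch stops short of.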

Two design choices make this scale. First, the whole loop runs per server: every machine samples its own traffic and writes its own drop rules, so there is no central bottleneck and nothing to overload by attacking the controller. Second, the machines share what they learn. A fingerprint discovered on one server is gossiped (multicast) to the other servers in the same data center and onward to other data centers, so a vector first seen in Frankfurt is already known in Singapore by the time the bots there ramp up. Cloudflare reported the 3.8 Tbps September 2024 campaign, including an event of 2.14 billion packets per second, mitigated this way with no human in the loop, and described the 7.3 Tbps May 2025 attack as handled fully autonomously without triggering an alert. Packet rate matters as much as bit rate here because the per-packet cost of the XDP path is what bounds you; a 5.1 billion packets-per-second flood, like the 11.5 Tbps UDP flood of September 2025, is a stress test of the sampler and the drop program more than of raw link capacity.
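The bit-rate versus packet-rate distinction is plain arithmetic. In the sketch below the 5.1 billion packets per second figure comes from the text. The packet sizes, the 1,000-server fleet, and the 32 cores per server are made-up round numbers, and the even spread across cores is an idealization.

```python
def packets_per_second(bits_per_second, packet_bytes):
    """Packet rate implied by a bit rate at a fixed packet size."""
    return bits_per_second / (packet_bytes * 8)


def ns_per_packet(total_pps, servers, cores_per_server):
    """Per-core time budget per packet, assuming a perfectly even spread."""
    per_core_pps = total_pps / (servers * cores_per_server)
    return 1e9 / per_core_pps


ONE_TBPS = 1e12

large = packets_per_second(ONE_TBPS, 1400)  # near-MTU packets
small = packets_per_second(ONE_TBPS, 64)    # minimum-size packets

print(f"1 Tbps of 1400-byte packets: {large / 1e6:.0f} Mpps")
print(f"1 Tbps of 64-byte packets:   {small / 1e6:.0f} Mpps")

# Hypothetical fleet absorbing a 5.1 Bpps flood.
budget = ns_per_packet(5.1e9, servers=1000, cores_per_server=32)
print(f"budget per packet per core:  {budget:.0f} ns")
```

The same terabit costs roughly twenty times more drop decisions when it arrives as small packets, which is why a packet-rate record and a bit-rate record stress different parts of the pipeline.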

This pipeline descends from the older signature-and-rate systems, known inside providers by names like gatebot, modernized into something that writes kernel bytecode on the fly. The principle is the same one rate-limiting algorithms rest on: characterize the abusive flow tightly enough that you can act on it without touching the legitimate flow next to it. The difference is that here the action happens in eBPF at the NIC, at a layer where there is no application yet to protect, and the characterization has to survive spoofed sources and millions of packets a second.

Capacity, headroom, and the economics of soaking

All of the cleverness above has a precondition that no algorithm can supply: you have to be able to receive the attack before you can drop it. A 31 Tbps flood arriving at a network with 5 Tbps of ingress does not get filtered, it gets congested, and the links saturate before any eBPF program runs. The first line of volumetric defense is not software. It is ports.

This is where the architecture turns into an economics problem. A large network provisions far more external capacity than its traffic uses, and the gap is deliberate. Cloudflare crossed 500 Tbps of external capacity in April 2026, defining external capacity as the sum of every port facing a transit provider, private peer, internet exchange, or customer interconnect across more than 330 cities. Peak daily utilization is only a fraction of that, and the company is explicit that the rest is the DDoS budget. Headroom is the product. The same surplus that keeps the network fast under a traffic spike is what lets it ingest a 31.4 Tbps attack across hundreds of sites and drop it at the edge with, in their words, no traffic backhauled to a centralized scrubbing center and no human intervening.
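The precondition can be written as a one-line model. The sketch treats the network as a single aggregate pipe, which real networks are not (saturation happens per link and per site), and it ignores the legitimate traffic already on the wire. The 31.4, 5, and 500 Tbps figures come from the text, and `congestion_loss` is a hypothetical helper.

```python
def congestion_loss(offered_tbps, ingress_tbps):
    """Fraction of offered traffic lost to saturated links before any filter runs."""
    if offered_tbps <= ingress_tbps:
        return 0.0
    return 1.0 - ingress_tbps / offered_tbps


ATTACK_TBPS = 31.4

small_network = congestion_loss(ATTACK_TBPS, 5)    # 5 Tbps of ingress
large_network = congestion_loss(ATTACK_TBPS, 500)  # 500 Tbps of external capacity

print(f"5 Tbps ingress:   {small_network:.0%} lost to congestion")
print(f"500 Tbps ingress: {large_network:.0%} lost to congestion")
```

Congestion loss is indiscriminate: the saturated link discards legitimate packets in the same proportion as attack packets, so on the small network most real users are dropped before any fingerprint is computed.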

That headroom is not free, and the way you pay for it is the quiet reason the volumetric-defense business consolidated into a few large networks. Capacity comes from interconnection. The cheapest bits are settlement-free peering, where two networks exchange traffic at an internet exchange or over a private link without paying each other, and a network that peers with thousands of others (Cloudflare reports interconnecting with more than 13,000) fills most of its ports without buying transit. Each peering session and each exchange port costs something to run, but the marginal cost of one more peer is low and the capacity it adds is real. A network that has already built that fabric for performance reasons gets DDoS absorption as a near-byproduct: the same ports that serve cat videos at peak soak attack packets off-peak. A network that has not built it has to buy transit by the gigabit to stand in front of a terabit attack, and that math does not close. The result is structural. Soaking volumetric DDoS is a capacity game, capacity is an interconnection game, and interconnection rewards scale, so the set of networks that can credibly absorb a multi-terabit attack is small and stays small.

The attacker’s side of the same ledger has moved fast. The botnets behind the 2025 records were not exotic. The Aisuru-Kimwolf botnet behind the 31.4 Tbps attack was built largely from malware-infected Android TVs, an estimated one to four million of them: the same kind of cheap, numerous, poorly secured device that the Mirai botnet weaponized in 2016. Cloudflare reported mitigating 34.4 million network-layer attacks in 2025 against 11.4 million in 2024, with attack sizes growing several-fold. When the cost of assembling a multi-terabit cannon keeps falling and the cost of absorbing it keeps rising with the size of the network you must build, the equilibrium is the one we have: defense concentrated in a handful of very large anycast networks, and everyone else either behind one of them or exposed.

What the records actually prove

The headline numbers, 7.3 Tbps, 11.5 Tbps, 31.4 Tbps, read like an arms race the defenders are losing, and on the attacker’s side the trend is genuinely steep. But the records are interesting for the opposite reason. Each one was absorbed without a human waking up. The 35-second duration of the largest attacks is itself a tell: hyper-volumetric floods have gotten short because the long ones do not accomplish anything anymore, so attackers settle for trying to find a gap in the seconds before autonomous mitigation locks on. Against a network with the footprint to disperse and the headroom to ingest, even that gap is mostly closed.

The defense is not one clever idea. It is anycast dispersion turning a firehose into hundreds of streams, plus a per-server pipeline that writes kernel drop rules faster than a person could read the alert, plus enough provisioned capacity that the streams fit through the door in the first place, plus the older BGP tools, RTBH and flowspec and scrubbing diversion, for the networks that cannot or will not hide behind a proxy. None of those works alone. A scrubbing center with no headroom congests. Headroom with no fingerprinting forwards the attack to your origin. Fingerprinting with no dispersion tries to filter a terabit on one box and fails. The architecture is the interaction, and the interaction is expensive, which is the real lesson buried in the press releases. The reason a 31 Tbps attack is a non-event for a few companies and a death sentence for everyone else is not that those companies are smarter. It is that they already paid for the ports.

