Layer 7 DDoS: how application-layer floods differ from volumetric attacks

A volumetric DDoS attack has an honest brutality to it. The numbers are enormous, the goal is simple, and the defense is mostly a question of who owns more capacity. In December 2025, Cloudflare absorbed a 31.4 terabit-per-second flood that lasted thirty-five seconds. You do not subtly outwit 31.4 Tbps. You either have enough pipe and enough scrubbing to drink it, or you go dark. That fight is decided at the network edge, by anycast and packet filters, before a single byte reaches anything that looks like an application.

Now imagine a different attacker. Same goal, knocking a site offline, but instead of 31.4 Tbps they send you two million well-formed HTTP requests per second. Each one is a valid GET, with a plausible User-Agent, a real TLS handshake, cookies, the works. Each one, on its own, is indistinguishable from a person reloading a page. The bandwidth is trivial. The bytes barely register on a traffic graph. And yet behind every one of those requests sits a database query, a template render, a cache lookup that missed. That is a Layer 7 attack, and it is a fundamentally different problem. The question this post answers is why the second attacker is, in many ways, the harder one to stop, despite moving a rounding error’s worth of traffic compared to the first.

The sections below build up from the OSI distinction, walk through what an HTTP flood actually costs the server versus the attacker, separate the high-rate floods from the low-and-slow connection-exhaustion attacks, look at how HTTP/2 changed the economics with Rapid Reset, and then work through the mitigation stack: rate limiting, challenges, and bot management, with an honest account of where each one fails.

Layer 7 versus Layer 3/4: where the attack lands

The OSI model is a teaching abstraction more than an operational reality, but the DDoS world has settled on it as shorthand and the shorthand is useful. Layer 3 is the network layer, IP. Layer 4 is the transport layer, TCP and UDP. Layer 7 is the application layer, where HTTP lives. An attack is named for the layer whose resources it tries to exhaust.

Volumetric and protocol attacks live at L3/L4. A UDP flood, a SYN flood, a DNS amplification, a GRE flood like the one that took Brian Krebs offline at 620 Gbps in September 2016, all of these work by filling something up below the application. They fill the link with bits, or they fill the connection table with half-open TCP handshakes, or they fill a middlebox’s session memory. The defender’s job is to identify the junk and drop it as early and as cheaply as possible, ideally in hardware or in a kernel-bypass path, before any stateful processing happens. The metrics that matter are bits per second and packets per second, because the attack is a quantity-of-traffic problem.

A Layer 7 attack does not try to fill the pipe. It tries to make the application do expensive work. The traffic can be small, the packets few, but each request triggers a chain of computation: parse the headers, terminate TLS, route the request, run the handler, hit the database, render the response. The right metric is requests per second, not bits per second, because the cost to the victim scales with how many requests get processed, not how many bytes arrive. This is the distinction that organizes everything else.

*Solid boxes are where each attack class concentrates its load. A volumetric flood saturates the lower layers and never needs to reach the handler; an L7 flood passes through the lower layers cleanly and burns CPU at the top.*

There is a definitional subtlety worth getting right, because the industry is sloppy about it. The IETF’s RFC 4732, the 2006 document on Internet denial-of-service considerations edited by Mark Handley and Eric Rescorla, defines a denial-of-service attack plainly as one where machines “attempt to prevent the victim from doing useful work,” and it separates bandwidth-consumption attacks from resource-exhaustion attacks as distinct problems needing distinct defenses. OWASP draws an even sharper line in its Automated Threats taxonomy. Its OAT-015 Denial of Service entry covers business-logic and resource-exhaustion abuse, but it explicitly pushes the raw floods out of scope: the ontology “excludes other forms of denial of service that affect web applications, namely HTTP Flood DoS (GET, POST, Header with/without TLS), HTTP Slow DoS, IP layer 3 DoS, and TCP layer 4 DoS.” So in OWASP’s strict reading, an HTTP GET flood is not even “application-layer DoS” in the business-logic sense; it is its own category. In common usage, and for this post, Layer 7 DDoS covers both: the brute HTTP floods and the cleverer abuse of expensive endpoints. The point of flagging the distinction is that when a vendor and a researcher disagree about whether something is “L7,” they are often using two different definitions.

The HTTP flood and the asymmetry that makes it work

Strip an HTTP flood down to its core and it is just a lot of HTTP requests sent faster than the server can answer them. AWS, in its DDoS resiliency guidance, describes it cleanly: an attacker “sends HTTP requests that appear to be from a valid user,” and notes that the more sophisticated floods “attempt to emulate human interaction with the application,” which is precisely what defeats naive rate limiting. The requests are valid. That is the whole problem. A SYN flood announces itself with malformed or half-finished handshakes; an HTTP flood arrives as a stream of perfectly legal requests, each one a sentence the server is obligated to read and answer.

Floods split along the HTTP method. A GET flood pulls resources, and if it targets something the CDN can cache, it is relatively easy to absorb at the edge. A POST flood is nastier, because POST bodies are rarely cacheable and usually trigger writes, validation, or backend processing that has to happen at the origin. A flood against a search endpoint, a login form, or a product-filter API that fans out into database joins is worse still, because each cheap-to-send request maps to expensive-to-serve work.

This is the asymmetry, and it is the entire reason L7 attacks are attractive. The attacker spends almost nothing per request. The defender spends a great deal. Consider the chain a single dynamic request sets off versus the chain the attacker runs.

*The attacker pays once, cheaply, to write and send. The server pays repeatedly and expensively to parse, authenticate, query, and render. Multiply the right-hand column by the request rate and the origin's resources run out long before its bandwidth does.*
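A back-of-the-envelope model makes the right-hand column concrete. The sketch below is illustrative only: the per-stage millisecond costs, the 32-core origin, and the 100,000 rps attack rate are all assumptions chosen for the example, not measurements from any incident.

```python
# Illustrative CPU cost of one dynamic request at the origin, in milliseconds.
# Every figure is an assumption picked for the example, not a measurement.
SERVER_COST_MS = {
    "parse_headers": 0.05,
    "session_lookup": 0.5,
    "database_query": 8.0,
    "template_render": 3.0,
}


def saturation_rps(cores: int, cost_ms_per_request: float) -> float:
    """Request rate at which `cores` CPU cores are fully busy."""
    return cores * 1000.0 / cost_ms_per_request


def cores_needed(attack_rps: float, cost_ms_per_request: float) -> float:
    """CPU cores required just to keep pace with `attack_rps`."""
    return attack_rps * cost_ms_per_request / 1000.0


per_request_ms = sum(SERVER_COST_MS.values())
print(f"cost per dynamic request: {per_request_ms:.2f} ms")
print(f"32-core origin saturates at {saturation_rps(32, per_request_ms):,.0f} rps")
print(f"cores needed for 100,000 rps: {cores_needed(100_000, per_request_ms):,.0f}")
```

Under these assumed costs a 32-core origin runs out of CPU at under 3,000 requests per second, a rate that would barely register as bandwidth.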

The numbers from real incidents make the asymmetry concrete. In August 2021, Cloudflare reported a flood that peaked at 17.2 million requests per second, which at the time was the largest L7 attack on record. That traffic came from “more than 20,000 bots in 125 countries,” and Cloudflare noted it reached 68 percent of the average rate of legitimate HTTP traffic the company was serving across its whole network at the time. Read that again: one attack against one customer briefly amounted to roughly two-thirds of the legitimate request volume of every site behind Cloudflare combined. The records since have only climbed. The HTTP-flood peak Cloudflare disclosed during the December 2025 Aisuru campaign was around 205 million requests per second.

A small botnet can generate floods on this scale because the per-request cost to the attacker is so low. The 17.2M rps attack came from 20,000 machines, which works out to about 860 requests per second per bot, an easy load for any infected device with a network connection. The defender, meanwhile, must answer all of it, and answering is where the cost lives.

Why bandwidth scrubbing does not catch it

The defense that stops a volumetric attack is, broadly, capacity plus filtering. You spread the target across an anycast network so the flood disperses across many points of presence, and at each one you drop the malicious packets in a fast path. Crucially, the dropping decision can be made on cheap signals: the packets are malformed, or they are UDP to a port that should never see UDP, or they match a reflection signature, or they simply arrive from a source that has no business sending them. None of that requires understanding the application.
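To show how little a packet-level decision needs to know, here is a minimal sketch of a stateless drop rule. The `PacketHeader` type, the port list, and the "HTTPS over TCP only" service profile are hypothetical, made up for illustration; real edge filters run in hardware or kernel-bypass paths, not in Python.

```python
from dataclasses import dataclass

# UDP source ports commonly associated with reflection traffic
# (DNS, NTP, SSDP, memcached). Illustrative, not exhaustive.
REFLECTION_SOURCE_PORTS = {53, 123, 1900, 11211}

# Hypothetical service profile: this target only serves HTTPS over TCP.
ALLOWED_SERVICES = {("tcp", 443)}


@dataclass(frozen=True)
class PacketHeader:
    protocol: str  # "tcp" or "udp"
    src_port: int
    dst_port: int


def should_drop(pkt: PacketHeader) -> bool:
    """Keep-or-drop decision using header fields only, with no per-flow state."""
    if pkt.protocol == "udp" and pkt.src_port in REFLECTION_SOURCE_PORTS:
        return True  # looks like a reflected response nobody asked for
    if (pkt.protocol, pkt.dst_port) not in ALLOWED_SERVICES:
        return True  # traffic this service should never see
    return False


print(should_drop(PacketHeader("udp", 123, 443)))    # NTP-style reflection
print(should_drop(PacketHeader("tcp", 51234, 443)))  # ordinary HTTPS packet
```

Note what the rule cannot do: a packet that belongs to an HTTP flood is an ordinary TCP packet to port 443, so it passes this kind of filter untouched.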

An L7 flood defeats this on two fronts. First, it does not move enough volume to trip the capacity-based defenses at all. A two-million-rps HTTP flood might be a single-digit number of gigabits per second on the wire, a quantity any serious edge network handles without noticing. The bits-per-second graph stays flat. Second, and worse, the requests are valid, so there is no cheap packet-level signal to filter on. To decide that a given request is part of the attack, you have to do real work: parse it, look at its headers, maybe consult its history, maybe correlate it with thousands of others. The decision itself costs CPU, which means the act of filtering competes for the very resource the attacker is trying to exhaust.
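The "single-digit gigabits" claim is simple arithmetic. The sketch below assumes an average request size on the wire of 250 to 500 bytes, which is an assumption for illustration and ignores TCP and TLS overhead.

```python
def wire_gbps(requests_per_second: float, bytes_per_request: int) -> float:
    """Approximate inbound bandwidth of a request flood, in gigabits per second."""
    return requests_per_second * bytes_per_request * 8 / 1e9


FLOOD_RPS = 2_000_000
for size in (250, 500):  # assumed average request size in bytes
    print(f"{size} B/request -> {wire_gbps(FLOOD_RPS, size):.1f} Gbps")

# Compare with the 31.4 Tbps volumetric record (31,400 Gbps).
ratio = 31_400 / wire_gbps(FLOOD_RPS, 500)
print(f"the volumetric record is about {ratio:,.0f}x larger in bits per second")
```

At 500 bytes per request the flood is about 8 Gbps, several thousand times smaller than the volumetric record when measured in bits.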

This is the central difficulty and it is worth sitting with. At L3/L4 the filter is cheaper than the attack, so the defender wins on economics at the edge. At L7 the filter can be as expensive as serving the request, so a poorly designed defense just moves the bottleneck around. The good mitigations are the ones that make the keep-or-drop decision cheaply, early, and statefully, before the request reaches the expensive backend. Everything in the mitigation section below is, at bottom, an attempt to push the decision earlier and make it cheaper.

Low and slow: the other shape of L7

High-rate floods are the loud version. There is a quiet version that exhausts the same resources from the opposite direction, by holding connections open rather than by sending many of them. These are the low-and-slow attacks, and Slowloris is the canonical one.

Slowloris, written by Robert Hansen in 2009, opens a pile of connections to a web server and then sends each HTTP request one header at a time, agonizingly slowly, never finishing. It sends just enough, just often enough, to keep each connection from timing out. A thread-per-connection server has a fixed pool of worker slots; Slowloris fills every slot with a request that never completes, and once the pool is full the server cannot accept anyone legitimate. The bandwidth is almost nothing. Published descriptions agree that a single machine can hold thousands of connections open this way. R.U.D.Y., short for “R-U-Dead-Yet,” applies the same trick to the request body instead of the headers: it announces a large body with a Content-Length header and then dribbles that body out a byte at a time, holding a POST handler open indefinitely.

*Slowloris and R.U.D.Y. exhaust the connection pool rather than the bandwidth. The request never completes, so the worker slot never frees, and the cost on the wire is a few bytes every several seconds.*

Architecture decides whether this works. The attacks bite hardest against thread-per-connection servers, classically Apache’s prefork model, where each open connection ties up a real worker. Event-driven servers that multiplex many connections over a small number of threads, Nginx most prominently, are far more resistant because an idle, half-sent request just sits in an event loop costing almost no memory and no worker slot. The standard mitigations follow from this: cap how long a client may take to send a complete request, cap the number of concurrent connections per source IP, and put a buffering reverse proxy in front of a thread-based origin so the proxy, not the origin, eats the slow drip. Modern fronting proxies and CDNs do this by default, which is why low-and-slow attacks are less feared in 2026 than they were in 2009. They have not disappeared. They have been pushed to the edges where most deployments already have a buffering layer, and they remain a real threat to anything exposing a thread-based server directly.
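The first of those mitigations, a cap on how long a client may take to send a complete request, can be sketched with Python's asyncio streams. This is a minimal illustration, not how any particular proxy implements it: `read_request_head` and the deadline values are hypothetical names and numbers. The important detail is that the deadline covers the whole header block, because a per-read timeout is exactly what Slowloris keeps resetting.

```python
import asyncio
from typing import Optional

HEADER_END = b"\r\n\r\n"


async def read_request_head(reader: asyncio.StreamReader,
                            deadline_s: float) -> Optional[bytes]:
    """Return the full request head, or None if it is not complete in time.

    The deadline applies to the entire head, so a client that trickles one
    header every few seconds cannot extend it.
    """
    try:
        return await asyncio.wait_for(reader.readuntil(HEADER_END),
                                      timeout=deadline_s)
    except (asyncio.TimeoutError,
            asyncio.IncompleteReadError,
            asyncio.LimitOverrunError):
        return None  # caller should close the connection and free the slot


async def demo() -> None:
    # A slow client: headers started but never finished.
    slow = asyncio.StreamReader()
    slow.feed_data(b"GET / HTTP/1.1\r\nHost: example.test\r\n")
    print("slow client head:", await read_request_head(slow, 0.05))

    # A normal client: complete head already available.
    normal = asyncio.StreamReader()
    normal.feed_data(b"GET / HTTP/1.1\r\nHost: example.test\r\n\r\n")
    print("normal client head:", await read_request_head(normal, 0.05))


asyncio.run(demo())
```

The demo feeds the readers by hand so it runs without sockets; in a server the same function would be called on the reader handed to the connection callback.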

HTTP/2 and the Rapid Reset rewrite of the economics

For a decade the ceiling on an L7 flood was, loosely, a function of round-trip time. Under HTTP/1.1 a client sends a request, waits for the response, sends the next. Pipelining helped a little and was mostly disabled in practice. To push more requests you opened more connections, and connections cost the attacker something too. HTTP/2 changed the shape of the problem, and in 2023 an attack technique turned that change into a weapon.

HTTP/2 multiplexes many concurrent streams over one TCP connection. A client can have dozens of requests in flight at once on a single connection, and the server advertises how many via the SETTINGS_MAX_CONCURRENT_STREAMS parameter, commonly 100. The protocol also lets either side cancel a stream at any time by sending an RST_STREAM frame. Those two features, multiplexing and cheap cancellation, are what CVE-2023-44487, the HTTP/2 Rapid Reset attack, exploits. It was observed in the wild from August 25, 2023.

The mechanism is elegant and nasty. The attacker opens a stream, sends the request, and immediately sends RST_STREAM to cancel it, then opens another in its place and does it again, as fast as the network allows. On the server side, a reverse proxy may have already forwarded that request to a backend before it processes the cancellation, so the work gets done even though the client threw the answer away. Because the client cancels each stream the instant it opens, it never bumps against the concurrent-stream limit, so that limit provides no protection. Cloudflare’s write-up put the point bluntly: stream concurrency on its own cannot mitigate rapid reset, because the client can churn requests at a high rate no matter what value the server picks. The number of in-flight requests stops depending on round-trip time and starts depending only on the attacker’s available bandwidth.
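The shift from a round-trip ceiling to a bandwidth ceiling can be put in rough numbers. The sketch below is a simplified model under stated assumptions: a 50 ms round trip, one connection, a concurrency limit of 100, a 100 Mbps attacker uplink, and about 60 bytes on the wire per request-plus-reset pair. All of those inputs are illustrative, and the model ignores server processing time and TCP/TLS overhead.

```python
def http1_ceiling(connections: int, rtt_s: float) -> float:
    """One request in flight per connection; each waits a full round trip."""
    return connections / rtt_s


def http2_ceiling(connections: int, max_streams: int, rtt_s: float) -> float:
    """Up to `max_streams` requests in flight per connection per round trip."""
    return connections * max_streams / rtt_s


def rapid_reset_ceiling(uplink_bps: float, bytes_per_pair: int) -> float:
    """Open-then-cancel: limited only by how fast frames can be written."""
    return uplink_bps / (bytes_per_pair * 8)


RTT_S = 0.05            # assumed round-trip time
UPLINK_BPS = 100e6      # assumed attacker uplink
BYTES_PER_PAIR = 60     # assumed HEADERS + RST_STREAM size on the wire

print(f"HTTP/1.1, 1 connection:       {http1_ceiling(1, RTT_S):>10,.0f} rps")
print(f"HTTP/2, 100 streams:          {http2_ceiling(1, 100, RTT_S):>10,.0f} rps")
print(f"HTTP/2 rapid reset, 100 Mbps: {rapid_reset_ceiling(UPLINK_BPS, BYTES_PER_PAIR):>10,.0f} rps")
```

With these assumed inputs the three ceilings come out to 20, 2,000, and roughly 208,000 requests per second from a single connection. The round-trip time does not appear in the third formula at all.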

The result was a step-change in attack size from comparatively small botnets. Google reported a peak above 398 million requests per second, which it described as more than seven times its previous record of 46 million. Cloudflare measured its own peak from the campaign at just over 201 million rps. Amazon reported mitigating attacks in the same window. The botnet behind the figures was on the order of 20,000 machines, a modest fleet for that output, which is the whole story: Rapid Reset let a small botnet punch far above its weight by removing the round-trip ceiling.

*The stream is opened and cancelled in the same breath. The reset keeps the attacker under the concurrency cap while the server still does the work, decoupling request rate from round-trip time.*

Rapid Reset has a fuller treatment of its own in The HTTP/2 Rapid Reset attack (CVE-2023-44487) explained. For the purposes of this post it matters as the clearest modern example of a protocol-level L7 attack: not a flood of complete requests, but an abuse of the request lifecycle itself, which is why patched servers had to start accounting for and rate-limiting resets rather than just counting requests.
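What "accounting for resets" might look like is sketched below as a per-connection sliding-window counter. This is a hypothetical illustration, not the patch any specific server shipped: the `ResetBudget` class, the threshold of 100 resets, and the one-second window are assumptions.

```python
import time
from collections import deque
from typing import Callable, Deque


class ResetBudget:
    """Per-connection sliding-window count of client-cancelled streams."""

    def __init__(self, max_resets: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_resets = max_resets
        self.window_s = window_s
        self.clock = clock
        self._resets: Deque[float] = deque()

    def on_rst_stream(self) -> bool:
        """Record one client RST_STREAM.

        Returns False once the connection is over budget, at which point
        the server would close it (for example by sending GOAWAY).
        """
        now = self.clock()
        self._resets.append(now)
        # Forget resets that have aged out of the window.
        while self._resets and now - self._resets[0] > self.window_s:
            self._resets.popleft()
        return len(self._resets) <= self.max_resets


# Usage: a client that cancels 150 streams back to back on one connection.
budget = ResetBudget(max_resets=100, window_s=1.0)
verdicts = [budget.on_rst_stream() for _ in range(150)]
print("connection still allowed after 150 rapid resets:", verdicts[-1])
```

The design choice is to count cancellations as their own event rather than folding them into the request count, since a cancelled stream never shows up as an in-flight request.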

Mitigation: rate limiting

Rate limiting is the first and bluntest tool. Count requests by some key and reject anything above a threshold. The art is entirely in the choice of key, because the key decides who shares a budget.

The naive key is the source IP. Cap each IP at, say, 100 requests per minute and a single noisy client gets throttled. This works against unsophisticated single-source floods and almost nothing else. Distributed attacks spread across tens of thousands of IPs, each staying politely under any per-IP limit, and the per-IP counters never fire. Worse, IP-based limits punish the legitimate case where many real users share one address. A corporate NAT, a mobile carrier gateway, or a university egress can put thousands of genuine people behind a single IP, and a per-IP cap tuned to stop a bot will lock them all out. And because attacks ride behind proxies, the IP the edge sees may not be the client’s real one at all, which makes correctly resolving the originating address its own discipline, covered in The X-Forwarded-For chain.
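Both failure modes of the per-IP key are easy to quantify. The sketch reuses the 100-requests-per-minute cap from the example above and a 20,000-bot fleet; the 90 percent headroom, the 5,000 users behind one NAT, and their 2 requests per minute each are assumptions for illustration.

```python
def aggregate_rps(bots: int, per_ip_limit_per_minute: int,
                  headroom: float = 0.9) -> float:
    """Total flood rate when every bot stays at `headroom` of the per-IP cap."""
    return bots * per_ip_limit_per_minute * headroom / 60


PER_IP_LIMIT = 100  # requests per minute, as in the example above

# Failure mode 1: a distributed flood that never trips a single counter.
flood = aggregate_rps(bots=20_000, per_ip_limit_per_minute=PER_IP_LIMIT)
print(f"flood that stays under every per-IP limit: {flood:,.0f} rps")

# Failure mode 2: real users sharing one address (assumed numbers).
nat_users, requests_per_user_per_minute = 5_000, 2
nat_load = nat_users * requests_per_user_per_minute
print(f"one NAT address sends {nat_load:,} req/min, "
      f"{nat_load / PER_IP_LIMIT:.0f}x over the cap")
```

Under these assumptions the botnet delivers about 30,000 requests per second without any bot being throttled, while the shared address exceeds the cap a hundredfold with entirely legitimate traffic.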

Better keys narrow the budget to something the attacker cannot cheaply rotate. Rate-limit by session cookie, by authenticated user, by API key, by the combination of IP and a behavioral fingerprint, or by the specific expensive endpoint rather than the site as a whole. A token-bucket limiter on the login route or the search API, sized to real human usage, will throttle a flood against that route while leaving the cacheable marketing pages untouched. The general theory of these limiters, token buckets and adaptive backoff and the 429 dance, is the same whether you are defending an endpoint or being a polite client against one, and the client-side view is in Rate limiting yourself.
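A minimal sketch of a token-bucket limiter keyed by route and client identity follows. The class names, the `/login` limits, and the choice of key are hypothetical; a production limiter would also need eviction of idle buckets and shared state across edge nodes, which this sketch leaves out.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Classic token bucket: refills at `rate_per_s`, holds at most `burst`."""

    def __init__(self, rate_per_s: float, burst: float, now: float) -> None:
        self.rate_per_s = rate_per_s
        self.burst = burst
        self.tokens = burst
        self.updated = now

    def allow(self, now: float) -> bool:
        elapsed = now - self.updated
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_s)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class KeyedLimiter:
    """One bucket per (route, client key), only for routes listed in `limits`."""

    def __init__(self, limits: Dict[str, Tuple[float, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limits = limits  # route -> (rate_per_s, burst)
        self.clock = clock
        self.buckets: Dict[Tuple[str, str], TokenBucket] = {}

    def allow(self, route: str, client_key: str) -> bool:
        if route not in self.limits:
            return True  # unlisted routes (e.g. cacheable pages) are not limited
        now = self.clock()
        key = (route, client_key)
        if key not in self.buckets:
            rate, burst = self.limits[route]
            self.buckets[key] = TokenBucket(rate, burst, now)
        return self.buckets[key].allow(now)


# Usage: protect only the expensive route, keyed by session rather than IP.
limiter = KeyedLimiter({"/login": (0.5, 3.0)})  # assumed: 1 per 2 s, burst of 3
print([limiter.allow("/login", "session-abc") for _ in range(5)])
print(limiter.allow("/pricing", "session-abc"))
```

Keying on the route means a flood against `/login` exhausts only the login budget, and keying on the session means two people behind the same NAT do not share it.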

The honest limit of rate limiting is that a well-built L7 attack is designed to look like legitimate traffic, and AWS makes exactly this point about floods that “emulate human interaction.” If each bot’s behavior is statistically indistinguishable from a person and the budget is set where real people live, the bot stays under it. Push the threshold low enough to catch the bots and you start refusing real users. Rate limiting buys time and stops the lazy attacks. It does not, by itself, solve the hard ones, which is why it is the floor of the mitigation stack rather than the whole of it.

Mitigation: challenges

When you cannot tell attacker from human by counting, you ask the client to prove something a bot finds expensive and a browser finds cheap. That is the challenge family, and it spans a wide range of friction.

The lightest challenges are invisible. A JavaScript challenge serves a small computation the client must run and return before the real response is released, which a real browser does silently in a fraction of a second and a bare HTTP flooding tool, with no JavaScript engine, simply cannot complete. A proof-of-work challenge asks the client to burn a measurable slice of CPU on a hash puzzle before proceeding, which inverts the attack’s economics: now the attacker pays per request too, and a flood that was nearly free becomes expensive to sustain. The cryptographic-cookie variant issues a signed token once the client clears the bar and checks the token on later requests, so the expensive part runs once per session rather than once per request. Heavier challenges escalate to an interactive CAPTCHA when the lighter signals are ambiguous.
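The proof-of-work and signed-token ideas fit in a short sketch. This is an illustration under assumptions, not any vendor's challenge protocol: the function names, the SHA-256 leading-zero-bits puzzle, the token layout, and the difficulty of 12 bits are all choices made for the example.

```python
import hashlib
import hmac
import os
import time
from typing import Optional


def _meets_difficulty(digest: bytes, bits: int) -> bool:
    """True if the digest starts with at least `bits` zero bits."""
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits) == 0


def solve(challenge: bytes, bits: int) -> int:
    """Client side: brute-force a nonce, about 2**bits hashes on average."""
    nonce = 0
    while not verify(challenge, nonce, bits):
        nonce += 1
    return nonce


def verify(challenge: bytes, nonce: int, bits: int) -> bool:
    """Server side: a single hash, regardless of difficulty."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return _meets_difficulty(digest, bits)


def issue_token(secret: bytes, client_id: str, ttl_s: int,
                now: Optional[float] = None) -> str:
    """Signed clearance token so the puzzle runs once per session."""
    expires = int((time.time() if now is None else now) + ttl_s)
    payload = f"{client_id}.{expires}"
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"


def check_token(secret: bytes, token: str, client_id: str,
                now: Optional[float] = None) -> bool:
    try:
        payload, sig = token.rsplit(".", 1)
        token_client, expires = payload.rsplit(".", 1)
        expires_at = int(expires)
    except ValueError:
        return False  # malformed token
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    current = time.time() if now is None else now
    return (hmac.compare_digest(sig.encode(), expected.encode())
            and token_client == client_id
            and current < expires_at)


# Usage: challenge once, then accept the token on later requests.
secret, challenge = os.urandom(32), os.urandom(16)
nonce = solve(challenge, bits=12)
if verify(challenge, nonce, bits=12):
    token = issue_token(secret, "203.0.113.7", ttl_s=1800)
    print("token accepted:", check_token(secret, token, "203.0.113.7"))
```

The asymmetry now runs the other way: the client performs thousands of hashes, the server performs one to verify and one HMAC per later request.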

Cloudflare’s documented progression runs from a managed challenge that picks the lightest sufficient test, up through a non-interactive JavaScript challenge, to an interactive challenge, with the platform choosing based on how suspicious the request looks. The deeper mechanics of who issues what and when are in Cloudflare’s managed vs JS vs interactive challenge, and the proof-of-work angle specifically in The proof-of-work renaissance.

The cost of challenges is friction, and friction is never free. Every interstitial adds latency and a chance that a real user bounces. CAPTCHAs annoy people and exclude some entirely. A JavaScript challenge breaks legitimate non-browser clients: API consumers, RSS readers, accessibility tools, monitoring probes. So challenges are deployed surgically, triggered by suspicion rather than blanket-applied, which means the system still needs a way to decide who is suspicious. That decision is the bot-management layer.

Mitigation: bot management and the scoring approach

The most capable L7 defenses do not make a single keep-or-drop decision. They score every request on many signals and route it, cheaply for the obviously-good, with a challenge for the ambiguous, with a block for the obviously-bad. The goal is to push the verdict as early as possible and reserve expensive scrutiny for the requests that warrant it.

The signals come from every layer the request touches. At the TLS layer the ordering of the ClientHello’s cipher suites and extensions yields a fingerprint, the basis of TLS fingerprinting from JA3 to JA4, and a Python script’s handshake looks nothing like Chrome’s. At the HTTP/2 layer the SETTINGS frame values, the window sizes, and the pseudo-header ordering form another fingerprint that betrays a non-browser client even when the User-Agent claims otherwise. Header order and casing add more. On top of that sit IP and ASN reputation, behavioral history, and, where JavaScript runs, a browser-environment fingerprint. The largest vendors also pool intelligence across their whole customer base, so an IP or fingerprint seen attacking one site is already suspect at the next. The combined output is a score, and the score picks the action. Cloudflare exposes this as a 1-to-99 bot score, detailed in Cloudflare Bot Management scoring; the broader architecture of collecting signals before deciding is the subject of Server-side vs client-side bot detection.
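To illustrate why ordering alone is a signal, here is a minimal sketch of a JA3-style fingerprint: the ClientHello fields are joined in the order the client sent them and hashed. It follows the JA3 string layout as publicly described (version, ciphers, extensions, curves, point formats, with GREASE values skipped); the numeric values in the usage example are made up and do not correspond to any real browser or tool.

```python
import hashlib
from typing import Iterable, List


def is_grease(value: int) -> bool:
    """GREASE placeholders look like 0x0A0A, 0x1A1A, ... 0xFAFA."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _join(values: Iterable[int]) -> str:
    # Order is preserved on purpose: it is part of the fingerprint.
    return "-".join(str(v) for v in values if not is_grease(v))


def ja3_style(version: int, ciphers: List[int], extensions: List[int],
              curves: List[int], point_formats: List[int]) -> str:
    """MD5 of 'version,ciphers,extensions,curves,point_formats'."""
    raw = ",".join([str(version), _join(ciphers), _join(extensions),
                    _join(curves), _join(point_formats)])
    return hashlib.md5(raw.encode()).hexdigest()


# Two hypothetical clients offering the same ciphers in a different order.
client_a = ja3_style(771, [4865, 4866, 4867, 49195], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
client_b = ja3_style(771, [49195, 4865, 4866, 4867], [0, 10, 11, 23, 65281], [29, 23, 24], [0])
print(client_a)
print(client_b)
print("same fingerprint:", client_a == client_b)
```

Changing the User-Agent header changes nothing here, because the fingerprint is computed from the handshake that precedes any HTTP header.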

*Most of the signals come from bytes the client already had to send, so the score can be computed at the edge before the request reaches an application handler. That is what makes the filtering decision cheaper than serving the request.*

The point that ties this back to the asymmetry problem is that fingerprinting reads signals the client already had to emit. The ClientHello, the HTTP/2 SETTINGS, the header order, all of it arrives in the first packets, before the application does any work. So the score can be computed at the edge, cheaply, and the obviously-malicious request dropped before it ever reaches the database. That is how a defense escapes the trap where filtering costs as much as serving: it decides on data it was going to receive anyway, at a layer below the expensive one.
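The score-then-route idea can be sketched in a few lines. Everything here is hypothetical: the signal names, the weights, and the thresholds are invented for illustration and are not Cloudflare's or any vendor's model. The sketch borrows only the 1-to-99 range, with low values meaning "likely automated".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    tls_fingerprint_known_browser: bool
    h2_settings_match_user_agent: bool
    header_order_matches_user_agent: bool
    ip_reputation_bad: bool
    seen_attacking_elsewhere: bool


def score(s: Signals) -> int:
    """1 (almost certainly automated) .. 99 (almost certainly human).

    Weights are made-up illustration values, not a trained model.
    """
    value = 50
    value += 20 if s.tls_fingerprint_known_browser else -25
    value += 15 if s.h2_settings_match_user_agent else -15
    value += 10 if s.header_order_matches_user_agent else -10
    if s.ip_reputation_bad:
        value -= 15
    if s.seen_attacking_elsewhere:
        value -= 30
    return max(1, min(99, value))


def action(score_value: int, block_below: int = 10,
           challenge_below: int = 40) -> str:
    """Route the request: cheap pass, challenge, or drop at the edge."""
    if score_value < block_below:
        return "block"
    if score_value < challenge_below:
        return "challenge"
    return "allow"


browser = Signals(True, True, True, False, False)
mixed = Signals(False, True, True, True, False)
script = Signals(False, False, False, True, True)
for name, sig in (("browser", browser), ("mixed", mixed), ("script", script)):
    print(name, score(sig), action(score(sig)))
```

Every input to `score` is available before the application handler runs, which is the property that keeps the verdict cheaper than the request it judges.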

None of this is a clean win, and the people who run these systems know it. Fingerprints can be forged; tools that mimic a browser’s exact ClientHello and HTTP/2 settings exist precisely to defeat the cheap signals, which is why the vendors keep adding more of them and pooling intelligence to raise the cost of a convincing imitation. The arms race is permanent. But the structural insight holds regardless of who is ahead this quarter: an L7 defense wins by making its decision early and cheaply, on data the attacker cannot avoid sending, and loses whenever it is forced to do expensive work to tell the difference.

Closing: the shape of the problem

The two attackers from the opening never really converge. The 31.4 Tbps flood and the two-million-rps HTTP flood are different problems with different metrics, different defenses, and different places where the fight is won or lost. Volumetric attacks are a capacity contest decided at the network edge on cheap packet signals. Application-layer attacks are a discrimination problem decided at the application edge, where the defender’s whole job is to separate the request that deserves a database query from the request that only wants to trigger one, and to do that separation before the query runs.

What makes L7 the more durable threat is that the attacker’s traffic is, by construction, valid. There is no malformed packet to drop, no reflection signature to match, no obvious tell at the packet level. Every defense reduces to a judgment call about intent, and judgment is expensive and imperfect. Rate limiting handles the careless attacker and punishes shared NATs. Challenges handle the scriptable attacker and annoy real users. Fingerprinting and scoring handle the sophisticated attacker until the attacker buys a better imitation, and then the cycle turns again. The defender’s only structural advantage is position: the malicious request, however well-disguised, still has to send its TLS handshake and its HTTP/2 settings and its headers in a particular order, in the first bytes, before any work is done. Read those bytes well enough and you can throw the request away for almost nothing. Read them poorly and you have built a filter that costs as much as the attack it was meant to stop.

That is why the December 2025 records matter less for their size than for their composition. The 31.4 Tbps headline number is a capacity story, and capacity is mostly solved by whoever owns the bigger network. The 205-million-requests-per-second HTTP figure from the same campaign is the one that should keep an engineer up at night, because every one of those requests looked, byte for byte, like something a person might have sent.

Sources & further reading

IAB / Handley, Rescorla, eds. (2006), RFC 4732: Internet Denial-of-Service Considerations — the foundational IETF document separating bandwidth-consumption from resource-exhaustion DoS.
OWASP (2018), OAT-015 Denial of Service — the Automated Threats entry that scopes application-layer DoS and explicitly separates it from raw HTTP/IP/TCP floods.
Cloudflare (2021), Cloudflare thwarts 17.2M rps DDoS attack — the largest ever reported — the 17.2M rps L7 flood from 20,000 bots, with the 68-percent-of-legitimate-traffic comparison.
Cloudflare (2023), HTTP/2 Rapid Reset: deconstructing the record-breaking attack — the 201M rps figure, the RST_STREAM mechanism, and why MAX_CONCURRENT_STREAMS does not help.
Google Cloud (2023), How it works: the novel HTTP/2 Rapid Reset DDoS attack — Google’s 398M rps peak and the cost-asymmetry explanation of request cancellation.
NIST NVD (2023), CVE-2023-44487 — the official record for the HTTP/2 Rapid Reset vulnerability.
Cloudflare (2025), 2025 Q4 DDoS threat report — the 31.4 Tbps record, the ~205 Mrps HTTP peak, the Aisuru campaign, and the L3/4-versus-L7 share.
AWS (whitepaper), Application layer attacks — Best Practices for DDoS Resiliency — HTTP floods, cache-busting, WordPress XML-RPC, DNS query floods, and TLS-renegotiation attacks described in one place.
Cloudflare (docs), HTTP DDoS Attack Protection managed ruleset — the categories an edge ruleset matches and the origin-error sensitivity model.
Cloudflare (learning), Slowloris DDoS attack — the canonical low-and-slow connection-exhaustion attack and why thread-based servers are most exposed.
KrebsOnSecurity (2016), The Democratization of Censorship — first-hand account of the 620 Gbps Mirai attack that forced Krebs off Akamai’s network.