The history of the web application firewall, from packet filters to ML

Copyright: MIT
A packet filter in 1990 could tell you that a connection had arrived on TCP port 80 from some address, and that was roughly the end of its knowledge. It did not read the request line. It did not know whether the bytes after GET spelled out a path to a static page or a string of SQL aimed at the login form. The firewall did its job perfectly and the attack walked straight through, because the attack lived in a layer the firewall was never built to look at.

That gap is the whole reason the web application firewall exists. The network firewall watches addresses and ports; the web application firewall reads the HTTP request itself, the method, the headers, the cookies, the body, and decides whether the bytes describe an attack. The question this post follows is how the industry got from a device that cannot see HTTP to a machine-learning model that scores every request 1 to 99 for how much it smells like SQL injection, and what was gained and lost on the way.

The path runs through four eras. First, the network firewalls of the late 1980s and 1990s and the reason they could not stop web attacks. Then the first dedicated application-layer products, Sanctum’s AppShield in 1999 and the appliance vendors that followed. Then the open-source turn, Ivan Ristic’s ModSecurity in 2002 and the OWASP Core Rule Set, with PCI DSS pushing WAFs from optional to mandatory. Finally the move to the cloud and to statistical scoring, where the rule list stops being the only thing making the decision.

What the network firewall could not see

The firewall started as a packet filter. Engineers at Digital Equipment Corporation built the first ones around 1987, and Bill Cheswick and Steve Bellovin carried the work forward at AT&T Bell Labs. A packet filter sits at the network and transport layers. It reads source and destination IP addresses, the protocol number, and the TCP or UDP port, and it matches those fields against a list of allow and deny rules. Cheap, fast, stateless. It was enough to keep most of the early internet’s threats outside the perimeter.
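A minimal sketch of that matching logic, assuming a first-match-wins rule list and a default of deny. The `Packet`, `FilterRule`, and `decide` names are hypothetical, chosen only for illustration. Note what the packet record does not contain: there is no field for the request line or the body, so nothing in the rule language can refer to them.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    proto: str       # "TCP" or "UDP"
    dst_port: int    # the payload is deliberately absent: the filter never reads it


@dataclass(frozen=True)
class FilterRule:
    action: str               # "allow" or "deny"
    src_net: str              # CIDR the source address must fall in
    dst_net: str              # CIDR the destination address must fall in
    proto: Optional[str]      # None matches any protocol
    dst_port: Optional[int]   # None matches any port

    def matches(self, pkt: Packet) -> bool:
        return (
            ip_address(pkt.src) in ip_network(self.src_net)
            and ip_address(pkt.dst) in ip_network(self.dst_net)
            and self.proto in (None, pkt.proto)
            and self.dst_port in (None, pkt.dst_port)
        )


def decide(rules, pkt, default="deny"):
    """First matching rule wins; each packet is judged alone (stateless)."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.action
    return default


# Assumed policy: anyone may reach the web server on TCP port 80, nothing else.
RULES = [FilterRule("allow", "0.0.0.0/0", "10.0.0.5/32", "TCP", 80)]

web = Packet("203.0.113.9", "10.0.0.5", "TCP", 80)
ssh = Packet("203.0.113.9", "10.0.0.5", "TCP", 22)
print(decide(RULES, web), decide(RULES, ssh))
```

Every request to port 80 gets the same answer here, whether it carries a page fetch or an injection string, because the decision never depends on anything past the transport header.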

The next two generations added memory and depth. Between 1989 and 1990 a group at Bell Labs, Dave Presotto, Janardan Sharma, and Kshitij Nigam among them, worked on circuit-level gateways that tracked the state of a connection rather than judging each packet alone. Check Point turned stateful inspection into a commercial product with FireWall-1 in the mid-1990s, and stateful filtering became the default everywhere. In October 1993 Marcus Ranum, Wei Xu, and Peter Churchyard released the Firewall Toolkit, which became the Gauntlet firewall at Trusted Information Systems. Gauntlet was an application-layer proxy. It understood specific protocols, FTP, DNS, HTTP, and could apply rules to the conversation inside them.

So the application layer was not a foreign concept. The problem was narrower and more awkward. A proxy firewall that “understands HTTP” in 1995 understood the protocol’s shape: methods, status codes, header syntax, maybe content lengths. It did not understand the application sitting behind the HTTP. It had no idea that /login expected a username that should never contain a single quote, or that the q parameter on the search page would be reflected straight back into the HTML without escaping. The dangerous content was valid HTTP. It parsed cleanly. It just happened to be an attack against the program reading it.

Figure: what each layer can read about one request.

  • L3/L4 (the packet filter sees this): src=203.0.113.9 dst=10.0.0.5 proto=TCP port=80
  • L7 syntax (the proxy firewall sees this): GET /search?q=... HTTP/1.1, Host: shop.example
  • L7 meaning (only a WAF reads this): q = ' OR 1=1 -- (aimed at the SQL query); valid HTTP, valid syntax, hostile intent

*The attack is well-formed at every layer below the application's own logic. A packet filter and even a protocol-aware proxy pass it because nothing about it is malformed.*

Two things made this gap urgent at the turn of the millennium. The web stopped being a set of static documents and became a set of programs, CGI scripts, PHP, ASP, early Java app servers, all taking user input and feeding it into databases and shells. And the attack classes that target those programs got names and recipes. SQL injection and cross-site scripting were both being written up and traded in the late 1990s. A network firewall set to allow port 80 to your web server allowed every one of those attacks by definition, because blocking them would mean blocking the web. The defense had to move up to where the application lived. If you want the broader arc of how HTTP itself grew into this attack surface, the history of HTTP covers the protocol side; this post stays on the defense.

AppShield and the first dedicated WAFs

The first product widely called a web application firewall came from a startup. Perfecto Technologies, founded in 1997 by Eran Reshef and Gili Raanan, shipped AppShield in the summer of 1999. The company renamed itself Sanctum in 2000. AppShield’s idea was clever and, in hindsight, a little ahead of what the hardware of the day could comfortably do. It did not work from a list of known bad strings. It read the HTML pages the application sent out, learned what links, forms, and fields those pages actually offered, and built a policy from that. A request that asked for something the application had never advertised, a parameter that was not on any form, a value longer than any field allowed, got blocked.

This is the positive security model, sometimes called allow-listing or a whitelist approach. Instead of enumerating every attack, you enumerate the legitimate surface and reject everything outside it. Done well it catches attacks nobody has named yet, because novelty is exactly what it rejects. Done in practice it fought a constant war with applications that generated URLs dynamically, set cookies the policy had not seen, and changed their forms on every deploy. The policy drifted out of date and the false positives piled up. AppShield found real customers anyway. By 2002 Sanctum reported that sixty of the Fortune 100 were running it, three years after launch. Sanctum was acquired by Watchfire in 2004, and Watchfire by IBM in 2007; the AppShield intellectual property eventually landed at F5, which discontinued it.
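The positive model can be sketched in a few lines. This is an illustration of the idea, not AppShield's implementation, whose internals are not described here: the `FormLearner` class and `check` function are hypothetical, and the sketch assumes GET forms whose fields are plain `<input>` elements with an optional `maxlength`.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlsplit


class FormLearner(HTMLParser):
    """Collects the fields each form advertises, keyed by the form's action path."""

    def __init__(self):
        super().__init__()
        self.policy = {}      # path -> {field name: max length, or None if unlimited}
        self._current = None  # field map of the form currently being parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = self.policy.setdefault(attrs.get("action") or "", {})
        elif tag == "input" and self._current is not None and attrs.get("name"):
            maxlen = attrs.get("maxlength")
            limit = int(maxlen) if maxlen and maxlen.isdigit() else None
            self._current[attrs["name"]] = limit

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def check(policy, url):
    """Return (allowed, reason) for a GET request URL under the learned policy."""
    parts = urlsplit(url)
    fields = policy.get(parts.path)
    if fields is None:
        return False, "path never advertised"
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if name not in fields:
            return False, f"unknown parameter {name!r}"
        limit = fields[name]
        if limit is not None and len(value) > limit:
            return False, f"{name!r} longer than {limit}"
    return True, "ok"


# Learn from one page the application sent out, then judge incoming requests.
PAGE = '<form action="/search"><input name="q" maxlength="20"></form>'
learner = FormLearner()
learner.feed(PAGE)

print(check(learner.policy, "/search?q=shoes"))
print(check(learner.policy, "/search?q=shoes&debug=1"))
print(check(learner.policy, "/admin"))
```

The brittleness is visible in the same sketch: ship a deploy that adds a field to the form, and until the policy is relearned every legitimate request using that field is rejected as an unknown parameter.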

The early 2000s filled in with appliance vendors selling boxes you racked in front of your web servers. WebCohort was founded in 2002 by Shlomo Kramer, Amichai Shulman, and Mickey Boodaei, shipped SecureSphere the next year as a combined web-application and database firewall, and renamed itself Imperva in 2004. Imperva went public on the NYSE in 2011 and was acquired by Thales in 2023 for about 3.6 billion dollars. The category got the kind of validation that matters to enterprise buyers when Gartner started ranking WAF vendors in its Magic Quadrant; Imperva has been placed as a Leader every year since 2014. NetContinuum, Teros, Citrix with its acquisitions, F5 with its ASM module, and others rounded out a market that, for most of the decade, meant a physical box from a named vendor.

Two design choices defined that box. The first was the security model just described, positive versus negative, allow-list versus block-list, and most products ended up doing some of each. Pure positive models were too brittle to run alone, so vendors layered a negative model on top: a list of signatures for known attacks, the regex-and-blacklist approach that would come to dominate. The second choice was deployment. A WAF can sit inline as a reverse proxy, terminating the connection and forwarding clean traffic on, which gives it full control but puts it squarely in the path of every request and every outage. Or it can sit out of band, watching a mirror of the traffic and able to alert but not block. The reverse-proxy posture won, because a WAF that cannot block is a very expensive logging system, and it set the template for everything that came after. The reverse-proxy idea has a long lineage of its own, traced in the history of the proxy.

ModSecurity and the open-source turn

In November 2002 Ivan Ristic released the first version of ModSecurity, an Apache module for the 1.3.x server. His motivation was mundane and exactly right: a web server gives an administrator almost no visibility into request bodies and responses, so he wrote a module that exposed that traffic and a rule language to inspect it and act on what it found. It was a hobby project. By 2004 it was a company, Thinking Stone, and Ristic was working on it full time.

ModSecurity mattered for a reason that had nothing to do with being technically first; AppShield beat it by three years. It mattered because it was open source and it ran inside the web server everyone already had. You did not buy an appliance. You loaded a module into Apache and you wrote rules in a text file. That put a real, programmable WAF in front of a generation of engineers who would never have gotten budget for a Sanctum or Imperva box, and it made the rules themselves readable, shareable, and arguable in public. A ModSecurity rule is a SecRule that names a target (a part of the request), an operator (often a regular expression), and an action. The whole thing is legible. You can see exactly what is being matched and why it blocks.

Figure: the shape of one rule.

SecRule ARGS:q "@detectSQLi" block,log

  • target (ARGS:q): which part of the request
  • operator ("@detectSQLi"): the test applied to the target
  • action (block,log): what to do on a match

Readable, shareable, and arguable in public, the thing appliances were not.

*A SecRule names where to look, what test to run, and what to do on a match. The legibility is the point: the same syntax that runs in production is the syntax you read to audit it.*
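The target, operator, and action structure is simple enough to mirror in a short Python sketch. This is a model of the rule's shape, not of ModSecurity's engine: the `Rule` class and `evaluate` function are hypothetical, and the regex is a deliberately crude stand-in for an operator, not a reimplementation of `@detectSQLi`.

```python
import re
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class Rule:
    target: str                       # which request argument to inspect
    operator: Callable[[str], bool]   # the test applied to the target
    action: str                       # what to do on a match


# Crude illustrative pattern: a quote, then OR, then a number = number comparison.
crude_sqli = re.compile(r"'\s*or\s+\d+\s*=\s*\d+", re.IGNORECASE)

RULE = Rule(target="q", operator=lambda v: bool(crude_sqli.search(v)), action="block")


def evaluate(rule: Rule, args: Mapping[str, str]) -> str:
    """Apply one rule to a request's arguments; return its action or 'pass'."""
    value = args.get(rule.target)
    if value is not None and rule.operator(value):
        return rule.action
    return "pass"


print(evaluate(RULE, {"q": "' OR 1=1 --"}))
print(evaluate(RULE, {"q": "red shoes"}))
```

Everything that decides the outcome is in the three fields of the rule, which is the property that made rules written this way easy to share and to argue about.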

The corporate history is a chain of acquisitions that is worth keeping straight because the licensing and stewardship changed each time. Breach Security acquired Thinking Stone in September 2006. The same year, version 2.0 landed at the OWASP AppSec conference in Seattle, and version 2.5 followed in February 2008 with a reworked syntax. Trustwave bought Breach Security in June 2010 and relicensed ModSecurity under the Apache license, which kept it genuinely open. Microsoft contributed an IIS port in 2012; an Nginx port arrived around the same time, presented at Black Hat. A ground-up rewrite started in December 2015 and produced libmodsecurity, the standalone 3.0 engine announced in January 2018, which decoupled the rule engine from any single web server.

Then the long goodbye. Trustwave announced end-of-sale in 2021, with end-of-life for its support set for 1 July 2024, and the project moved under OWASP’s stewardship, where 3.0.x continues to receive maintenance releases. The C engine that started as a 2002 hobby project for Apache 1.3 is still shipping patches more than two decades on, which is a rare thing in this industry. The torch for new engine work has partly passed to Coraza, a Go reimplementation that speaks the same rule language and runs the same rule set, which matters for the next part of the story.

The Core Rule Set and the scoring idea

A WAF engine without rules is an empty filter. ModSecurity’s rules came, increasingly, from the OWASP Core Rule Set, a community-maintained pack of generic attack-detection rules that any compatible engine can load. CRS covers SQL injection, cross-site scripting, local and remote file inclusion, code injection across several languages, and the rest of the usual catalogue. It was under Trustwave’s copyright from 2006 to 2020 and is now governed by the CRS project itself, licensed Apache 2.0. The same rules run today under ModSecurity, under Coraza, and inside commercial WAFs that adopted the set.

The interesting part of CRS is not the list of patterns. It is the decision model the project settled on, because it is the answer to a problem that nearly sank rule-based WAFs: false positives. A naive WAF treats every rule as a self-contained verdict. One rule matches, the request dies. The trouble is that any single regex broad enough to catch real attacks will also catch some legitimate request, and the moment your WAF starts blocking a customer’s perfectly good order it gets switched off, which protects no one.

CRS handles this with anomaly scoring. The detection logic is decoupled from the blocking decision. A rule that matches does not block; it adds points to a running anomaly score for the request. The points come from the rule’s severity: a critical-severity match adds 5, an error adds 4, a warning adds 3, a notice adds 2. Only after every inbound rule has run does the engine compare the accumulated score against a threshold, 5 by default, and block if the score meets or exceeds it. At the default, then, a single critical match is enough to block, while a lone warning or notice is not. There is a separate, lower outbound threshold, 4 by default, for inspecting responses on the way back out so the WAF can catch, for example, a database error or a reflected payload leaking in the HTML. The blocking rule that makes the final inbound call carries the ID 949110, and seeing 949110 Inbound Anomaly Score Exceeded in a log is the everyday signature of CRS deciding a request crossed the line.

Figure: collaborative scoring, not one-shot blocking.

  • rule 942100, SQLi pattern: +5 (critical)
  • rule 920270, invalid char in arg: +3 (warning)
  • rule 920170, GET with body: +2 (notice)
  • running score = 10, threshold = 5, and 10 >= 5
  • rule 949110 fires: inbound anomaly score exceeded, request denied

*No single rule blocks. Severity points accumulate, and rule 949110 makes the call once the running score crosses the threshold. Raising that threshold to cut false positives also lets attacks through, which is the whole tuning problem in one number.*
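The arithmetic can be written out as a short Python sketch. It uses the severity points and the default inbound threshold given above; the function names are hypothetical, and the sketch covers only the final comparison, not how CRS evaluates the rules that produce the matches.

```python
# Severity-to-points mapping and default inbound threshold as described in the text.
SEVERITY_POINTS = {"critical": 5, "error": 4, "warning": 3, "notice": 2}
INBOUND_THRESHOLD = 5


def anomaly_score(matched_severities):
    """Sum the points for every rule that matched; no single match blocks by itself."""
    return sum(SEVERITY_POINTS[s] for s in matched_severities)


def inbound_decision(matched_severities, threshold=INBOUND_THRESHOLD):
    """Compare the accumulated score to the threshold once all rules have run."""
    score = anomaly_score(matched_severities)
    return ("deny" if score >= threshold else "pass"), score


# One critical, one warning, and one notice match: 5 + 3 + 2 = 10.
print(inbound_decision(["critical", "warning", "notice"]))  # ('deny', 10)

# A lone warning stays under the default threshold.
print(inbound_decision(["warning"]))                        # ('pass', 3)

# Raising the threshold to cut false positives also lets this critical match through.
print(inbound_decision(["critical"], threshold=10))         # ('pass', 5)
```

The last call is the tuning problem in miniature: the same knob that spares a noisy but legitimate request also spares a real attack that only tripped one rule.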

Layered on top is the paranoia level, a dial for how aggressive the rule set should be. Paranoia level 1 holds rules that almost never raise a false alarm. Each step up to PL 2, 3, and 4 enables additional, more specialized rules that catch more attacks at the cost of more false positives, until at PL 4 the rules flag almost anything that looks remotely suspicious, including a lot of legitimate traffic. Selecting a level enables every rule up to and including it. The combination of severity scoring and paranoia levels is the practical answer to the brittleness that dogged the early positive-model products: instead of one binary policy, an operator gets two continuous knobs and can trade detection against noise to fit the site. The deep mechanics of tuning all this live in the Core Rule Set post; for the engine itself, see how a WAF actually works.
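The cumulative behavior of the dial is easy to state in code. In this sketch each rule carries the paranoia level at which it first becomes active, and selecting a level keeps every rule at or below it. The `CrsRule` class, the `active_rules` function, and the rule IDs are all hypothetical placeholders, not real CRS rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrsRule:
    rule_id: int          # placeholder IDs, not real CRS rule numbers
    paranoia_level: int   # the level at which this rule first becomes active
    severity: str


def active_rules(rules, selected_pl):
    """Selecting a level enables every rule up to and including that level."""
    if selected_pl not in (1, 2, 3, 4):
        raise ValueError("paranoia level must be 1, 2, 3, or 4")
    return [r for r in rules if r.paranoia_level <= selected_pl]


RULES = [
    CrsRule(1001, 1, "critical"),
    CrsRule(1002, 1, "warning"),
    CrsRule(2001, 2, "critical"),
    CrsRule(3001, 3, "warning"),
    CrsRule(4001, 4, "notice"),
]

for level in (1, 2, 3, 4):
    print(level, [r.rule_id for r in active_rules(RULES, level)])
```

Because the sets nest, raising the level can only add matches to a request's anomaly score, never remove them, which is why higher levels catch more attacks and more legitimate traffic together.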

CRS 3.0 was the major rewrite that introduced paranoia levels in their modern form; CRS 4.0 followed years later with a plugin architecture, better UTF-8 handling, web-shell detection, and over 500 rule bypasses closed off after a bug-bounty project. As of 2026 the project is on the 4.27.x line, with 4.25.0 designated the first long-term-support release for CRS 4 and the 3.3.x line scheduled to go out of support in Q3 2026.

PCI DSS turns optional into mandatory

Technology does not get bought because it is good. It gets bought because someone has to. For WAFs that someone was the Payment Card Industry Data Security Standard, and the lever was requirement 6.6. Public-facing web applications that touched cardholder data had to address vulnerabilities one of two ways: review all application code for security problems, or install an “application-layer firewall” in front of the application. The requirement was a best practice until 30 June 2008, after which it became mandatory for the systems in scope.

Faced with a choice between auditing every line of a sprawling code base and racking a box, most organizations racked the box. Code review was slow, expensive, and never finished, because the code kept changing. A WAF was a purchase order. So PCI 6.6 drove a wave of WAF sales, and it is fair to say the modern WAF market was built as much on that compliance mandate as on any security insight.

Ivan Ristic, the person with the most to gain from WAF sales, wrote in February 2008 that this worried him. His argument, on his own blog, was that organizations would choose WAFs because they were the less ugly option rather than on the merits, and that the requirement read like an afterthought. The risk he named was specific and it came true often enough: products bought and installed but never actually operated, sitting in detection-only mode or with every rule disabled to stop the false positives, present for the audit and absent from the defense. He wanted the standard to mandate continuous operation, not just possession. That a WAF can satisfy a checkbox while protecting nothing is a tension the category never fully escaped, and it is the reason “do you have a WAF” and “is your WAF doing anything” remain two different questions.

The move to the cloud

The appliance model assumed you had a data center with your web servers in it and a rack to put the WAF in front of them. By the early 2010s that assumption was breaking. Applications moved to cloud infrastructure, traffic arrived through content delivery networks, and the natural place to inspect a request was no longer a box you owned but the edge of a network you rented. The WAF followed the traffic to the edge.

Cloudflare built WAF capability into its reverse-proxy edge, where it already terminated TLS and served cached content for millions of sites, so adding request inspection to the path was incremental rather than a new appliance. Amazon launched AWS WAF at re:Invent in October 2015, attaching it to the AWS front doors, CloudFront distributions, Application Load Balancers, and later API Gateway, so protection bound to cloud resources rather than to a physical location. The cloud WAF changed the economics in two directions at once. A small site got enterprise-grade rules for a few dollars a month, because the provider amortized the rule development and the infrastructure across its whole customer base. And the provider got something no appliance vendor ever had: a view of attack traffic across the entire network, every customer’s incoming requests feeding one shared picture of what attacks currently looked like. That shared view is the raw material the next era runs on. The deeper story of how this edge formed is in the history of CDNs, and the company-specific arc is in the history of Cloudflare.

Running the WAF as a managed service at the edge also moved the operational burden. Rule updates that an enterprise once tested and rolled out itself now shipped from the provider, often automatically, the way antivirus signatures do. That is convenient and it is also a loss of control; a managed-rules change at the provider can start blocking your traffic without you touching anything. The trade is the one the whole industry keeps making, less work for less control, and at cloud scale most operators take it.

When the rule list stops being enough

Signatures share one structural weakness. A signature matches a pattern, and an attacker who knows the pattern can write a payload that does the same thing while not matching it. SQL injection has an effectively unbounded number of spellings: comment styles, whitespace tricks, case games, encoding layers, equivalent functions. A regex tuned to catch ' OR 1=1 does not catch the thousandth variation a tool generates, and maintaining regexes to chase every variation is a losing race. The defensive write-ups on WAF evasion concepts walk through why this is structural rather than a matter of better regexes. The rule set is necessary and it is not sufficient.
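A small Python sketch makes the race concrete. The signature is a literal-minded regex chosen for illustration, and `normalize` is a hypothetical helper that undoes three of the tricks listed above (URL encoding, inline comments, case). It is not how any particular WAF implements its transformations.

```python
import re
from urllib.parse import unquote

# A signature keyed to one spelling of the tautology.
SIGNATURE = re.compile(r"' OR 1=1")

plain = "' OR 1=1 --"
obfuscated = "'/**/oR/**/1=1%2D%2D"   # same meaning: comments for spaces, mixed case, encoded dashes
equivalent = "' OR 2>1 --"            # a different tautology with the same effect


def signature_hit(payload: str) -> bool:
    return bool(SIGNATURE.search(payload))


def normalize(payload: str) -> str:
    """Undo URL encoding, replace SQL inline comments with a space, and uppercase."""
    decoded = unquote(payload)
    no_comments = re.sub(r"/\*.*?\*/", " ", decoded)
    return no_comments.upper()


print(signature_hit(plain))                   # True
print(signature_hit(obfuscated))              # False: the spelling changed, the meaning did not
print(signature_hit(normalize(obfuscated)))   # True: normalization wins this round
print(signature_hit(normalize(equivalent)))   # False: the next variation is already past it
```

Each fix closes one spelling and leaves the next one open, which is the sense in which the weakness is structural.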

The cloud providers’ answer was to add a statistical layer that scores requests instead of matching them. Cloudflare shipped machine-learning WAF scoring, described in a March 2022 engineering post, that assigns each request a score from 1 to 99, where 1 is almost certainly an attack and 99 is almost certainly clean. The first models targeted SQL injection and cross-site scripting, exposed as separate scores alongside a combined one; the targeting later widened toward remote-code-execution classes, shell and PHP injection, the Apache Struts and Log4j families. The model trains on traffic that the existing managed rules have already labeled good or bad across the whole network, then learns the shape of an attack well enough to flag variations the rules never enumerated, scoring each part of the request, body, URI, headers, independently to locate where the malicious payload sits.

Figure: signature vs score on the same two requests (illustrative scores; on Cloudflare's scale 1 is almost certainly an attack, 99 almost certainly clean).

  request                     signature         ML score
  q = ' OR 1=1 --             match, block      3 / 99
  q = '/**/oR/**/1=1%2D%2D    no match, pass    6 / 99

*The obfuscated variant carries the same meaning. A signature keyed to the literal string passes it; a model that learned the shape of injection scores it low, meaning hostile. The scores shown are illustrative, but the asymmetry is the real point.*

This is the same statistical turn that happened in bot detection, where a request gets a bot score from 1 to 99 rather than a hard allow-or-deny, and the two systems increasingly share infrastructure at the edge. The score does not replace the rules. It runs alongside them, catching the variations they miss and giving the operator another threshold to tune. And it inherits the rules’ core dilemma in a new form: where you set the cutoff is exactly where you trade missed attacks against blocked customers, the same knob the CRS anomaly threshold has been since the start, now turned by a model instead of a count.
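The cutoff trade can be shown with a toy tally. The scores and labels below are invented for illustration on a 1-to-99 scale where low means hostile, and the rule "block when the score is at or below the cutoff" is an assumption of this sketch, not a documented provider behavior.

```python
# (score, is_attack) pairs: made-up data, low score = more likely an attack.
SAMPLES = [
    (3, True), (6, True), (18, True), (35, True),
    (22, False), (48, False), (71, False), (90, False), (97, False),
]


def tally(samples, cutoff):
    """Return (missed attacks, blocked clean requests) when blocking score <= cutoff."""
    missed = sum(1 for score, attack in samples if attack and score > cutoff)
    blocked_clean = sum(1 for score, attack in samples if not attack and score <= cutoff)
    return missed, blocked_clean


for cutoff in (10, 20, 40, 50):
    missed, blocked_clean = tally(SAMPLES, cutoff)
    print(f"cutoff {cutoff:>2}: missed attacks={missed}, blocked customers={blocked_clean}")
```

On this toy data a cutoff of 10 misses two attacks and blocks no customers, while a cutoff of 50 misses none and blocks two customers. No setting drives both counts to zero, because one clean request scores lower than one attack.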

Worth being honest about what is and is not public here. The CRS scoring math is fully open; the severity weights and thresholds are in the documentation and the rules are on GitHub. The cloud ML models are not. Cloudflare’s posts describe the 1-to-99 scale, the attack classes, and the training-on-labeled-traffic approach, but the feature set, the model architecture, and the exact inference path are not published, and the latency is described only as negligible without numbers. Anyone telling you precisely how the cloud WAF scores a request is inferring from behavior and documentation, not reading a spec.

What the arc actually shows

The straight line from a packet filter to an ML model hides the fact that the WAF has been solving the same problem the entire time, and the problem has never been clean to solve. A defense that reads application content has to decide what legitimate content looks like, and legitimate content is whatever a thousand developers decided to ship this week. Every era picked a different point on the curve between catching attacks and blocking customers. AppShield’s positive model leaned hard toward catching everything novel and paid in brittleness. The signature WAFs leaned toward not blocking real traffic and paid by missing every payload they had not seen. CRS made the trade explicit and adjustable with severity scores and paranoia levels. The cloud models made the same trade with a learned function and a cutoff. The dial got fancier; it never went away.

What did change is who holds the dial and what they can see. A 1999 appliance saw one site’s traffic and one team’s tuning. A 2026 edge WAF sees a meaningful fraction of all web traffic and tunes against attacks happening to everyone at once, which is a genuine advantage that no on-premises box could ever match. The cost is that the rules and the model both live at the provider, the decision logic has moved out of the operator’s hands, and “we have a WAF” tells you even less about whether anything is being defended than it did when Ivan Ristic raised that exact worry about PCI in 2008. The newest piece of the stack is a model nobody outside the vendor can read, scoring a request in the time it takes to forward it, with a threshold somebody set and most operators never touch.


Sources & further reading

  • Wikipedia (2024), ModSecurity — creation in November 2002 by Ivan Ristic, the Thinking Stone/Breach/Trustwave acquisition chain, libmodsecurity 3.0, and the move to OWASP.
  • Wikipedia (2024), Sanctum Inc. / AppShield — Perfecto Technologies’ founding in 1997, AppShield’s summer 1999 release, its policy-from-HTML positive model, and the Watchfire and IBM acquisitions.
  • Wikipedia (2024), Firewall (computing) — the firewall generations, DEC packet filters around 1987, Bell Labs stateful work, and the Ranum/TIS application-layer Firewall Toolkit in October 1993.
  • Wikipedia (2024), Imperva — WebCohort’s 2002 founding, SecureSphere, the 2011 IPO, and the 2023 Thales acquisition.
  • OWASP CRS Project (2024), Anomaly Scoring — severity-to-score mapping (critical 5, error 4, warning 3, notice 2), decoupled detection and blocking, and the default thresholds.
  • OWASP CRS Project (2024), Paranoia Levels — what PL 1 through 4 enable and the detection-versus-false-positive trade at each level.
  • OWASP CRS Project (2026), coreruleset.org — current 4.27.x line, the 4.25.0 LTS release, CRS 4 features, and the Q3 2026 end of 3.3.x support.
  • Ivan Ristic (2008), Is PCI 6.6 good for web application firewalls? — the ModSecurity author’s own worry that PCI would drive WAF sales for compliance rather than merit.
  • PCI Security Standards Council (2008), Information Supplement: Requirement 6.6 — the code-review-or-WAF requirement and the 30 June 2008 mandatory date.
  • Cloudflare (2022), Improving the WAF with Machine Learning — the 1-to-99 attack score, the SQLi/XSS targeting, training on rule-labeled traffic, and per-part request scoring.
  • AWS (2015), New Security Services Launched at AWS re:Invent 2015 — the October 2015 launch of AWS WAF and its attachment to CloudFront and load balancers.
  • FireMon / Jody Brazil (2024), A Practical History of the Firewall, Part 1 — the mid-1990s competition between stateful inspection, router ACLs, and proxy firewalls that preceded the application layer.
