The history of Cloudflare: from Project Honey Pot to the edge, 2009-2026

Copyright: MIT
A spam-tracking honeypot is a strange place for a company that now sits in front of a fifth of the web to begin. But that is where Cloudflare started. The question worth asking is how a small project to catalogue the IP addresses that harvest email addresses turned into the infrastructure layer that decides, on a large slice of the internet, whether your HTTP request is a person or a program, whether it gets challenged, and as of 2025, whether an AI crawler has to pay before it reads a page.

This post follows that line. Not as a victory lap for one company, but as a way to read the technical history of the modern web through one of the busiest reverse proxies on it. If you build crawlers, or you defend against them, the decisions Cloudflare made between 2009 and 2026 shaped the wire you are looking at right now.

Here is the route. We start with Project Honey Pot and the founding. We cover the 2010 TechCrunch Disrupt launch and the free-CDN model that made the network grow. We look at the DDoS attacks that defined the early reputation, the 2019 IPO and what the S-1 revealed about the business, Workers and the V8-isolate edge platform, then the bot-detection stack: Bot Management’s 1-99 score, Turnstile, and the cookies and fingerprints underneath. We close on the 2025 pay-per-crawl move and what it means for anyone pointing an automated client at a Cloudflare-fronted origin.

The honeypot before the company

Project Honey Pot launched in 2004. Matthew Prince and Lee Holloway built it as a distributed system for tracking the sources of email spam. The idea is simple and clever. You seed web pages with hidden, uniquely-tagged email addresses that no human would ever copy down. When spam later arrives at one of those addresses, you know which IP harvested it and when, because each address was only ever shown to one visitor. Run that across thousands of participating websites and you get a shared map of which crawlers scrape addresses and which networks send the resulting spam.
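The attribution trick described above can be sketched in a few lines. This is a minimal illustration, not Project Honey Pot's actual code: the signing key, the `trap.example` domain, and the `HoneypotLedger` class are all hypothetical names introduced here.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

SECRET = b"demo-secret"        # hypothetical signing key
TRAP_DOMAIN = "trap.example"   # hypothetical honeypot mail domain


@dataclass
class HoneypotLedger:
    """Remembers which visitor was shown which trap address."""

    shown: Dict[str, Tuple[str, int]] = field(default_factory=dict)

    def issue(self, visitor_ip: str, timestamp: int) -> str:
        # Derive a local part that is unique to this visitor and moment,
        # so the address is only ever shown once.
        digest = hmac.new(
            SECRET, f"{visitor_ip}|{timestamp}".encode(), hashlib.sha256
        ).hexdigest()
        address = f"{digest[:16]}@{TRAP_DOMAIN}"
        self.shown[address] = (visitor_ip, timestamp)
        return address

    def attribute(self, recipient: str) -> Optional[Tuple[str, int]]:
        # Mail arriving at a trap address identifies the harvester that saw it.
        return self.shown.get(recipient)


ledger = HoneypotLedger()
trap = ledger.issue("203.0.113.7", 1_100_000_000)
print(ledger.attribute(trap))
```

Because each address is shown to exactly one visitor, a plain dictionary lookup is enough to turn a spam delivery into a harvester IP and a harvest time.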

That model contains the seed of everything Cloudflare became. A network of websites contributing observations. A central system correlating bad behaviour across all of them. A reputation database that any single member benefits from because every other member feeds it. The collective-signal idea that today underpins most bot-mitigation systems was already running in a spam tracker two decades ago.

The leap from tracking threats to stopping them came from the third founder. Michelle Zatlyn, who met Prince at Harvard Business School, saw that the interesting product was not the catalogue of bad actors but a service that could act on it in real time. In April 2009 the plan won the Harvard Business School business plan competition. Cloudflare was founded on July 26, 2009 by Prince, Holloway, and Zatlyn, and the company closed its Series A in November 2009 with Venrock and Pelion Venture Partners.

Launching the orange cloud

Cloudflare went public, in the product sense, at TechCrunch Disrupt on September 27, 2010. The pitch was almost suspiciously good. Point your domain’s DNS at Cloudflare, wait about five minutes, and your site gets faster and more protected at the same time, for free. The mechanism is a reverse proxy. Cloudflare becomes the authoritative answer for your domain, so visitor traffic hits a Cloudflare data center first, which caches static assets, filters obvious abuse, and then fetches from your origin only when it has to.

The “orange cloud” was the literal UI control for this. In the Cloudflare DNS dashboard, a record with the cloud toggled orange is proxied through the network; a grey cloud is DNS-only, resolving straight to your origin. That toggle is the whole architecture in one icon. Orange means Cloudflare sits in the request path and can see, cache, challenge, or block. Grey means it just answers the DNS query and steps aside.
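A toy model makes the toggle concrete. It is a sketch under stated assumptions, not Cloudflare's implementation: `EDGE_IP`, `DnsRecord`, `resolve`, and `Edge` are hypothetical names, the addresses come from documentation ranges, and the cache never expires anything.

```python
from dataclasses import dataclass
from typing import Callable, Dict

EDGE_IP = "198.51.100.1"  # hypothetical anycast edge address


@dataclass
class DnsRecord:
    name: str
    origin_ip: str
    proxied: bool  # True = orange cloud, False = grey cloud


def resolve(record: DnsRecord) -> str:
    """Return the address a visitor's resolver would be handed."""
    return EDGE_IP if record.proxied else record.origin_ip


class Edge:
    """A reverse proxy that only contacts the origin on a cache miss."""

    def __init__(self, fetch_origin: Callable[[str], bytes]) -> None:
        self.fetch_origin = fetch_origin
        self.cache: Dict[str, bytes] = {}
        self.origin_hits = 0

    def handle(self, path: str) -> bytes:
        if path not in self.cache:
            self.origin_hits += 1
            self.cache[path] = self.fetch_origin(path)
        return self.cache[path]


orange = DnsRecord("www.example.com", "192.0.2.10", proxied=True)
grey = DnsRecord("ssh.example.com", "192.0.2.10", proxied=False)
print(resolve(orange), resolve(grey))
```

Only the orange record hides the origin address. Everything the edge can do to a request (cache, challenge, block) follows from that one DNS answer.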

The free tier was the growth engine, and it was a deliberate inversion of how CDNs were sold. Akamai and the incumbents sold to large enterprises with sales teams and contracts. Cloudflare gave the core service away and made the network bigger with every free signup, because each new proxied site meant more traffic, more attack samples, and more reasons for the next site to join. By the time the company filed to go public, its platform sat in front of more than 20 million internet properties, the vast majority on the free plan. The paying customers were a thin, valuable layer on top of an enormous free base.

[Figure: visitor browser, then the Cloudflare edge (cache, WAF / bot score, challenge), then the origin server. Most requests are answered from cache; the origin is only hit on a miss. The "orange cloud" puts the whole edge box in the request path.]
*The reverse-proxy path. Toggling a DNS record orange routes visitor traffic through a Cloudflare data center before it ever reaches the origin.*

The network itself is anycast. Cloudflare announces the same IP prefixes from every data center, and BGP routes each visitor to a nearby one. That is the standard CDN trick for putting a point of presence close to users without per-region addressing, and it is worth reading alongside how a CDN actually works and anycast routing if the mechanics are new to you. By early 2026 Cloudflare’s own pages cite 335 data centers in more than 125 countries.

The same network that grew on the back of free websites later turned outward to the resolver layer. In 2018 Cloudflare launched the 1.1.1.1 public DNS resolver, and in 2019 it followed with the WARP client, a consumer-facing service that routes a device’s traffic over the Cloudflare network. The interesting part for this history is not the privacy pitch but the symmetry. Cloudflare already sat on the server side of a fifth of the web; 1.1.1.1 and WARP put it on the client side too, resolving the names and carrying the packets of the people visiting those same sites. A company that started by watching which IPs harvested email addresses ended up able to observe both ends of a connection for a meaningful slice of internet users. That position is what makes the later bot and crawler products as strong as they are, because reputation built from one side informs decisions on the other.

The attacks that made the name

A security company earns its reputation in public, during the attacks it survives. Cloudflare’s came in March 2013, when Spamhaus, the anti-spam blocklist organisation, was hit by a DNS reflection and amplification flood after listing a hosting provider. Cloudflare published a now-famous write-up titled “The DDoS That Almost Broke the Internet” on March 27, 2013. The attack climbed from roughly 10 Gbps on March 18 to around 120 Gbps against Cloudflare, and a Tier 1 provider reported seeing more than 300 Gbps of related traffic when the attackers shifted to upstream bandwidth providers and internet exchanges including LINX, AMS-IX, and DE-CIX.

The technique was open DNS resolvers. An attacker sends a small DNS query with a spoofed source address (the victim’s), and the resolver replies with a much larger response to the victim. Multiply that across thousands of misconfigured resolvers and a modest botnet produces a flood far larger than its own upstream bandwidth. The deep mechanics are covered in DNS amplification and reflection; the point here is that Cloudflare’s anycast network absorbed it by spreading the load across many data centers rather than letting it concentrate on one.
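The arithmetic behind "far larger than its own upstream bandwidth" is a single ratio. The sketch below uses illustrative packet sizes chosen for round numbers; they are assumptions, not measurements from the Spamhaus attack.

```python
def amplification_factor(query_bytes: int, response_bytes: int) -> float:
    """Bytes delivered to the victim per byte the attacker sends."""
    if query_bytes <= 0:
        raise ValueError("query_bytes must be positive")
    return response_bytes / query_bytes


def reflected_gbps(attacker_gbps: float, factor: float) -> float:
    """Traffic arriving at the victim for a given spoofed-query rate."""
    return attacker_gbps * factor


# Illustrative sizes only: a 64-byte query that elicits a 3,200-byte answer.
factor = amplification_factor(64, 3200)
print(factor)                       # 50.0
print(reflected_gbps(2.0, factor))  # 100.0
```

With those assumed sizes, 2 Gbps of spoofed queries becomes 100 Gbps at the victim, which is why closing open resolvers and filtering spoofed source addresses matter more than the size of the botnet.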

The records have kept climbing, and the numbers from 2025 are not typos. In September 2025 Cloudflare reported autonomously mitigating a 22.2 Tbps attack that also peaked at 10.6 billion packets per second and lasted about 40 seconds. That flood traced to over 404,000 source IPs across 14-plus ASNs and was linked to the Aisuru botnet, a Mirai-lineage network of compromised IoT devices. It surpassed an 11.5 Tbps record set only weeks earlier. The lineage from 2013 to 2025 is the same story at roughly two orders of magnitude more volume (22.2 Tbps is about 185 times the 120 Gbps aimed at Cloudflare in 2013): cheap compromised devices, reflection and direct floods, absorbed by spreading the traffic thin across a large anycast surface. The history of DDoS and the Mirai botnet trace how the attack side got there.

[Figure: timeline. 2004 Honey Pot; 2009 founded; 2010 Disrupt; 2013 Spamhaus; 2017 Workers; 2019 IPO (NET); 2022 Turnstile; 2025 pay-per-crawl. Two decades from a spam tracker to an AI-crawl tollbooth.]
*Major milestones. Dates from Cloudflare's own pages, blog, and SEC filings.*

Going public, and what the S-1 showed

Cloudflare filed its S-1 on August 15, 2019 and began trading on the NYSE under the ticker NET on September 13, 2019 at $15 per share. The filing is the clearest window into how the free-CDN bet actually paid off. For full-year 2018 the company reported $192.7 million in revenue against an $87.2 million net loss. In the first half of 2019 it booked $129.2 million in revenue with a $36.8 million net loss. At that point it had 74,873 paying customers, 408 of whom each contributed more than $100,000 in annualised billings.
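The ratios implied by those figures are worth computing once. The inputs below are exactly the numbers quoted above; the helper name `pct` is introduced here for illustration.

```python
def pct(part: float, whole: float) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


# Figures as quoted from the S-1, in millions of US dollars.
fy2018_revenue, fy2018_net_loss = 192.7, 87.2
h1_2019_revenue, h1_2019_net_loss = 129.2, 36.8
paying_customers, large_customers = 74_873, 408

print(f"{pct(fy2018_net_loss, fy2018_revenue):.1f}")      # 45.3
print(f"{pct(h1_2019_net_loss, h1_2019_revenue):.1f}")    # 28.5
print(f"{pct(large_customers, paying_customers):.2f}")    # 0.54
```

The net loss shrank from about 45% of revenue to about 28% between the two periods, and the accounts above $100,000 were roughly half a percent of paying customers.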

The shape of that business is the thing to notice. A very large free base, a small paid base, and a handful of large enterprise accounts carrying a disproportionate share of revenue. The S-1 said the plainest version out loud: the business depends on retaining and upgrading paying customers and, to a lesser extent, converting free customers to paid. Free users were never charity. They were the sample set that trained the security products, the distribution that made the brand default, and the funnel that fed the paid tiers.

The losses are normal for this kind of company at that stage. You spend ahead of revenue to build out the network and the sales motion, and the bet is that gross margin on an already-built anycast network is high enough that scale eventually flips it. By the mid-2020s Cloudflare was posting quarterly revenue in the hundreds of millions, and in May 2026 it announced roughly 1,100 layoffs even as it reported a record quarter, the usual sign of a company optimising margins rather than chasing growth at any cost.

Workers and the programmable edge

For its first seven years Cloudflare’s edge did fixed things: cache, filter, route. In 2017 it became programmable. Cloudflare Workers launched with the blog post “Introducing Cloudflare Workers” on September 29, 2017, and the design choice in that post is the reason the product mattered.

Workers do not run your code in a container or a VM. They run it as a V8 isolate. V8 is the JavaScript engine from Chrome, and an isolate is the same lightweight sandbox Chrome uses to keep one website’s scripts from touching another’s. Cloudflare’s argument was that giving every customer a container would mean an OS process each, with the RAM and context-switching cost that implies, which does not scale to thousands of tenants per machine across a network that then served 117 locations. Node.js was rejected for the same reason its own vm module warns against running untrusted code: it was not built for tenant isolation. The Service Worker API gave Workers a browser-standard way to intercept a request and respond, including making subrequests to other origins.

The packing density is the whole point. A container per tenant means an OS process per tenant, each carrying its own memory and scheduler overhead, which caps how many customers a single machine can hold. A V8 isolate is just a fresh JavaScript context inside one shared runtime, so a single process can hold thousands of them and start a new one in milliseconds. That difference is what made a free or near-free per-request edge runtime economic in the first place.

That foundation grew into a platform. Workers KV gave eventually-consistent key-value storage at the edge, Durable Objects added single-instance coordination with strong consistency, R2 offered S3-compatible object storage without egress fees, and D1 added a SQL database. The competitive position is best read against edge compute compared, where Workers sits opposite Lambda@Edge and Fastly’s Compute platform. The strategic point is that the same network built to proxy and protect websites turned into a place to run application code, which deepened lock-in and gave Cloudflare a developer business alongside the security one.

The isolate model has a cost worth naming, because it is the same cost that matters for crawler defence. Sharing one V8 runtime across many tenants means the runtime’s behaviour is observable, and a Worker cannot pull in arbitrary native modules the way a container could. Cloudflare addressed the compatibility gap over the following years by implementing more of the Node.js APIs and the WinterCG standard surface, but the constraint that makes isolates cheap is the same constraint that makes them a sandbox. That duality, security as a side effect of how the runtime is packed, recurs across the company. The cookie that proves you passed a challenge, the score that rates your request, the isolate that runs a customer’s code: each is a boundary that exists because the network is shared, and each is something an automated client either respects or has to work around.

Reading the request: the bot-detection stack

The part of Cloudflare’s history most relevant to anyone writing an HTTP client is the bot-detection machinery, because that is the system deciding whether your request looks human. It did not arrive all at once. It accreted.

The first detection layer everyone meets is the challenge. When Cloudflare is unsure about a request it can interpose a page that runs JavaScript and, on success, issues a clearance cookie. The cookie that matters here is cf_clearance, which marks a session as having passed a challenge; the mechanics are in Cloudflare’s cf_clearance cookie and the challenge types post. Cloudflare has run several generations of this, from JavaScript challenges to interactive ones to the Managed Challenge that picks a difficulty automatically. The challenge platform’s URL path includes the recognisable /cdn-cgi/challenge-platform/ prefix, covered in inside the Cloudflare challenge platform.

Bot Management proper added a score. Launched in 2019 and detailed in a blog post on May 6, 2020, it assigns every request a bot score from 1 to 99, where 1 means near-certain automation and 99 means near-certain human. That score is exposed to customer rules as cf.bot_management.score. According to that post, the score is produced by several engines working together. A machine-learning engine using the CatBoost gradient-boosting library runs inference in under 50 microseconds and accounts for most classifications. A heuristics engine of hundreds of hand-written rules runs in under 20 microseconds and flags around 15% of global traffic as bots with a very low false-positive rate. A behavioural-analysis layer learns each customer’s normal patterns without labels. A verified-bots system distinguishes legitimate crawlers like Googlebot using reverse DNS and ASN checks. The post cites the network handling roughly 11 million requests per second on average at the time, with a hard latency budget around 100 microseconds for the whole scoring path. The full scoring breakdown lives in Cloudflare Bot Management scoring.
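From the customer's side, the score is an input to a rule. The sketch below shows the shape of such a rule in Python rather than in Cloudflare's rule language. The thresholds `block_below` and `challenge_below` and the function name are hypothetical choices for illustration, not Cloudflare defaults.

```python
def action_for_score(
    score: int,
    verified_bot: bool = False,
    block_below: int = 2,        # hypothetical threshold
    challenge_below: int = 30,   # hypothetical threshold
) -> str:
    """Map a 1-99 bot score to an action: low means automated, high means human."""
    if not 1 <= score <= 99:
        raise ValueError("bot score must be between 1 and 99")
    if verified_bot:
        # Known-good crawlers are let through regardless of score.
        return "allow"
    if score < block_below:
        return "block"
    if score < challenge_below:
        return "challenge"
    return "allow"


for s in (1, 15, 80):
    print(s, action_for_score(s))
```

Where the thresholds sit is a per-site trade-off between blocking real users and admitting automation, which is why the score is exposed to rules rather than hard-wired to an action.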

[Figure: cf.bot_management.score as a continuum from 1 (automated) to 99 (human), fed by four engines: ML engine (CatBoost, under 50 µs), heuristics (rules, under 20 µs), behavioural (unsupervised), verified bots (rDNS + ASN). Numbers from Cloudflare's May 2020 Bot Management post.]
*The 1-99 bot score and the engines behind it. The whole path runs in microseconds because it sits on the hot request path.*

Underneath the score sit fingerprints. Cloudflare uses TLS and HTTP/2 characteristics to tell a real browser’s network stack from an HTTP library pretending to be one. The ordering of TLS extensions and cipher suites, the HTTP/2 SETTINGS frame values, and pseudo-header ordering all leak which client stack is really on the other end, even when the User-Agent header lies. This is the territory of JA3 to JA4 fingerprinting and how Cloudflare uses TLS and HTTP/2 fingerprints. The defensive takeaway, stated plainly: the exact feature weights inside Cloudflare’s models are not public, and anyone claiming otherwise is guessing. What is documented is that the signals exist and which variables they surface in firewall rules. There is also a __cf_bm cookie tied to the bot-management machinery, but the precise contents and signing of these cookies are not fully published; what is known comes from observed traffic and Cloudflare’s own variable docs, not from a spec.
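JA3, the public fingerprint format mentioned above, shows why ordering matters. It joins five ClientHello fields into a string and takes the MD5 of it. The sketch below follows the published JA3 layout. The numeric values are illustrative rather than captured from a real browser, and nothing here describes Cloudflare's internal features.

```python
import hashlib
from typing import Sequence


def ja3_string(
    version: int,
    ciphers: Sequence[int],
    extensions: Sequence[int],
    curves: Sequence[int],
    point_formats: Sequence[int],
) -> str:
    """Build the JA3 string: fields joined by commas, values by dashes."""
    groups = (ciphers, extensions, curves, point_formats)
    fields = [str(version)] + ["-".join(str(v) for v in g) for g in groups]
    return ",".join(fields)


def ja3_hash(*args) -> str:
    """MD5 hex digest of the JA3 string, as the format specifies."""
    return hashlib.md5(ja3_string(*args).encode("ascii")).hexdigest()


# Illustrative ClientHello values (771 is the wire value for TLS 1.2).
hello = (771, [4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
reordered = (771, [4866, 4865, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])

print(ja3_string(*hello))
print(ja3_hash(*hello) == ja3_hash(*reordered))  # False
```

Swapping two cipher suites yields a different hash even though the same suites are offered, and none of these fields is touched by changing the User-Agent header.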

Turnstile and the death of the puzzle

By 2022 the interactive CAPTCHA had become a tax on everyone, humans included, and a solved problem for the better solver farms. Cloudflare’s answer was Turnstile, announced on September 28, 2022 as an open beta. It is a CAPTCHA replacement that, in most cases, asks the user to do nothing visible. Instead of a puzzle it runs a rotation of non-interactive checks: a proof-of-work, a proof-of-space, probes of browser web APIs, and tests for the small inconsistencies that separate a real browser from an automated one.

Anyone can embed Turnstile, not only Cloudflare customers. The flow ends with a token named cf-turnstile-response injected into the page, which the site’s backend then validates against Cloudflare’s siteverify endpoint, where each token can be redeemed once. Turnstile also supports Private Access Tokens, the Apple-backed scheme that lets a device attest it is genuine without Cloudflare collecting personal data. Cloudflare framed all this as the same technology behind its Managed Challenge, which it credited with cutting its own CAPTCHA use by 91%. The internals are covered in Cloudflare Turnstile internals, and the broader arc of how puzzles gave way to invisible challenges is in the history of CAPTCHA.
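The server-side half of that flow is a single POST. The sketch below uses only the Python standard library and assumes the documented siteverify URL and the `secret`, `response`, and `remoteip` form fields. The `post` parameter is a hypothetical injection point added here so the function can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Optional

SITEVERIFY_URL = "https://challenges.cloudflare.com/turnstile/v0/siteverify"


def _http_post(url: str, form: Dict[str, str]) -> dict:
    """POST a form-encoded body and decode the JSON reply."""
    body = urllib.parse.urlencode(form).encode()
    request = urllib.request.Request(url, data=body)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def verify_turnstile(
    token: str,
    secret: str,
    remote_ip: Optional[str] = None,
    post: Callable[[str, Dict[str, str]], dict] = _http_post,
) -> bool:
    """Return True only if siteverify reports the token as valid."""
    form = {"secret": secret, "response": token}
    if remote_ip:
        form["remoteip"] = remote_ip
    result = post(SITEVERIFY_URL, form)
    return bool(result.get("success"))
```

In a form handler this would be called with the submitted cf-turnstile-response value and the site's secret key. Because a token can be redeemed once, the result should be acted on immediately rather than re-checked later.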

For a crawler operator, Turnstile changed the shape of the problem. The visible puzzle that a CAPTCHA-solving pipeline could outsource to a human farm was replaced by a silent set of environment checks that a headless browser either passes or fails on its own merits. The fight moved from solving an image to convincing the proof-of-work and API probes that the runtime is a normal browser, which is the same fight described in headless Chrome detection and JavaScript runtime fingerprinting.

The controversies that came with the power

Sitting in front of a fifth of the web is also a content-policy position, whether the company wants it or not. Cloudflare spent years insisting it was a neutral utility that should not decide what speech stays online. That position cracked in public more than once. In 2017 it stopped serving the neo-Nazi site The Daily Stormer, and Prince wrote a candid memo admitting he had woken up and decided to remove a customer from the internet, and that no one should have that power. After the August 2019 El Paso shooting it terminated service to the 8chan message board. In September 2022 it cut off Kiwi Farms, citing an immediate threat to human life after earlier resisting calls to do so.

These are not technical footnotes. They are the consequence of the architecture. When the orange cloud means Cloudflare sits in every request path for millions of sites, the decision to keep or drop a single customer becomes a decision about what the internet shows. The company’s stated preference was always to be the kind of infrastructure that does not make those calls, the way a power company does not police what you plug in. Reality kept forcing the calls anyway.

There is a human cost in the founding team too. Lee Holloway, the engineer who wrote much of the early code and co-created Project Honey Pot, was diagnosed with frontotemporal dementia, confirmed by an MRI in March 2017 that showed brain atrophy. A 2020 Wired profile traced the slow, mistaken-for-personality decline that preceded the diagnosis. He is the founder least visible in the company’s public story, and the reason is a disease with no treatment. It is worth naming him alongside the products, because the honeypot that started all of this was his idea as much as anyone’s.

Pay-per-crawl: the tollbooth for AI

The most recent turn is the one most relevant to crawlers in 2026. On July 1, 2025, a date Cloudflare branded Content Independence Day, it announced that it would block AI crawlers by default for new domains and launched a private beta of pay-per-crawl, a system that lets a site charge AI companies for each page they fetch.

The mechanism reuses HTTP plumbing rather than inventing a protocol. When a charging site is crawled, the crawler can either declare a price it is willing to pay up front or receive an HTTP 402 Payment Required response carrying the site’s price, then retry with payment intent. Cloudflare’s post describes the headers in the scheme: a crawler can proactively send crawler-max-price to state its ceiling; in the reactive flow the 402 quotes the price in a crawler-price response header and the crawler retries with crawler-exact-price to accept it; and a successful response carries crawler-charged to confirm what was billed. Crawler identity is verified with Ed25519 key pairs and HTTP Message Signatures, so a charging site can trust which agent it is dealing with. Cloudflare acts as the merchant of record, aggregates billing events (recorded when an authenticated request with payment intent gets a 200-level response), and settles between crawlers and publishers. Each domain gets three choices per crawler: allow free, charge a flat per-request price, or block outright.
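The decision the edge makes can be modelled as a pure function from price and request headers to a status code. This is a sketch under assumptions, not Cloudflare's implementation. It treats crawler-max-price and crawler-exact-price as request headers and crawler-charged as a response header. It assumes the 402 quote travels in a header named crawler-price and that values look like "USD 0.01". It also assumes signature verification has already succeeded. Header names and formats should be checked against Cloudflare's current documentation.

```python
from decimal import Decimal
from typing import Dict, Tuple

QUOTE_HEADER = "crawler-price"  # assumption: header carrying the 402 quote


def parse_usd(value: str) -> Decimal:
    """Parse a value such as 'USD 0.01' (assumed format)."""
    currency, amount = value.split()
    if currency != "USD":
        raise ValueError("only USD is modelled here")
    return Decimal(amount)


def edge_decision(
    price: Decimal, request_headers: Dict[str, str]
) -> Tuple[int, Dict[str, str]]:
    """Return (status, response headers) for an already-authenticated crawler."""
    charged = {"crawler-charged": f"USD {price}"}
    ceiling = request_headers.get("crawler-max-price")
    if ceiling is not None and parse_usd(ceiling) >= price:
        return 200, charged  # proactive flow: ceiling covers the price
    exact = request_headers.get("crawler-exact-price")
    if exact is not None and parse_usd(exact) == price:
        return 200, charged  # reactive flow: crawler accepted the quote
    return 402, {QUOTE_HEADER: f"USD {price}"}


price = Decimal("0.01")
print(edge_decision(price, {}))
print(edge_decision(price, {"crawler-exact-price": "USD 0.01"}))
```

The first call models a crawler arriving with no payment intent and receiving a quote; the second models the retry that accepts it.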

[Figure: sequence between an AI crawler and Cloudflare (merchant of record). 1. GET /article with no payment intent. 2. 402 Payment Required carrying the price. 3. GET /article, signed, with payment intent. 4. 200 OK with content; billing event recorded. Identity verified with Ed25519 + HTTP Message Signatures. Header names per Cloudflare's July 2025 post.]
*The pay-per-crawl handshake. A 402 carries the price; a signed retry with payment intent unlocks the content and records a billing event.*

Set aside whether it works as a market. The technical move is what to note. For roughly three decades the web’s machine-readable access rules were advisory. A site published robots.txt and trusted crawlers to honour it, a convention dating to 1994 that only became the formal RFC 9309 in 2022. Pay-per-crawl swaps advice for enforcement at the proxy. The site no longer asks crawlers to behave; the edge refuses them a 200 until payment clears. That is the same shift, from polite request to hard gate, that defines every other part of Cloudflare’s history, now applied to the web scraping economy that exploded with large language models. Whether AI companies accept a tollbooth run by the network in front of a fifth of the web, or route around it, is the open question of 2026.
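The advisory nature of robots.txt is visible in how a client consumes it. The sketch below uses Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleAIBot` agent name, and the `may_fetch` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: one named AI crawler is disallowed, everyone else allowed.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Ask the parsed policy whether this agent is permitted to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(may_fetch("ExampleAIBot", "https://example.com/article"))  # False
print(may_fetch("SomeBrowser", "https://example.com/article"))   # True
```

The check runs inside the crawler's own process. A client that never calls it, or ignores a `False`, still gets the page, and that gap is what enforcement at the proxy closes.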

What the arc actually shows

Read end to end, Cloudflare’s history is one idea applied at growing scope. Collect observations across a large network, build a shared reputation from them, and then act on that reputation in the request path. Project Honey Pot collected on the spam being sent. The free CDN collected on the traffic and attacks hitting millions of sites. Bot Management and Turnstile collect on the fingerprints and behaviours of every visitor. Pay-per-crawl collects on which AI agents fetch what, and bills them. The mechanism never really changed. Only the thing being measured, and the lever being pulled, got more consequential each time.

For anyone building automated clients, the practical lesson is that the gate keeps moving down the stack. It started at the application layer with spam filters, moved to the cookie and the JavaScript challenge, dropped to the TLS and HTTP/2 fingerprint where a forged User-Agent does not help, and now sits at the payment layer where access is a billing decision. Each move made the previous evasion worth less and the next one harder. That is the through-line, and it is why a company that began by hiding fake email addresses on web pages now decides, for a meaningful fraction of the internet, what your request is allowed to be.

The detail that captures it best is the smallest one. The whole bot-scoring path, the machine-learning inference and the heuristics and the verified-bot lookup, runs in roughly a hundred microseconds, because it has to happen on every request without anyone noticing. The decision about whether you are human is made faster than you can perceive, billions of times a day, at the edge of a network you never chose to route through.

