
F5 Distributed Cloud Bot Defense: the architecture after the Shape acquisition

Copyright: MIT

A bank that has run Shape Security since 2017 will tell you the product they bought no longer exists under that name. The script tag is still in their pages, the mobile SDK still ships in their app, and the detection still works the way it always did. But the thing on the other end of the telemetry stream is now a service inside F5’s global cloud, sold under a different name, deployed through a different set of connectors, and managed from a console that used to be a software-defined networking product. The detection engine is the same lineage. The delivery is not.

This post is about that delivery. How Shape’s client-side collection and behavioral AI became F5 Distributed Cloud Bot Defense, what the architecture looks like end to end, how the connectors wire it into a CDN or a load balancer or a mobile app without the origin ever talking to F5 directly, and where the whole thing sits in 2026 relative to Akamai, DataDome, and the rest. The detection signals themselves get a sibling post; the question here is the plumbing. How the bytes move, who sees them, and what the acquisition actually changed about the shape of the system.

The walk goes like this. First the acquisition and the rename, because the product has had three names and the confusion is real. Then the runtime: the JavaScript that collects, the VM that hides it, and the path the telemetry takes to the inference engines. Then the connector model, which is the genuinely interesting architectural decision. Then the mobile SDK, the good-bot and retooling story, and finally the 2026 positioning.

2019 to 2020: the acquisition and the three names

F5 announced the deal on 19 December 2019 and closed it on 24 January 2020. The price was about a billion dollars in cash. What F5 bought was a company that, by its own and F5’s accounting at the time, detected and blocked up to a billion fraudulent or unwanted transactions a day across the largest banks, airlines, and retailers. Shape’s co-founder and CEO Derek Smith framed F5 as “the optimum traffic flow insertion point” for Shape’s fraud detection, which is the whole strategic logic in one sentence. Shape had the detection. F5 had the place to put it: the load balancers and proxies sitting in roughly 80 percent of the Fortune 500, plus NGINX in front of a large slice of the public internet.

Shape itself came out of stealth in January 2014 with a product called ShapeShifter and a pitch built on real-time polymorphism. The founding team included people from Google and Mozilla, and the early idea was to rewrite the static elements of a page on every serving so that automation keying off fixed names and structures would break. That polymorphism work is documented in the patents, including US10382482B2, “Polymorphic obfuscation of executable code,” filed August 2016 with a 2015 priority date, which describes transcoding HTML, CSS, and JavaScript differently on each serving so “malware cannot readily learn how the transcoding is occurring.” The company moved from there into behavioral detection and the AI-heavy stack it was known for by the time F5 came calling, including the Blackfish credential-intelligence network launched in 2018.
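ShapeShifter’s original idea, rewriting static page elements on every serving, can be sketched for the simplest case: form field names. This is a minimal illustration under stated assumptions, not Shape’s transcoder, which per the patent covered HTML, CSS, and JavaScript. The functions `polymorphize` and `depolymorphize`, the `f_` alias format, and the seeded PRNG are all hypothetical. A real server would also keep the alias map in per-session state.

```typescript
// Hypothetical sketch of per-serving polymorphism on form field names.
// A bot that hard-codes name="username" breaks once the name changes on
// every serving; the server keeps the alias map to undo the rewrite.

type AliasMap = Map<string, string>; // alias -> real field name

// Small seeded PRNG (mulberry32) so runs are reproducible in tests.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function randomAlias(rand: () => number): string {
  let s = "f_";
  for (let i = 0; i < 10; i++) s += Math.floor(rand() * 36).toString(36);
  return s;
}

// Rewrite every name="..." attribute. Repeats of the same real name within
// one serving share an alias, so radio groups and similar keep working.
function polymorphize(html: string, rand: () => number): { html: string; aliases: AliasMap } {
  const aliases: AliasMap = new Map();
  const byReal = new Map<string, string>();
  const out = html.replace(/name="([^"]+)"/g, (_match: string, real: string) => {
    let alias = byReal.get(real);
    if (alias === undefined) {
      do {
        alias = randomAlias(rand);
      } while (aliases.has(alias));
      aliases.set(alias, real);
      byReal.set(real, alias);
    }
    return `name="${alias}"`;
  });
  return { html: out, aliases };
}

// Map a submitted form back to real names. Fields with unknown names
// (for example, a bot posting the real "username") are dropped.
function depolymorphize(form: Record<string, string>, aliases: AliasMap): Record<string, string> {
  const real: Record<string, string> = {};
  for (const [alias, value] of Object.entries(form)) {
    const name = aliases.get(alias);
    if (name !== undefined) real[name] = value;
  }
  return real;
}
```

Two servings of the same page with different seeds produce different field names. Only submissions that use the current serving’s aliases survive the translation back.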

The naming is where people get lost. The product shipped as Shape Enterprise Defense and Shape Integrated Bot Defense before the acquisition. After F5 folded it into the Distributed Cloud platform (the XC console that came out of the 2021 Volterra acquisition), the bot product became F5 Distributed Cloud Bot Defense, often written XC Bot Defense or just Bot Defense. So when you read a 2018 Shape case study, a 2021 transitional doc, and a 2026 F5 product page, you are looking at the same detection lineage under three labels. The current documentation explicitly lists “Shape Integrated Bot Defense and Shape Enterprise Defense” as former names, which is the only reliable way to tie the histories together.

Figure: Shape detection lineage, three product names (timeline: 2014 ShapeShifter polymorphism; 2018 Blackfish credential network; Jan 2020 F5 closes $1B deal; 2021 XC platform via Volterra; 2026 XC Bot Defense). *The detection engine is continuous from ShapeShifter through to XC Bot Defense; what changed at each step was the delivery vehicle, not the core AI.*

How the runtime collects: the script, the VM, the telemetry path

The web side starts with a script tag. Bot Defense protects a web endpoint by injecting JavaScript that runs in the visitor’s browser, gathers signals about the request, and ships them back as telemetry. F5’s own documentation is plain about the mechanism: the system “uses JavaScript to collect telemetry from client browsers and a native Mobile SDK to collect telemetry from mobile devices,” and that telemetry is “attached in the form of HTTP headers or included in the POST body to the protected requests.” So the collected signal does not go to F5 on a side channel. It rides along on the request the user was already making, which is what lets the connector model work the way it does.
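Piggybacking the telemetry on the protected request, rather than sending it on a side channel, can be sketched like this. The header name `x-telemetry-example` and the body field `telemetry_example` are placeholders, since F5 does not publish fixed names. The blob is assumed to be already encrypted by the collector, and the form-encoded body is an assumption made to keep the example short.

```typescript
// Hypothetical sketch: attach an already-encrypted telemetry blob to the
// request the user is making, either as a header or as a POST body field.
// Header and field names are placeholders, not F5's real names.

interface OutgoingRequest {
  method: "GET" | "POST";
  url: string;
  headers: Record<string, string>;
  body?: string; // application/x-www-form-urlencoded in this sketch
}

const TELEMETRY_HEADER = "x-telemetry-example"; // hypothetical
const TELEMETRY_FIELD = "telemetry_example"; // hypothetical

function attachTelemetry(req: OutgoingRequest, blob: string, mode: "header" | "body"): OutgoingRequest {
  if (mode === "header" || req.method === "GET") {
    // A GET has no body, so the header is the only carrier.
    return { ...req, headers: { ...req.headers, [TELEMETRY_HEADER]: blob } };
  }
  const params = new URLSearchParams(req.body ?? "");
  params.set(TELEMETRY_FIELD, blob);
  return { ...req, body: params.toString() };
}

// Connector side: recover the blob wherever it rode.
function extractTelemetry(req: OutgoingRequest): string | undefined {
  const fromHeader = req.headers[TELEMETRY_HEADER];
  if (fromHeader !== undefined) return fromHeader;
  if (req.body === undefined) return undefined;
  return new URLSearchParams(req.body).get(TELEMETRY_FIELD) ?? undefined;
}
```

The user’s own form fields travel unchanged in both modes. The connector only needs to know where to look, which is why no separate beacon request is required.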

The script comes in two pieces, and the connector documentation names them by their query strings. There is a matcher config script, loaded with ?matcher, and an I/O hook script, loaded with ?single. Both are served from a configurable path that F5 calls the injection path, written into the page as something like <script type='text/javascript' src='INJECTION_PATH?matcher'></script>. The injection path is deliberately customer-specific and opaque. F5 says choosing it “prevents malicious actors from determining what system you are using to protect your application,” which is a small but telling design choice: the integration is meant to be hard to fingerprint from the outside, so a scraper cannot grep a page for a known Shape URL and immediately know what it is up against. F5 recommends placing the script right after the opening <head> so it executes early and has time to run while the rest of the page renders.
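The injection step itself is simple string surgery. This minimal sketch assumes the connector sees the full HTML response and that the `?matcher` script should come before the `?single` script. `injectCollector` and the example path `/x7q/assets.js` are hypothetical.

```typescript
// Minimal sketch: insert the ?matcher and ?single script tags right after
// the opening <head> tag. injectionPath is a per-customer value.

function injectCollector(html: string, injectionPath: string): string {
  const tags =
    `<script type='text/javascript' src='${injectionPath}?matcher'></script>` +
    `<script type='text/javascript' src='${injectionPath}?single'></script>`;
  // Match <head> with or without attributes, case-insensitively; \b keeps
  // the pattern from matching <header>.
  const head = /<head\b[^>]*>/i.exec(html);
  if (head === null) return html; // no <head>: leave the page untouched
  const at = head.index + head[0].length;
  return html.slice(0, at) + tags + html.slice(at);
}
```

Putting the tags first in `<head>` gives the collector the longest possible time to run while the rest of the page renders, which is the reason F5 gives for the placement.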

What that script does once it runs is where the Shape inheritance shows. The collected payload is not handed over in the clear. F5 says it “developed the first virtual machine (VM)-based obfuscation defense in JavaScript, employing telemetry encryption,” and the product material describes this as bytecode-level obfuscation with the VM’s opcodes randomized at frequent intervals. That is not marketing gloss. Independent reverse-engineering of the Shape VM describes a fairly standard virtual-machine structure built for hostility: an entry block that sets up state, a dispatcher loop that reads the next byte of bytecode and jumps to a handler, and a bank of handlers that do the actual work. The published analysis counts roughly 230 handlers, split between about 90 atomic instructions (bitwise operations, stack pushes, memory access) and around 140 superoperators that fuse several atomic steps into one opcode. The detail that matters for anyone trying to keep up with it: more than 80 of those handlers get reshuffled roughly every 30 minutes. The semantics stay the same; the encoding moves under you. That is the polymorphism idea from 2014, applied to the collector’s own instruction set rather than to the host page.
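A toy version makes the dispatcher-and-handlers structure concrete. In this stack VM the opcode encoding is reshuffled per epoch while the program semantics stay fixed. It is not the F5 VM. The six-instruction set, the `PUSH_ADD` superoperator, the LCG, and the epoch seeds are all invented, and it has nothing like 230 handlers. It only shows why a decoder written against one epoch’s bytecode stops working when the table moves.

```typescript
// Toy stack VM: entry (build decode table), dispatcher loop, handler table,
// and a per-epoch opcode permutation. Invented for illustration only.

type Op = "PUSH" | "ADD" | "XOR" | "MUL" | "PUSH_ADD" | "HALT";
const OPS: Op[] = ["PUSH", "ADD", "XOR", "MUL", "PUSH_ADD", "HALT"];
type Instr = [Op, number?];

// Seeded LCG so a given epoch always yields the same table.
function lcg(seed: number): () => number {
  let s = seed >>> 0;
  return () => (s = (Math.imul(s, 1664525) + 1013904223) >>> 0);
}

// Shuffle byte values 0..255 (Fisher-Yates) and give the first few to the ops.
function opcodeTable(epoch: number): Map<Op, number> {
  const next = lcg(epoch);
  const bytes = Array.from({ length: 256 }, (_, i) => i);
  for (let i = bytes.length - 1; i > 0; i--) {
    const j = next() % (i + 1);
    [bytes[i], bytes[j]] = [bytes[j], bytes[i]];
  }
  return new Map(OPS.map((op, k): [Op, number] => [op, bytes[k]]));
}

function assemble(program: Instr[], table: Map<Op, number>): number[] {
  const out: number[] = [];
  for (const [op, imm] of program) {
    out.push(table.get(op)!);
    if (imm !== undefined) out.push(imm & 0xff);
  }
  return out;
}

function run(bytecode: number[], table: Map<Op, number>): number {
  // Entry block: invert the table so the dispatcher can map byte -> op.
  const decode = new Map<number, Op>();
  table.forEach((byte, op) => decode.set(byte, op));
  const stack: number[] = [];
  const pop = (): number => stack.pop()!;
  // Each handler receives the pc after its opcode and returns the next pc.
  const handlers: Record<Op, (pc: number) => number> = {
    PUSH: (pc) => { stack.push(bytecode[pc]); return pc + 1; },
    ADD: (pc) => { stack.push((pop() + pop()) & 0xff); return pc; },
    XOR: (pc) => { stack.push(pop() ^ pop()); return pc; },
    MUL: (pc) => { stack.push((pop() * pop()) & 0xff); return pc; },
    // Superoperator: PUSH imm + ADD fused into one opcode.
    PUSH_ADD: (pc) => { stack.push((pop() + bytecode[pc]) & 0xff); return pc + 1; },
    HALT: () => -1,
  };
  let pc = 0;
  while (pc >= 0 && pc < bytecode.length) {
    const op = decode.get(bytecode[pc]); // dispatcher: read a byte...
    if (op === undefined) throw new Error(`bad opcode ${bytecode[pc]}`);
    pc = handlers[op](pc + 1); // ...and jump to its handler
  }
  return stack[stack.length - 1];
}
```

Assembled under two different epochs, the same program produces different bytes but the same result. That is the property the reshuffle relies on: semantics stay put while the encoding moves.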

Figure: collector VM pipeline (entry block initializes VM state → dispatcher reads a byte and jumps → ~230 handlers, ~80 reshuffled every 30 min → loop; output: encrypted telemetry in a header or POST body, riding on the request the user was already making). *The collector runs inside a JavaScript VM with a reshuffling instruction set; the signal it produces is encrypted and attached to the in-flight request, not sent on a side channel. Exact field layout of the payload is not public.*

A note on certainty, because this is where reference posts go wrong. The high-level structure (VM-based obfuscation, telemetry encryption, two script files keyed by ?matcher and ?single) comes from F5’s own documentation and product material. The internal detail (handler counts, the 30-minute reshuffle, the dispatcher mechanics) comes from independent reverse-engineering of the Shape/F5 client and matches what F5 describes at a high level, but the exact opcode table, the field names inside the encrypted payload, and the encryption key schedule are not publicly documented and shift between deployments. Anyone who tells you the precise byte layout of the current F5 telemetry blob is either reading a specific capture or guessing. The mechanism is knowable; the current literal bytes are not, by design. That is the same situation you find with Akamai’s sensor_data, and the same honest caveat applies. The kinship with the VM-and-bytecode approach in Kasada’s KPSDK is real, and the anti-instrumentation tricks that detect a patched runtime are the obvious next move once your collector lives inside a hostile browser.

The connector model: F5’s actual architectural bet

Here is the part that separates F5’s deployment from a single-vendor CDN like Cloudflare. Bot Defense does not require you to move your traffic onto F5’s network. It can, if you route through F5 Distributed Cloud WAAP, but it does not have to. Instead F5 ships a set of connectors that bolt the telemetry path onto infrastructure you already run. The solution overview lists “pre-built connectors for popular content delivery networks (CDNs), Application Delivery Controllers, application platforms, and the F5 Distributed Cloud Web App and API Protection (WAAP).” In practice the supported integrations in the current docs include BIG-IP (v14 through 16 via an iApp connector, and natively on v17.0 and later), Amazon CloudFront, Cloudflare Workers, Adobe Commerce, Salesforce Commerce Cloud, and a documented custom-integration path for anything not on the list.

What does a connector actually do? Two jobs. First, it injects the collection scripts into outbound HTML so the browser starts gathering telemetry. Second, on the inbound side, it intercepts protected requests, forwards the attached telemetry to F5’s decision service, waits for a verdict, and then enforces it. The BIG-IP v14-16 connector documentation makes the mechanics concrete. The connector routes to what F5 calls the XC Defense Engine through a configurable API hostname, with separate endpoints defined for web and for the mobile SDK. There is a timeout, defaulting to 700 milliseconds, after which the connector fails open and lets the request through unchecked rather than holding the user hostage to a slow verdict. That fail-open default is a deliberate availability choice and worth knowing if you are reasoning about the system’s behavior under load.
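The fail-open behavior can be sketched as a race between the verdict call and a timer. This minimal sketch assumes the verdict arrives as a promise. `decideWithFailOpen` and the `Verdict` labels are hypothetical, and the choice to fail open on a service error (not only on a timeout) is an assumption rather than something the connector docs state.

```typescript
// Hypothetical sketch of the connector's inbound step: wait for a verdict,
// but never longer than the timeout. The 700 ms default and
// fail-open-on-timeout mirror the BIG-IP connector docs; failing open on a
// service error as well is this sketch's assumption.

type Verdict = "human" | "good_bot" | "bad_bot";

interface Outcome {
  forward: boolean; // true: pass the request on toward origin
  checked: boolean; // false: no usable verdict (timeout or error), failed open
  verdict?: Verdict;
}

async function decideWithFailOpen(verdictCall: Promise<Verdict>, timeoutMs = 700): Promise<Outcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<Outcome>((resolve) => {
    timer = setTimeout(() => resolve({ forward: true, checked: false }), timeoutMs);
  });
  const answered = verdictCall.then(
    (verdict): Outcome => ({ forward: verdict !== "bad_bot", checked: true, verdict }),
    (): Outcome => ({ forward: true, checked: false }),
  );
  try {
    return await Promise.race([answered, timedOut]);
  } finally {
    clearTimeout(timer); // stop the timer once either side has settled
  }
}
```

The `checked` flag matters operationally. A spike in unchecked forwards under load means requests are reaching origin without any bot decision at all.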

Figure: request path (browser + telemetry → connector on CDN / BIG-IP / Workers / Lambda@Edge, which injects and intercepts → verdict call to the XC Defense Engine inference engines, 700 ms timeout → allow + header → origin app / API). The origin never talks to F5; the connector does the round trip. *The connector sits in the request path, injects the collector, then forwards telemetry to the inference engines and enforces the verdict before the request reaches origin. The origin sees only a verified request plus a custom header.*

The CloudFront connector is the cleanest example of how far this goes. F5 built it on AWS Lambda@Edge and the AWS Serverless Application Repository, so the interception logic runs as edge functions inside the customer’s own CloudFront distribution and calls out to the F5 XC Bot Defense API. Cloudflare’s connector does the equivalent with Workers. The pattern is the same in every case: interception lives at the edge the customer already operates, the verdict comes from F5’s cloud, and the origin sees a clean request. When Bot Defense decides a request is human (or a sanctioned good bot), it “adds a custom HTTP request header to the request and allows the traffic to continue to the origin,” per F5’s documentation. The origin can key on that header to know the request was vetted. The connector can apply the usual three mitigations (monitor, block, or redirect), and F5 strongly recommends running everything in a flag-and-continue monitoring mode during onboarding so you can measure false positives before you start blocking real customers.
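Turning a verdict plus the configured mode into an action might look like the following sketch. The header name `x-example-bot-verdict`, its values, the 403 and 302 status codes, and the `/blocked` redirect target are all placeholders. Tagging bad bots with a `flagged` value in monitor mode is this sketch’s choice rather than documented F5 behavior.

```typescript
// Hypothetical sketch: verdict + configured mitigation mode -> action.
// Header names, values, and status codes are placeholders.

type Verdict = "human" | "good_bot" | "bad_bot";
type Mode = "monitor" | "block" | "redirect";

type Action =
  | { kind: "forward"; addHeaders: Record<string, string> }
  | { kind: "respond"; status: number; headers: Record<string, string> };

const VETTED_HEADER = "x-example-bot-verdict"; // hypothetical; configurable in practice

function enforce(verdict: Verdict, mode: Mode, redirectTo = "/blocked"): Action {
  if (verdict !== "bad_bot") {
    // Human or sanctioned bot: mark it as vetted and let it reach origin.
    return { kind: "forward", addHeaders: { [VETTED_HEADER]: verdict } };
  }
  switch (mode) {
    case "monitor":
      // Onboarding: flag but let it continue, so false positives can be
      // measured before anything is blocked.
      return { kind: "forward", addHeaders: { [VETTED_HEADER]: "flagged" } };
    case "block":
      return { kind: "respond", status: 403, headers: {} };
    case "redirect":
      return { kind: "respond", status: 302, headers: { location: redirectTo } };
  }
}
```

In monitor mode nothing reaches the user differently. The origin logs what would have been blocked, which is exactly the data needed to decide when to switch to block or redirect.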

This is the architectural bet, and it is a reasonable one. By decoupling detection from traffic routing, F5 sells into shops that are never going to repoint their DNS at a security vendor, which is most large enterprises with their own CDN contracts and their own BIG-IP fleet. The trade is operational surface. A Cloudflare customer gets bot management as a checkbox in a console they already use. An F5 Bot Defense customer is wiring a Lambda@Edge function or an iApp into a request path, managing an injection path, tuning a fail-open timeout, and coordinating an API-mode rollout with F5’s operations team. More control, more moving parts. For the banks and airlines that were Shape’s core market, that is the right trade. They already had the moving parts.

The control plane: Regional Edges and the inference engines

The verdict service the connectors call into is not a single box. F5 Distributed Cloud runs from a global network of Regional Edge (RE) points of presence, the same F5-operated locations the rest of the XC platform uses, with a meshed private backbone between them. Bot Defense can also be deployed close to public-cloud workloads as VMs or containers when a customer wants the decision path physically near their app. The point of the distributed footprint is latency: the connector’s verdict call needs to come back inside that 700-millisecond budget, so the inference has to happen somewhere near the user, not in a single far-off region.
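The latency argument is arithmetic: round trip plus inference has to fit inside the timeout, with some headroom for jitter. The sketch below only makes that arithmetic concrete. It does not describe how F5 actually routes a verdict call to a Regional Edge. The edge names, millisecond figures, and the 100 ms headroom are invented for illustration.

```typescript
// Back-of-envelope sketch of the verdict latency budget. All names and
// numbers are hypothetical.

interface EdgeCandidate {
  name: string;
  rttMs: number; // connector <-> edge round trip
  inferenceMs: number; // time to score the telemetry once it arrives
}

// Fastest edge whose total stays under budget minus headroom, or undefined
// if none fits (in which case every request would fail open).
function pickEdge(edges: EdgeCandidate[], budgetMs = 700, headroomMs = 100): EdgeCandidate | undefined {
  const fits = edges.filter((e) => e.rttMs + e.inferenceMs <= budgetMs - headroomMs);
  fits.sort((a, b) => a.rttMs + a.inferenceMs - (b.rttMs + b.inferenceMs));
  return fits[0];
}
```

With an 80 ms inference cost, a far-off region at 550 ms round trip leaves almost nothing for jitter, while a nearby edge leaves most of the budget unused. The difference is what the distributed footprint buys.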

What runs on those nodes is a stack of detection engines rather than one model. F5’s documentation describes “bot inference engines that process telemetry to determine if a request is human or automation,” and the broader product material describes mitigation via “supervised and unsupervised ML, pattern matching, identity, and verification engines.” Below that real-time layer sits the slower loop the solution overview labels “Retooling Protection: Continuous AI-based Aggregate Data Analysis.” This is the part Shape was famous for and the part that genuinely benefits from F5’s scale. Telemetry from every protected property flows into an aggregate analysis system. When attackers retool (rebuild their automation to defeat the current detection), the aggregate analysis catches the new pattern across the network and pushes what F5’s own architecture diagram calls “Dynamic Rule Updates” back out to the inference engines. The network effect is the product. A retooling attempt observed against one bank’s login page becomes a detection improvement for every other property on the platform, which is the same collective-defense logic that HUMAN’s signal network and Cloudflare’s cross-customer model both run on.
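The two-loop structure can be modeled with a toy analyzer and a toy engine. The analyzer promotes a signature to a rule once it has been seen across enough distinct properties, and the engine consults those rules per request. Everything here is hypothetical: the class names, the string signatures, the threshold, and the assumption that only suspicious telemetry is ingested. The real inference engines run ML models, not a set lookup.

```typescript
// Toy model of the fast per-request loop and the slow network-wide loop.
// All names, signatures, and thresholds are hypothetical.

class InferenceEngine {
  private rules = new Set<string>();

  applyRuleUpdate(signatures: string[]): void {
    for (const s of signatures) this.rules.add(s);
  }

  // Fast loop: per-request verdict from the current rule set.
  verdict(signature: string): "automation" | "unknown" {
    return this.rules.has(signature) ? "automation" : "unknown";
  }
}

class AggregateAnalyzer {
  private seenOn = new Map<string, Set<string>>(); // signature -> properties
  private published = new Set<string>();
  private minProperties: number;

  constructor(minProperties: number) {
    this.minProperties = minProperties;
  }

  // Assumed to receive only telemetry the fast loop found suspicious.
  ingest(property: string, signature: string): void {
    const props = this.seenOn.get(signature) ?? new Set<string>();
    props.add(property); // repeats on one property count once
    this.seenOn.set(signature, props);
  }

  // Slow loop: emit each network-wide pattern as a rule, exactly once.
  dynamicRuleUpdates(): string[] {
    const fresh: string[] = [];
    this.seenOn.forEach((props, sig) => {
      if (props.size >= this.minProperties && !this.published.has(sig)) {
        this.published.add(sig);
        fresh.push(sig);
      }
    });
    return fresh;
  }
}
```

Counting distinct properties rather than raw hits is what turns one customer’s attack into everyone’s rule. A noisy pattern on a single site stays local until it shows up elsewhere.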

Figure: two loops (fast, per-request: inference engines with supervised + unsupervised ML return a real-time verdict; slow, network-wide: aggregate telemetry feeds retooling detection across all properties, alongside human rule authoring by the threat-intel team, and dynamic rule updates flow back to the inference engines). *Two loops: a per-request verdict from the inference engines, and a slower aggregate loop that watches for retooling across every protected property and pushes rule updates back down. The threat-intel team authors rules the models miss.*

The human layer is not incidental. F5 runs Threat Intelligence managed services where domain experts watch the aggregate traffic, analyze new automation, and hand-author rulesets the models alone would miss. This is the same operating model Shape ran, and it is why Bot Defense is sold with managed-service tiers rather than purely as self-service software. The detection is AI plus a staffed SOC, and for the fraud-heavy use cases (credential stuffing, account takeover, gift-card cracking) the staffed half matters because attackers in those categories are adaptive humans, not fire-and-forget scripts.

Mobile, good bots, and what the connector header carries

Web is only half the surface. Attackers move to the path of least resistance, and a site that hardens its web login pushes automation toward the mobile API, which often has weaker protection and a cleaner JSON interface. F5 addresses this with a native Mobile SDK for iOS and Android that does the equivalent of the web collector inside the app. The integration model is explicit in the docs: the app calls the SDK to generate headers, attaches them to the outbound request, and runs parseResponseHeaders() on responses to protected endpoints. The critical operational rule is single-use. The documentation states that “each set of headers contains a unique token” and warns: “Do not send the same set of headers more than once.” Replaying a captured mobile header set is exactly the attack the design is built to defeat, so each request carries fresh, single-use proof of SDK execution. F5 also asks integrators to encode the app version into the User-Agent (the docs give User-Agent: sometext MyApp/3.3 sometext as the shape) so the backend can reason about which client build produced a given telemetry set.
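On the receiving side, the single-use rule implies some form of replay check. How F5 enforces it is not public, so this is a minimal sketch under assumptions. `ReplayGuard` is hypothetical, and `issuedAtMs` is assumed to come from inside the encrypted, integrity-protected token rather than from anything the client can edit. The `appVersion` helper parses the `MyApp/3.3` shape the docs give for the User-Agent.

```typescript
// Hypothetical server-side sketch of single-use SDK tokens, plus parsing
// the app build out of the User-Agent. Not F5's implementation.

class ReplayGuard {
  private seen = new Map<string, number>(); // token -> time it can be forgotten
  private windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // true the first time a fresh token is presented; false on reuse or when
  // the token is older than the window (it may already have been evicted).
  accept(token: string, issuedAtMs: number, nowMs: number): boolean {
    this.seen.forEach((forgetAt, t) => {
      if (forgetAt < nowMs) this.seen.delete(t); // bounded memory
    });
    if (nowMs - issuedAtMs > this.windowMs) return false; // stale
    if (this.seen.has(token)) return false; // replay
    this.seen.set(token, issuedAtMs + this.windowMs);
    return true;
  }
}

// Extract the build from a User-Agent shaped like "sometext MyApp/3.3 sometext".
function appVersion(userAgent: string, appName = "MyApp"): string | undefined {
  const escaped = appName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const m = new RegExp(`(?:^|\\s)${escaped}/(\\d+(?:\\.\\d+)*)`).exec(userAgent);
  return m?.[1];
}
```

Rejecting stale tokens is what makes eviction safe. Once a token’s window closes it would fail the age check anyway, so the guard never has to remember it.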

The good-bot side is the other thing the architecture has to get right, because a bot detector that blocks Googlebot is a business problem, not a security one. Bot Defense is built to let “good bots” through (search crawlers, partner integrations, sanctioned monitoring) while blocking the malicious automation. In 2026 this is the live edge of the whole category. AI crawlers from the large model vendors are a new class of automated traffic that some publishers want and others want gone, and “good bot” is no longer a tidy allowlist of a few search engines. F5 positions Bot Defense as distinguishing humans, good bots, and bad bots in real time, but the policy question of which AI agents count as good is increasingly the customer’s to set, not a fixed list. The verification problem (proving a bot is the crawler it claims to be, rather than a scraper spoofing the user-agent) is the same one driving Web Bot Auth and the cryptographic-signature proposals across the industry.
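Since the policy on AI agents is increasingly the customer’s call, the decision reduces to two questions. Is this agent on the customer’s allowlist, and was its claimed identity actually verified? This sketch rests on assumptions. The agent names, the decision labels, and the `identityVerified` flag are hypothetical. The flag stands in for reverse-DNS checks or a Web Bot Auth signature check, neither of which is implemented here.

```typescript
// Hypothetical policy sketch: customer-set allowlist + verified identity.

type Classification = "human" | "automation";

interface RequestFacts {
  classification: Classification;
  claimedAgent?: string; // e.g. parsed from the User-Agent
  identityVerified: boolean; // outcome of an external verification step
}

type Decision = "allow" | "allow_good_bot" | "mitigate";

function applyBotPolicy(facts: RequestFacts, allowedAgents: ReadonlySet<string>): Decision {
  if (facts.classification === "human") return "allow";
  if (facts.claimedAgent !== undefined && allowedAgents.has(facts.claimedAgent)) {
    // A scraper spoofing an allowed crawler's user-agent gets no pass.
    return facts.identityVerified ? "allow_good_bot" : "mitigate";
  }
  return "mitigate";
}
```

Keeping the allowlist as data, not code, matches the shift the paragraph describes. Which AI crawlers count as good changes per customer and over time.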

When a request clears, the connector marks it with a custom header before passing it to origin. That header is the integration contract between the F5 layer and the application. The application trusts requests carrying the expected vetted-header value and can treat the absence of it as a signal in its own right. The exact header names are customer-configurable and not published as fixed strings, which is consistent with the rest of the design philosophy: the integration is meant to be opaque from the outside. If you have read the DataDome or Akamai posts, the contrast is instructive. Those systems lean on named, externally observable cookies (datadome, _abck) that became community-tracked artifacts. F5’s connector model deliberately avoids handing attackers a fixed cookie or URL to key on, pushing more of the observable surface behind a per-deployment injection path and configurable headers.
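At the origin, the vetted header becomes a simple check. This minimal sketch assumes two things: the connector strips any client-supplied copy of the header before adding its own (otherwise a client could forge it), and the expected value is a per-deployment secret. The header name and value are placeholders. The constant-time comparison keeps response timing from revealing how much of a guessed value matched.

```typescript
// Hypothetical origin-side check of the connector's vetted header.
// Name and expected value are per-deployment placeholders.

interface VettedHeaderConfig {
  name: string; // e.g. "x-example-vetted"
  expected: string; // value shared between connector and origin
}

// Constant-time over equal-length inputs (length itself is not hidden).
function timingSafeEqualStr(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  return diff === 0;
}

type OriginDecision = "vetted" | "unvetted";

// headers are assumed to be keyed by lower-case names, as most servers do.
function classifyAtOrigin(headers: Record<string, string | undefined>, cfg: VettedHeaderConfig): OriginDecision {
  const value = headers[cfg.name.toLowerCase()];
  if (value === undefined) return "unvetted"; // absence is a signal in its own right
  return timingSafeEqualStr(value, cfg.expected) ? "vetted" : "unvetted";
}
```

The check only means something if the origin is reachable solely through the connector. A request that bypasses the edge would otherwise simply arrive without the header.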

2026: where it sits

The market has settled into a recognizable shape, and F5 occupies a specific corner of it. Cloudflare and DataDome win where the buyer wants bot management as a feature of an edge platform they already route through. Akamai wins inside its own enormous CDN footprint. HUMAN and Arkose lean hard on the fraud and post-login abuse cases. F5 Distributed Cloud Bot Defense wins where the buyer is a large enterprise with existing F5 infrastructure, a heavy fraud problem, and no intention of moving traffic onto a security vendor’s network. That is the Shape customer base, mostly intact: banks, airlines, large retailers. The connector model is the reason that base stayed. It let F5 keep selling Shape’s detection to organizations that would never have accepted a routing change.

What the acquisition actually changed is narrower than the rebranding suggests. The detection lineage is continuous from ShapeShifter’s polymorphism through the VM-obfuscated collector to the aggregate retooling analysis. The client-side machinery a researcher pulls apart in 2026 is recognizably the Shape VM, reshuffling its handlers on a roughly half-hour cycle, doing the same job it did before the deal. What F5 added was distribution: the Regional Edge footprint to host the inference engines near users, the connector library to bolt the telemetry path onto CDNs and load balancers without a routing change, and the BIG-IP integration that put bot defense one config object away for the customers F5 already owned. The billion dollars bought a detection engine. The engineering since bought it a delivery network.

The honest limit of any reference post on this system is the same one F5 designed in on purpose. The architecture is documented and stable: scripts keyed by ?matcher and ?single, a VM-obfuscated collector, telemetry on the request, connectors calling a 700-millisecond verdict service, two analysis loops, a single-use mobile token. The literal current bytes are not, and are not meant to be. The opcode table moves every half hour, the injection path differs per customer, and the header names are configurable strings rather than published constants. That gap between knowing how a system works and knowing what it is doing this minute is the whole point of a moving-target defense, and F5 has been building exactly that since a team of ex-Google and ex-Mozilla engineers came out of stealth in 2014 with the unfashionable idea that the defender, not the attacker, should be the one rewriting the code on every request.

