
Edge compute compared: Cloudflare Workers, Lambda@Edge, and Fastly Compute

Copyright: MIT

Three vendors will sell you “code at the edge,” and all three mean something different by it. Cloudflare runs your JavaScript in a V8 isolate next to a few thousand other tenants in one process. AWS gives you a stripped-down JS engine that has to finish in under a millisecond, or a full Node container that is deployed in one US region, replicated outward, and run in the AWS region nearest the viewer. Fastly compiles your Rust or JavaScript to WebAssembly and spins up a fresh sandbox for every single request, then throws it away. Same marketing word, three architectures that share almost no design decisions.

The differences are not cosmetic. They decide what you can run, how fast it starts, what an escaped tenant can reach, and how much code you can ship. This is a reference for the engineer who has to pick one, or who has to reason about what an attacker controls once code is executing inside one of these runtimes. The numbers here come from vendor docs and engineering posts that are linked at the bottom, re-read rather than recalled.

The sections walk through the isolation model first, because everything else follows from it: V8 isolates, the Lambda/CloudFront split, and WASM sandboxing. Then cold starts and what the headline microsecond figures actually measure. Then the hard limits, what runs and what does not, the security surface, and where each platform places your code on the planet.

The isolation question comes first

Every serverless platform has to answer one question before anything else: when two customers’ code runs on the same machine, what keeps one from reading the other’s memory? The classic answer is a virtual machine or a container per tenant. Strong isolation, but you pay for it on every cold start, because booting a guest OS or a language runtime from zero takes time, and you pay for it in memory, because each tenant carries its own copy of the runtime.

The three platforms here step away from the per-tenant VM to three different degrees. Cloudflare shares one OS process across thousands of tenants and isolates them inside V8. Fastly gives each request its own WebAssembly sandbox and destroys it immediately. AWS keeps the container model for Lambda@Edge but hides the cold start behind regional execution, and offers a second, far more restricted runtime (CloudFront Functions) for the cases where a millisecond matters. Read the rest of this post as variations on that one trade.

Figure: Three ways to keep tenants apart on shared hardware.
Cloudflare Workers: V8 isolates in one process (1 process, N isolates). Memory per tenant: low. Blast radius if a tenant escapes the sandbox: the shared process.
Fastly Compute: one WASM sandbox per request, born and destroyed per request. Memory per tenant: low. Blast radius: one request.
Lambda@Edge: container per tenant, full Node runtime per instance. Memory per tenant: high. Blast radius: one container.
*The same word, "edge compute," resolves to three different bets on how much to share between tenants and how much to throw away after each request.*

Cloudflare Workers: many tenants, one process

A V8 isolate is the unit Chrome uses to keep one browser tab from reading another tab’s heap. Cloudflare took that same primitive and applied it to multi-tenant serverless. The runtime, called workerd and open-sourced in September 2022, embeds Google’s V8 engine and runs each customer’s code inside its own isolate, which Cloudflare’s docs describe as a lightweight context that gives your code its variables and a sandbox to run in. A single runtime instance hosts hundreds or thousands of isolates and switches between them. That is feasible because, by Cloudflare’s account, an isolate consumes roughly an order of magnitude less memory on startup than a Node process on a container or VM, and starts about a hundred times faster.

That density is the whole point. When you do not pay for a fresh OS and a fresh runtime per tenant, you can pack thousands of tenants onto one machine and keep them all warm. The cost is that every tenant shares one address space, separated only by V8’s own boundaries, which puts the security model under more pressure than a container ever sees.

Workers run on a single-threaded event loop, and the docs are explicit that other requests may or may not be processed while your code awaits an async task. There is no shared memory between isolates and no multi-threading exposed to user code. The fetch handler is the entry point: a request to your workers.dev subdomain or a Cloudflare-managed domain invokes fetch(), and the isolate that serves it may be one that already exists or one created on the spot.
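As a minimal sketch of that entry point, the handler below has the shape of a module-syntax Worker. The `route` helper, the paths, and the response bodies are hypothetical, and the routing is split out only so it can be exercised without a runtime.

```typescript
// Pure routing logic. The paths and bodies are illustrative assumptions.
function route(pathname: string): { status: number; body: string } {
  if (pathname === "/health") {
    return { status: 200, body: "ok" };
  }
  return { status: 404, body: "not found" };
}

// Worker shape: the runtime calls fetch() once per incoming request.
// In a deployed Worker this object would be the module's default export.
const worker = {
  fetch(request: Request): Response {
    const { pathname } = new URL(request.url);
    const { status, body } = route(pathname);
    return new Response(body, { status });
  },
};
```

Nothing in the handler assumes a fresh isolate or a reused one, which is the safe posture given that either can serve the request.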

One architectural detail from the workerd announcement is worth keeping in mind because it shapes how Workers compose. Cloudflare calls the model nanoservices: multiple Workers can run in one process, each in its own isolate, and a request from one Worker to another executes in the same thread with effectively zero added latency, closer to a function call than a network hop. The runtime achieves low baseline overhead by implementing most of its APIs in native code shared across isolates, rather than shipping a large JavaScript runtime into each one. The same announcement describes capability bindings: a Worker starts with no ambient authority and gets access to specific resources only through explicit bindings, and the global fetch() is restricted to public URLs, which closes off the server-side request forgery class by construction.

Lambda@Edge and CloudFront Functions: AWS runs two

AWS did not pick one model. It ships two edge runtimes with almost opposite trade-offs, and which one you should use depends entirely on what your code needs to touch.

Lambda@Edge is regular AWS Lambda, the same container-backed runtime, wired into CloudFront’s request lifecycle. It supports the latest Node.js and Python runtimes, can make network calls, read and replace the request body, reach other AWS services, and use up to several gigabytes of memory. The catch is in the restrictions doc: the function must be deployed in US East (N. Virginia) as a numbered version, never $LATEST or an alias, and a long list of normal Lambda features is unavailable at the edge. No VPC access, no environment variables beyond the reserved ones, no Lambda layers, no provisioned concurrency, no container images, no arm64, and no more than 512 MB of ephemeral storage. Lambda@Edge functions also do not literally run in every CloudFront edge location; they execute in the AWS region nearest the viewer, which is a different physical model from Workers or Fastly.

CloudFront Functions is the other end of the spectrum, and it is brutal about it. The runtime is JavaScript compliant with ECMAScript 5.1 plus a handful of features from later versions. It cannot access the network, the filesystem, environment variables, or timers, and it cannot read the body of the HTTP request at all. There is no high-resolution clock to lean on, since Date queries return the same value for the whole of a function run, and no dynamic code evaluation. The function ships as at most 10 KB of source, runs in 2 MB of memory, and is measured by “compute utilization,” a 0-to-100 number where 100 means it used the entire allowed slice of time. AWS does not publish that slice as a wall-clock figure in the restrictions doc; third-party benchmarks consistently report CloudFront Functions finishing well under a millisecond, which is the design target.

The trigger surface also differs, and it is one of the cleaner ways to remember the split. CloudFront has four hooks in a request’s life: viewer request, origin request, origin response, viewer response. CloudFront Functions can only attach to the two viewer events, the ones closest to the user. Lambda@Edge can attach to all four. So a function that rewrites a URL or sets a header on the way in is a CloudFront Function; a function that needs the request body, a network call, or origin-side logic is a Lambda@Edge function.
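The rule in that last sentence is small enough to write down. This is a sketch of the decision only, with hypothetical type and function names; it encodes nothing beyond the hook, body, and network constraints described above.

```typescript
type Trigger =
  | "viewer-request"
  | "origin-request"
  | "origin-response"
  | "viewer-response";

type AwsEdgeRuntime = "CloudFront Functions" | "Lambda@Edge";

interface Needs {
  trigger: Trigger;
  readsBody: boolean;
  makesNetworkCalls: boolean;
}

// CloudFront Functions qualifies only for viewer events with no body and no network.
function pickAwsRuntime(needs: Needs): AwsEdgeRuntime {
  const isViewerEvent =
    needs.trigger === "viewer-request" || needs.trigger === "viewer-response";
  if (isViewerEvent && !needs.readsBody && !needs.makesNetworkCalls) {
    return "CloudFront Functions";
  }
  return "Lambda@Edge";
}
```

A header rewrite on `viewer-request` lands on CloudFront Functions; the same rewrite on `origin-request`, or anything that touches the body, falls through to Lambda@Edge.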

Figure: The four CloudFront triggers and who can run where.
Viewer side: viewer request, viewer response. Origin side: origin request, origin response.
Lambda@Edge: all four triggers. CloudFront Functions: viewer request and viewer response only.
*CloudFront Functions attaches only to the two viewer events; Lambda@Edge can attach to all four. The runtime you reach for is decided by which hook you need.*

Fastly Compute: a sandbox per request, then gone

Fastly went a third way. Instead of sharing a process across tenants like Cloudflare, or keeping a warm container like Lambda, it compiles your code to WebAssembly ahead of time and creates a brand-new sandbox for every request, then destroys it the moment the request finishes. One request, one instance, one memory space, no reuse.

The runtime started life as Lucet, Fastly’s own ahead-of-time WebAssembly compiler. The distinction Pat Hickey drew in March 2020 is the key one: Lucet compiles WebAssembly to native code ahead of time, so there is nothing to compile when a request arrives and the instance can start essentially immediately, whereas a just-in-time runtime produces native code in the same process as it runs. That AOT choice is what makes the per-request model affordable. Lucet was later merged into Wasmtime, the Bytecode Alliance’s runtime and the reference implementation for WASI, and Fastly’s serverless page now names Wasmtime as the runtime powering Compute.

The per-request teardown is a security argument as much as a performance one. Fastly’s product writing puts it plainly: the isolation technology creates and destroys a sandbox for each request in microseconds, so each request runs in its own self-contained memory space with only one request per instance. Because nothing survives between requests, there is no long-lived heap for a later request to read, which removes an entire class of cross-request side-channel and state-leak bugs that the shared-process model has to actively defend against. The price is that nothing survives between requests, so any in-memory cache or warmed-up state has to be rebuilt or pushed out to a separate store.
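A toy model makes the trade concrete. This is not Fastly's SDK or Wasmtime's API; `instantiate` is a hypothetical stand-in for creating a sandbox, and the hit counter stands in for any in-memory state your code keeps.

```typescript
type Handler = (path: string) => number;

// Stand-in for creating one sandbox instance with its own private memory.
function instantiate(): Handler {
  let hits = 0; // in-memory state owned by this instance only
  return (_path: string) => {
    hits += 1;
    return hits;
  };
}

// Reused instance: state accumulates across requests.
function serveWithReuse(paths: string[]): number[] {
  const handler = instantiate();
  return paths.map((p) => handler(p));
}

// Per-request instance: create, run once, discard.
function servePerRequest(paths: string[]): number[] {
  return paths.map((p) => instantiate()(p));
}
```

For three requests, `serveWithReuse` returns `[1, 2, 3]` and `servePerRequest` returns `[1, 1, 1]`. The second result is both the security property and the reason a cache has to live somewhere else.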

On startup speed, be careful with the headline numbers. Fastly’s own materials describe startup “in microseconds” and “100x faster than other offerings,” and the July 2021 JavaScript-support announcement frames it against the “~250+ milliseconds of startup latency” developers had lived with on container-based serverless. A specific “35.4 microseconds” figure circulates widely in third-party comparisons, but it does not appear in the Fastly pages fetched for this post, so treat the precise decimal as community lore and the order of magnitude (tens of microseconds, vendor-claimed) as the documented part. Compute supports JavaScript, Rust, Go, and C++, with Rust the most mature path because the toolchain targets WebAssembly cleanly.

Cold starts, and what the numbers actually measure

Cold start is the phrase that sells edge compute, and it is also the phrase most often misused. It means the latency to go from “no execution context exists for this code” to “this code is running.” Three platforms, three very different definitions of zero.

Cloudflare’s claim is the most quoted: a Worker isolate cold-starts in roughly 5 milliseconds. The cleverer part is the July 2020 trick that drove the effective cold start to zero for most requests. When a TLS connection opens, the client’s ClientHello carries the hostname in the SNI field before the handshake completes. Cloudflare uses that early signal to hint the runtime to eagerly load that hostname’s Worker during the handshake. Since loading a Worker takes about 5 ms and the average client-to-Cloudflare round trip is longer than that, the isolate is ready by the time the request arrives, and the observed cold start is nil. No fee, no config, every customer.
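The arithmetic behind that claim can be sketched in one function. It is a simplified model, not Cloudflare's scheduler: it assumes loading starts the moment the ClientHello arrives and that the first request cannot arrive until some head start (at least one client round trip) later. Both parameter names are made up for the sketch.

```typescript
// loadMs: time to load the Worker (the post cites roughly 5 ms).
// headStartMs: time between the ClientHello arriving and the first
// request arriving, which the handshake guarantees is non-zero.
function observedColdStartMs(loadMs: number, headStartMs: number): number {
  // Only the part of the load that outlasts the handshake is visible.
  return Math.max(0, loadMs - headStartMs);
}
```

With a 5 ms load and a 20 ms head start the result is 0; with a 2 ms head start, 3 ms of the load would still be visible to the client.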

Fastly’s number lives a tier lower because WebAssembly modules are compiled ahead of time and the sandbox is tiny. Tens of microseconds, by Fastly’s account, with the sandbox built and torn down inside the request. Whether that beats a pre-warmed Worker in practice depends on the workload, but architecturally Fastly never has the “first request after a quiet period eats the boot cost” problem at all, because there is no warm instance to lose; every request pays the same small, fixed setup.

AWS is the split decision again. CloudFront Functions is built for sub-millisecond execution and effectively has no perceptible cold start, which is the trade it makes for its tiny runtime. Lambda@Edge inherits Lambda’s container cold start, which the industry has long measured in the hundreds of milliseconds to over a second for a fresh environment, and it runs in a region rather than at the literal edge. That is the gap CloudFront Functions exists to fill.

Figure: Cold start on a log scale (10 µs to 1 s+), vendor-claimed startup to first execution.
Fastly Compute: tens of µs. CloudFront Functions: sub-1 ms. Cloudflare Workers: ~5 ms (≈0 with SNI preload). Lambda@Edge: 100 ms to 1 s+.
Figures are vendor claims or common third-party measurements, not a single controlled benchmark. Order of magnitude, not exact.
*The spread is four orders of magnitude. It tracks exactly how much each runtime has to build before your code runs: a WASM sandbox, a tiny JS engine, a pre-warmed isolate, or a whole container.*

The hard limits decide what you can actually build

Architecture sets the ceiling; the published quotas tell you where it is. These are the numbers that turn an elegant design into a “no, that won’t fit.”

Cloudflare Workers give each isolate 128 MB of memory, covering both the JavaScript heap and any WebAssembly allocations, on both the free and paid plans. CPU time is the dimension that varies: the free plan allows 10 ms of CPU per request, while the paid plan defaults to 30 seconds and can be raised to 5 minutes. CPU time is not wall-clock time. A Worker can stay alive far longer than its CPU budget while it waits on I/O, and ctx.waitUntil() can extend work up to 30 seconds after the response is sent. Script size is 3 MB gzipped on free and 10 MB gzipped on paid, both capped at 64 MB uncompressed, and a single invocation can fan out to 50 subrequests on free or up to thousands on paid.
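A quick way to sanity-check a Worker against those quotas is a function like the one below. The numbers are the ones quoted in this section; the names are hypothetical, and the paid CPU figure uses the 5-minute ceiling rather than the 30-second default.

```typescript
type Plan = "free" | "paid";

interface WorkerProfile {
  gzippedMB: number;      // bundle size after gzip
  uncompressedMB: number; // bundle size before gzip
  cpuMs: number;          // CPU time per request, not wall-clock time
}

const GZIPPED_LIMIT_MB: Record<Plan, number> = { free: 3, paid: 10 };
const UNCOMPRESSED_LIMIT_MB = 64;
// Paid uses the raised ceiling; the default stays at 30 s until changed.
const CPU_LIMIT_MS: Record<Plan, number> = { free: 10, paid: 5 * 60 * 1000 };

// Returns the name of every quota the profile exceeds on the given plan.
function limitViolations(plan: Plan, w: WorkerProfile): string[] {
  const problems: string[] = [];
  if (w.gzippedMB > GZIPPED_LIMIT_MB[plan]) problems.push("gzipped script size");
  if (w.uncompressedMB > UNCOMPRESSED_LIMIT_MB) problems.push("uncompressed script size");
  if (w.cpuMs > CPU_LIMIT_MS[plan]) problems.push("CPU time per request");
  return problems;
}
```

A 4 MB gzipped bundle that burns 25 ms of CPU fails two checks on the free plan and none on paid, which is usually the moment the plan decision gets made.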

CloudFront Functions is the opposite of generous, by design: 10 KB of code, 2 MB of memory, sub-millisecond execution, no body access, no network. It is a header-and-URL rewriter that happens to be programmable, and the constraints are the feature, because they are what let it run in line with the cache at the viewer edge with no measurable overhead.
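For a sense of what fits inside those constraints, here is a sketch of a viewer-request handler. The event types are local stand-ins covering only the fields used, and the header name is hypothetical. The real runtime takes plain JavaScript, so the type annotations would have to be stripped before deployment.

```typescript
// Local stand-in types for the parts of the viewer-request event used here.
interface CfHeader { value: string }
interface CfRequest {
  method: string;
  uri: string;
  headers: { [name: string]: CfHeader };
}
interface CfEvent { request: CfRequest }

// String work on the URI and headers only: no network, no body, no timers.
function handler(event: CfEvent): CfRequest {
  var request = event.request; // var, to stay close to the ES5.1 baseline
  // Normalise directory-style URLs before the cache lookup.
  if (request.uri.charAt(request.uri.length - 1) === "/") {
    request.uri = request.uri + "index.html";
  }
  // Hypothetical header, shown only as an example of setting one on the way in.
  request.headers["x-edge-normalised"] = { value: "1" };
  return request;
}
```

Returning the request object tells CloudFront to continue processing with the modified request.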

Lambda@Edge sits between a normal Lambda and CloudFront Functions. Viewer-triggered functions are capped at 5 seconds and 128 MB of memory; origin-triggered functions get 30 seconds and up to 3 GB. The deployment package is up to 1 MB for viewer triggers and 50 MB for origin triggers. Request-body access has its own truncation rules worth memorising if you rely on it: the body is base64-encoded before exposure, and it is truncated at 40 KB for viewer requests and 1 MB for origin requests. A function that replaces the body faces the same ceilings, raised to 53.2 KB and 1.33 MB respectively when the replacement is base64-encoded, and exceeding them returns an HTTP 502.
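Those replacement-body ceilings are easy to get wrong under pressure, so a lookup helps. This sketch uses the figures quoted above and assumes 1 KB = 1024 bytes and 1 MB = 1024 KB, which the text does not specify; all names are hypothetical.

```typescript
type TriggerSide = "viewer" | "origin";
type BodyEncoding = "text" | "base64";

// Replacement-body ceilings in KB, as quoted in the text.
const REPLACEMENT_LIMIT_KB: Record<TriggerSide, Record<BodyEncoding, number>> = {
  viewer: { text: 40, base64: 53.2 },
  origin: { text: 1024, base64: 1.33 * 1024 },
};

// True when a replacement body of sizeBytes stays within the ceiling.
function replacementBodyFits(
  side: TriggerSide,
  encoding: BodyEncoding,
  sizeBytes: number
): boolean {
  return sizeBytes / 1024 <= REPLACEMENT_LIMIT_KB[side][encoding];
}

// Base64 emits 4 output bytes per 3 input bytes, padded to a multiple of 4.
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}
```

A 30 KB raw payload encodes to exactly 40 KB of base64, which fits the viewer-side base64 ceiling; a 41 KB text body does not fit the viewer-side text ceiling and would produce the 502.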

Fastly Compute’s published limits are framed differently because the model is different. The constraint that matters most is the per-request lifecycle: state does not persist between requests by default, so the engineering limit is less “how much memory per invocation” and more “what survives,” which for Compute is nothing in the sandbox itself. Persistent data goes to Fastly’s KV and config stores, not to a long-lived heap.

Published limits at a glance

| | Workers | CloudFront Functions | Lambda@Edge | Fastly |
|---|---|---|---|---|
| runtime | V8 isolate | ES5.1 JS | Node/Python | WASM |
| memory | 128 MB | 2 MB | 128 MB–3 GB | per-request |
| CPU / time | 10 ms–5 min CPU | sub-1 ms | 5 s / 30 s | microsecond boot |
| code size | 3–10 MB gzipped | 10 KB | 1 MB / 50 MB | WASM module |
| network / body | yes | no | yes | yes |
| runs at | edge POP | edge POP | nearest region | edge POP |

*Read this top to bottom for any one column and you have its job description. Workers and Fastly run general code at the POP; CloudFront Functions is a rewriter; Lambda@Edge is a full runtime that runs in a region.*

What actually runs at the edge

The runtime decides the programming model, and the programming model decides which problems are a natural fit.

Workers run general-purpose JavaScript and WebAssembly with network access, so people build API gateways, auth and token validation, A/B routing, request coalescing, and full applications on them, talking to Cloudflare’s own storage primitives like KV, R2, D1, and Durable Objects. The 128 MB and the multi-minute CPU budget on the paid plan are enough to run real application logic, not just request munging. Smart Placement, shipped in May 2023 and stabilised in 2025, even inverts the usual edge assumption: if a Worker makes more than one subrequest to a backend on average, Cloudflare will run it in the data center closest to that backend rather than closest to the user, because for chatty backend traffic the round trips dominate. That is an admission worth sitting with. Sometimes the fastest place for “edge” code is not the edge.
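The reasoning behind Smart Placement is plain round-trip arithmetic. The sketch below is a back-of-envelope model, not Cloudflare's placement algorithm: it assumes subrequests run one after another, and every latency figure in it is invented for illustration.

```typescript
// Total time for one request: one viewer round trip plus one backend
// round trip per sequential subrequest.
function requestLatencyMs(
  viewerRttMs: number,
  backendRttMs: number,
  subrequests: number
): number {
  return viewerRttMs + subrequests * backendRttMs;
}

// Hypothetical round-trip times for a viewer far from the backend.
const atEdge = { viewerRttMs: 10, backendRttMs: 80 };
const nearBackend = { viewerRttMs: 85, backendRttMs: 2 };

function compare(subrequests: number): { edge: number; nearBackend: number } {
  return {
    edge: requestLatencyMs(atEdge.viewerRttMs, atEdge.backendRttMs, subrequests),
    nearBackend: requestLatencyMs(
      nearBackend.viewerRttMs,
      nearBackend.backendRttMs,
      subrequests
    ),
  };
}
```

With these made-up numbers, zero subrequests favours the edge (10 ms against 85 ms), one subrequest is roughly a wash (90 ms against 87 ms), and three subrequests favour the backend side by a wide margin (250 ms against 91 ms).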

Fastly Compute runs compiled code, Rust most maturely, and shines where you want near-native compute per request with hard isolation: heavy header and routing logic, request normalisation, image and content transformation, and security filtering in front of an origin. The per-request sandbox makes it a comfortable fit for untrusted or security-sensitive transforms, since nothing leaks from one request into the next.

CloudFront Functions is for the high-volume, dirt-cheap, must-be-instant rewrites: normalising a URL, adding a security header, doing a simple redirect, a quick cache-key tweak. It runs on every request at the viewer edge and is priced and built to be invisible. Lambda@Edge is what you reach for when you need the request body, a call to DynamoDB or S3, or full Node libraries, and you can tolerate the regional execution and the container cold start. The trade is explicit: power and ecosystem in exchange for latency and edge proximity.

If you are reasoning about these platforms from the outside, as a client rather than an operator, the runtime model also shapes what you observe. A Worker doing TLS or HTTP/2 fingerprinting at the POP, a CloudFront Function rewriting a CDN cache key before the cache lookup, and a TLS-terminating proxy deciding where the handshake ends are all “edge compute” in the same breath, and they sit at different points in the request’s life. The placement question, who runs where, is the same one anycast routing answers at the network layer, one level below the runtime.

Security: the shared process pays the most attention

The more you share between tenants, the harder the isolation has to work, and nowhere is that clearer than in how each platform treats Spectre.

Cloudflare’s security model doc is unusually candid: the V8 team at Google has said V8 itself cannot defend against Spectre, so isolates alone are not enough. Cloudflare layers defenses on top. The process is sandboxed with Linux namespaces and seccomp to block all filesystem and network access. There is no shared memory and no multi-threading exposed to user code. High-precision timing is denied: Date.now() returns the time of the last I/O and does not advance during execution, which removes the fine-grained clock a Spectre gadget needs to read out leaked bits. The runtime watches CPU performance counters for anomalies and will dynamically reschedule a suspicious Worker into its own process, and Workers that attach a debugger are moved to a separate process as a matter of course. Cloudflare frames the overall approach as cascading slow-downs: stack enough independent obstacles that an attack becomes too slow to be worth running. The honesty is the point. Sharing one process across thousands of tenants is the densest model, and it is the one that has to argue hardest that it is safe.
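The timing defense is the easiest of these to model. The class below is a toy, not workerd's implementation: `IoGatedClock` and `recordIo` are hypothetical names for a clock that advances only when the runtime delivers an I/O event.

```typescript
// Toy model of a clock that only advances when I/O completes.
class IoGatedClock {
  private lastIoMs: number;

  constructor(startMs: number) {
    this.lastIoMs = startMs;
  }

  // Called by the (hypothetical) runtime when an I/O event is delivered.
  recordIo(wallClockMs: number): void {
    this.lastIoMs = wallClockMs;
  }

  // What guest code would see from Date.now() in this model.
  now(): number {
    return this.lastIoMs;
  }
}

// Guest code trying to time a CPU-only operation.
function timeCpuWork(clock: IoGatedClock, work: () => void): number {
  const before = clock.now();
  work();
  return clock.now() - before;
}
```

However long `work` runs, `timeCpuWork` reports zero elapsed time in this model, because no I/O event arrives in between. That is the property that denies a Spectre gadget its stopwatch.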

Fastly’s argument is structural rather than defensive. One request per instance, and the instance is destroyed when the request ends, so there is no persistent shared state for a side channel to read and no neighbour sitting in the same memory space for the duration. The company describes this as removing an entire class of side-channel attacks and shrinking the attack surface. The trade is the one already noted: you give up warm in-process state to get there.

AWS leans on isolation models it already runs at scale. Lambda@Edge inherits Lambda’s per-function execution environment, the Firecracker microVM boundary that AWS uses across Lambda, which is a heavier and more conventional separation than an in-process isolate. CloudFront Functions takes a structural route of its own, removing capability: no network, no filesystem, no timers, no body, no dynamic evaluation, so there is very little for hostile code to do even if it wanted to. The smallest runtime has the smallest attack surface almost by accident.

Where the code physically runs

Proximity to the user is half the pitch of edge compute, and the three vendors do not have the same map.

Cloudflare and Fastly both run your code in the POP, so geography is mostly a question of how many POPs exist. Cloudflare’s network spans well over 300 cities, the larger footprint of the two. Fastly runs fewer, denser POPs, strong where it has coverage and thinner in regions like parts of APAC where its presence is lighter. CloudFront Functions also runs at CloudFront’s edge locations, of which AWS has a very large number, since they were built for caching first.

Lambda@Edge is the exception that catches people out. Despite the name, it does not run in every edge location. It executes in the AWS region nearest the viewer, and the function itself must be created in US East (N. Virginia) before CloudFront replicates it out to the regions. So “Lambda@Edge” is closer to “Lambda in a nearby region, triggered by CloudFront” than to “Lambda in the POP.” For a viewer in a city with a CloudFront edge but no nearby compute region, the gap between a Worker’s POP execution and a Lambda@Edge regional execution can be a noticeable extra hop.

Picking the right tool, and what the split tells you

If you want one heuristic, it is the same question the whole post turns on: how much does your code need to touch, and how fast does it need to start? Code that only rewrites headers and URLs and must be effectively free belongs in CloudFront Functions, and its absurd 10 KB and 2 MB limits are the proof that AWS meant it for exactly that. General application logic that talks to storage and wants to start instantly fits Cloudflare Workers, whose 128 MB isolate and pre-warmed cold start cover most of it. Compute-heavy, security-sensitive, compiled work with hard per-request isolation is Fastly’s lane. And anything that genuinely needs Node libraries, the request body, and calls into the AWS ecosystem is Lambda@Edge, with the regional-execution caveat priced in.

The deeper thing the comparison shows is that “edge compute” stopped being one idea around the time these three platforms diverged. Cloudflare bet that a browser isolation primitive scales to multi-tenant serverless and spent years hardening it against Spectre to make the bet hold. Fastly bet that WebAssembly’s per-request sandbox is both fast enough and safe enough to throw away on every request. AWS, holding the largest CDN, declined to bet at all and shipped two runtimes at opposite extremes so it never had to. Three different answers to the same question about how much to share, and the cold-start numbers, the memory limits, and the Spectre defenses all fall out of that one choice. When a vendor quotes you a startup time, the useful follow-up is not “how fast” but “what did you have to build to get there, and who else is in the room while it runs.”


Sources & further reading
