The headless-browser tax: memory, CPU, and why HTTP clients win when they can

A working scraper built on a real browser feels like cheating the first time you run it. The page loads, the JavaScript executes, the lazy-loaded grid fills in, and your selector grabs exactly what a human would see. No reverse engineering, no XHR replay, no guessing which endpoint returns the JSON. It just works. Then you try to run a thousand of them at once and the bill arrives.

That bill is the subject of this post. A headless browser is not a function call with some overhead. It is a multi-process application that wants hundreds of megabytes of RAM before it renders a single pixel, that forks a renderer for every site it touches, that leaks memory if you keep it open and crashes if you do not give it shared memory, and that runs roughly an order of magnitude slower per page than a plain HTTP request. None of this is a defect. It is the architecture working as designed for an interactive desktop browser, repurposed as a server-side worker where every one of those design choices costs money. The question worth answering is not whether browsers are expensive. They are. It is how expensive, where the cost actually lands, and when you can avoid paying it.

What follows walks through the per-instance memory floor and why Chrome’s process model multiplies it, the CPU cost of a launch and a render, the container failure modes that bite every team exactly once, the concurrency math that vendors converge on, and the order-of-magnitude gap that makes an HTTP client the right default. The through-line: a browser is a tool you reach for when the page forces your hand, not the tool you start with.

The per-instance floor: what a browser costs before it does anything

Start with a single idle browser. A cold headless Chromium consumes somewhere between 50 and 150 MB of resident memory the moment it boots, before a page is loaded. That is the floor, not the working cost. The number people quote in practice, the one you should size a fleet against, is 300 to 500 MB per concurrent instance once it is actually rendering a typical modern page. Browserless, who have run this in production at the scale of millions of sessions, distil it to a rule of thumb that is easier to remember than any per-component breakdown: you can run roughly ten concurrent requests per gigabyte of memory.

Ten per gigabyte. Sit with that. A 16 GB box, after you leave headroom for the OS and your own orchestration code, gets you somewhere in the range of a hundred concurrent browsers if everything goes right, and “everything goes right” is doing a lot of work in that sentence. Compare the same box running an async HTTP client, where a single connection’s memory cost is measured in single-digit megabytes and the limiting factor is file descriptors and network bandwidth, not RAM. The published per-request figures put HTTP libraries at 1 to 10 MB and 100 to 500 ms per request against 50 to 100 MB and 1 to 5 seconds per page for Puppeteer. That gap is the whole story, and the rest of this post is the texture around it.
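The arithmetic behind that estimate fits in a few lines. This is a back-of-envelope sketch rather than a sizing tool: the function name and the 4 GB held back for the OS and orchestration code are assumptions made for illustration, and only the ten-per-gigabyte figure comes from the rule of thumb above.

```python
def browser_capacity(ram_gb: float, reserved_gb: float = 4.0, sessions_per_gb: int = 10) -> int:
    """Concurrent browser sessions a box can hold under the ten-per-gigabyte rule.

    ASSUMPTION: reserved_gb (headroom for the OS and your own code) is an
    illustrative default, not a published figure.
    """
    usable_gb = max(ram_gb - reserved_gb, 0.0)
    return int(usable_gb * sessions_per_gb)


print(browser_capacity(16))  # 16 GB box with 4 GB held back -> 120
```

Change the headroom and the answer moves by tens of sessions, which is why "a hundred" is a range and not a promise.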

The reason a browser cannot get below that floor is that it is not one process. It is several, by design, and the design predates anyone thinking of it as a scraping primitive.

Why one browser is many processes

Chrome’s multi-process architecture is the single biggest reason a browser costs what it does, and it is worth understanding precisely rather than as folklore. The browser process is the parent: it owns the UI, the network stack, and the lifecycle of everything below it. Below it sit renderer processes, a GPU process, utility processes, and historically a plugin process. Each renderer runs Blink and V8, which is to say each renderer is a full JavaScript engine and layout engine with its own heap.

The decisive detail is how renderers get allocated. Chrome does not use one renderer per browser or one per tab. It uses a model built around the SiteInstance, and the granularity is the site, defined in the Chromium source as scheme plus eTLD+1, for example https://example.com. The process model documentation is explicit that two documents which can synchronously script each other must share a process: “Any two documents with the same principal in the same browsing context group … must live in the same process, because they have synchronous access to each other’s content.” Once Site Isolation is on, and it has been on by default for desktop Chrome since 2018, cross-site iframes get their own renderers too. A page that embeds content from five different domains can fork six processes from a single tab.

*The process count is a function of how many distinct sites the page touches, not how many tabs you open. Each accent box is a full V8 heap.
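To see how the site count drives the renderer count, here is a rough estimator. It is a sketch under loud assumptions: `site_key` and `estimate_renderers` are hypothetical helpers, the last two host labels stand in for eTLD+1 (wrong for suffixes such as co.uk, where a real implementation would consult the Public Suffix List), and the result ignores both process reuse and the browser, GPU, and utility processes.

```python
from urllib.parse import urlsplit


def site_key(url: str) -> tuple[str, str]:
    """Approximate a Chromium site as (scheme, eTLD+1).

    ASSUMPTION: the last two host labels approximate eTLD+1. This breaks on
    multi-label public suffixes such as co.uk.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    registrable = ".".join(labels[-2:]) if len(labels) >= 2 else host
    return (parts.scheme, registrable)


def estimate_renderers(main_url: str, frame_urls: list[str]) -> int:
    """Count distinct sites across the main document and its iframes."""
    return len({site_key(main_url), *map(site_key, frame_urls)})
```

A product page on one site that embeds frames from five other domains comes out at six, and a same-site frame such as a CDN subdomain adds nothing to the count.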

This is great for a desktop user. A crashing ad iframe takes down its own renderer and nothing else, and a malicious site cannot read another site’s memory. For that user, the memory is paying for security and stability, and the trade is worth it. For a scraper running a thousand of these, the same trade buys nothing. You do not care that the tracking pixel on a product page runs in an isolated process. You are paying for isolation you will never use.

Chrome does have a release valve. The documentation describes a “soft” process limit based on available memory, beyond which the browser starts “randomly reusing same-site processes” rather than spawning new ones, and on memory-constrained platforms it reuses aggressively even before hitting the limit. On Android it goes further: below roughly 2 GB of device RAM, Site Isolation is disabled entirely because the device cannot afford the renderers. That release valve exists because the per-process cost is real enough that Google itself caps it on hardware that cannot pay. On your server, if you let a single browser accumulate contexts and cross-site frames, you are on the wrong side of that limit and the memory chart shows it.

The CPU cost is mostly the launch

Memory is the cost people anticipate. CPU is the one that surprises them, and it concentrates in a place that is easy to miss: the launch.

Spawning a browser is the most CPU-intensive thing in the whole lifecycle. It is not the rendering, which is largely I/O-bound waiting on the network, and it is not the script execution, which is bursty but short. It is the cold start. One production write-up of running headless Chrome for screenshots over six months put Chrome startup at “at least 2-3 seconds per request” when a fresh browser was launched per job, which is the naive pattern almost everyone writes first. Browserless’s capacity guidance tells you to stagger initial session launches by five to ten seconds precisely because “browser launch is the most CPU-intensive phase,” and to space them out so workers do not all pay the launch tax at the same instant and trip their own health checks.

That detail reframes the whole concurrency problem. If launching is the expensive part, then the architecture that launches a fresh browser per page is paying the worst cost on every single request. The fix that every mature setup converges on is to amortise the launch: keep a warm pool of browsers, hand each job a fresh context rather than a fresh process, and recycle the browser after N jobs before it leaks too much. A browser context is cheap relative to a browser. It gets its own cookie jar and cache and storage partition, which is exactly the isolation a scraper actually wants between jobs, without forking a new process tree each time. Playwright leans into this with first-class browser.newContext(); Puppeteer has the equivalent and the discipline is the same. The rule that falls out of it is blunt: never launch per request, always pool. Most teams learn this after their CPU graph spends a week pinned at the launch.
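The pool-and-recycle pattern can be sketched without committing to a particular library. In the block below, `RecyclingBrowser`, `max_jobs`, and the injected `launch` callable are hypothetical names; the only thing assumed of the browser object is that it exposes async `new_context()` and `close()` methods, which is the shape of Playwright's Python API.

```python
import asyncio
from contextlib import asynccontextmanager


class RecyclingBrowser:
    """One warm browser that hands out fresh contexts and relaunches after max_jobs."""

    def __init__(self, launch, max_jobs=50):
        self._launch = launch        # async callable returning a browser-like object
        self._max_jobs = max_jobs    # ASSUMPTION: recycle threshold, tune per workload
        self._browser = None
        self._jobs = 0
        self._lock = asyncio.Lock()  # create the pool inside a running event loop

    async def _recycle(self):
        # Close the old browser (if any) and pay the launch cost once.
        if self._browser is not None:
            await self._browser.close()
        self._browser = await self._launch()
        self._jobs = 0

    @asynccontextmanager
    async def context(self):
        # The lock is held for the whole job, so this sketch runs one job at a
        # time per browser and never closes a browser that is still in use.
        async with self._lock:
            if self._browser is None or self._jobs >= self._max_jobs:
                await self._recycle()
            self._jobs += 1
            ctx = await self._browser.new_context()
            try:
                yield ctx
            finally:
                await ctx.close()

    async def close(self):
        async with self._lock:
            if self._browser is not None:
                await self._browser.close()
                self._browser = None
```

Concurrency in this sketch comes from running several `RecyclingBrowser` objects side by side, not from sharing one. With Playwright, `launch` would wrap `chromium.launch()`, and each job would open its page on the context it is handed.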

There is a related cost that is structural rather than per-request, and it lands on memory more than CPU. Idle browsers are not free. The same production accounts describe browsers that “like to cache stuff and slowly eat more memory” while sitting open, and the cure is to bound their lifetime rather than trust them to behave. A warm pool is the right answer to launch cost, but a warm pool with no recycling policy is just a slow memory leak with extra steps.

The container tax: /dev/shm, zombies, and the flags everyone copies

Most of this runs in containers, and containers add a second layer of cost that has nothing to do with rendering and everything to do with the impedance mismatch between Chrome’s assumptions and a default Docker image.

The classic one is shared memory. Chrome uses /dev/shm, the POSIX shared-memory filesystem, heavily, for passing rendered bitmaps and other large buffers between the browser process and its renderers. A default Docker container caps /dev/shm at 64 MB. That is wildly below what Chrome wants under real rendering load, and when it fills, the renderer does not degrade gracefully. It crashes, and because a renderer crash can cascade, it can take the tab or the browser with it. This is the single most reported headless-in-Docker failure, and it has been a known Chromium issue since at least 2017.

There are two ways out and a generation of copy-pasted advice around both. You can give the container more shared memory, with --shm-size=1g on docker run or shm_size: 1gb in Compose, which keeps Chrome’s fast path. Or you can pass Chrome --disable-dev-shm-usage, which moves those buffers to /tmp instead. The flag is the more portable fix and the one that ended up in every “headless Chrome in Docker” snippet, usually riding alongside --no-sandbox and --disable-gpu. Worth knowing what you are actually trading: routing shared memory through /tmp means it lands on whatever backs /tmp, which on many images is disk rather than a tmpfs, so you have swapped a crash for slower buffer passing. On a memory-backed /tmp it is fine. On a disk-backed one under load it is a quiet tax you pay on every frame.

*The default 64 MB cap is the most common headless-in-Docker crash. Raising shm-size keeps Chrome's fast path; disabling it trades the crash for slower buffer passing when /tmp is on disk.
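One way to avoid hard-coding the choice is to inspect the mount at startup and add the flag only when shared memory is too small. This is a sketch, not a recommendation: `shm_flags` is a hypothetical helper and the 512 MB threshold is an assumption picked for illustration, not a number Chrome documents.

```python
import shutil

# ASSUMPTION: illustrative threshold, not a Chrome requirement.
DEFAULT_MIN_SHM = 512 * 1024 * 1024


def shm_flags(shm_path: str = "/dev/shm", min_bytes: int = DEFAULT_MIN_SHM) -> list[str]:
    """Return extra Chrome flags depending on how much shared memory is mounted."""
    try:
        total = shutil.disk_usage(shm_path).total
    except OSError:
        # No usable shared-memory mount: fall back to /tmp.
        return ["--disable-dev-shm-usage"]
    if total < min_bytes:
        # Default 64 MB containers land here.
        return ["--disable-dev-shm-usage"]
    # Enough room: keep Chrome's fast path and add nothing.
    return []
```

The returned list can be appended to whatever launch arguments the worker already passes, so a container started with a larger --shm-size keeps the fast path automatically.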

The second container tax is reaping. Chrome forks child processes, and if the parent dies or a job throws before it cleans up, those children get reparented to PID 1. The ones still running are orphans that keep holding memory, and the ones that have exited are zombies that nobody harvests. In a normal Linux boot, init reaps whatever exits. In a container, PID 1 is your app, and your app is not an init system, so the strays pile up across runs until the box is full of leftover Chrome instances. The standard fix is to run a real init as PID 1, with docker run --init wrapping the container in tini, which exists for exactly this, and to close every context and browser explicitly so nothing is left running. The same production accounts describe pods that should hold two or three Chrome processes accumulating “dozens of orphaned Chrome instances” when cleanup was missing, and report zero out-of-memory errors across six-plus months once context closing and reaping were both correct. The leak was never the rendering. It was the bookkeeping around it.
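On the application side, one defensive pattern is to start the browser in its own process group and kill the whole group on the way out, so that a job that throws cannot leave children behind. The sketch below is POSIX-only, `process_group` is a hypothetical helper, and it complements an init at PID 1 without replacing it. It also assumes the browser's children stay in the group they were started in.

```python
import os
import signal
import subprocess
from contextlib import contextmanager


@contextmanager
def process_group(cmd):
    """Run cmd in a new session and kill its entire process group on exit (POSIX only)."""
    # start_new_session=True makes the child a group leader, so its pid is
    # also the id of the group its own children inherit.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        yield proc
    finally:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group is already gone
        proc.wait()  # reap the direct child so it does not linger as a zombie
```

The `finally` block runs whether the job succeeds or raises, which is the property that matters here.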

Adjacent to all of this is a Chrome version detail that quietly changed the baseline. The old headless mode, the separate headless_shell implementation that did not share Chrome’s full codebase, is gone. Chrome 132 stopped supporting --headless=old, and the legacy build now ships only as a standalone binary called chrome-headless-shell. The new headless, introduced in Chrome 112, is the same browser engine as headful Chrome rather than a stripped alternate. That gives you fidelity, because the thing you render is the real browser, but it also means the lighter-weight old implementation is no longer the path of least resistance. If your fleet was implicitly riding on the lean old shell, the upgrade moved your memory floor up, and the chrome-headless-shell binary is the documented way back to the lighter behaviour when you genuinely do not need full Chrome.

Cutting the render cost: blocking what you do not need

Once you have decided a page genuinely needs a browser, the next lever is how much of the page you let it render. A scraper rarely needs the images, the web fonts, the analytics beacons, or the third-party ad bundles. It needs the DOM after the page’s own JavaScript has built it. Every byte of everything else is renderer memory and CPU you are paying for and then discarding.

This is where request interception earns its keep. Both Puppeteer and Playwright let you intercept outbound requests and abort the ones you do not want, so the renderer never allocates buffers for a hero image or decodes a font you will never measure text against. The published figures put the saving high: blocking CSS, images, fonts, and media can cut renderer-process memory by 60 to 70 percent on a typical content-heavy page. That is not a micro-optimisation. It moves your ten-per-gigabyte ceiling materially upward, because the per-renderer working set is most of what the rule is counting.

The judgement call is what blocking breaks. Abort the stylesheet and a site that lazy-loads content on scroll-into-view may never trigger its loads, because the layout that drives the intersection observer never happens. Abort the images and a gallery that reads dimensions off decoded bitmaps may compute the wrong geometry. The safe default is to block the obviously inert resource classes, fonts and media and analytics, and to be careful with anything the page’s own logic reads back. You are trying to render the cheapest version of the page that still produces the DOM you need, and finding that line is per-site work that pays for itself across millions of renders.
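The conservative default described above can be sketched as a route handler. The handler follows the shape of Playwright's Python routing API (`route.request.resource_type`, `route.abort()`, `route.continue_()`); the function names and the blocked-host list are illustrative assumptions, and the right lists are per-site work.

```python
from urllib.parse import urlsplit

# Resource classes that are usually inert for a scraper.
BLOCKED_TYPES = {"font", "media"}
# ASSUMPTION: an illustrative analytics host list, not a maintained blocklist.
BLOCKED_HOSTS = {"www.google-analytics.com", "www.googletagmanager.com"}


def should_block(resource_type: str, url: str) -> bool:
    """Decide whether a request is safe to abort under the conservative default."""
    host = urlsplit(url).hostname or ""
    return resource_type in BLOCKED_TYPES or host in BLOCKED_HOSTS


async def handle_route(route):
    """Abort inert requests and let everything else through."""
    request = route.request
    if should_block(request.resource_type, request.url):
        await route.abort()
    else:
        await route.continue_()
```

With Playwright this would be registered as `await page.route("**/*", handle_route)` before navigation. Stylesheets and images are deliberately left out of the default set, since those are the classes the page's own logic is most likely to read back.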

The GPU process is a smaller version of the same idea. On a headless server with no display, GPU-accelerated compositing buys you little, and for years --disable-gpu was the standard flag to skip it, originally a workaround for headless rendering bugs rather than a pure resource play. The current guidance is narrower: the flag is genuinely needed only on Windows, and on other platforms modern Chrome no longer requires it, though a GPU process can still appear even with the flag set. The practical read for a Linux scraping fleet is that the GPU process is a minor line item next to the renderers, and the real memory is always in the V8 heaps. Optimise the renderers first; the GPU process is rounding error by comparison.

The concurrency math, and where vendors land

Put the pieces together and you can do the capacity arithmetic that every browser farm ends up doing. Memory is the binding constraint, the launch is the CPU spike, and a worker needs headroom so it never tips into the territory where the OOM killer starts harvesting your renderers. The numbers different vendors publish are strikingly consistent once you account for that.

Browserless defaults a single instance to ten concurrent sessions, configurable via CONCURRENT, with a queue (default QUEUED=10) that returns HTTP 429 when full rather than thrashing. Their private-deployment guidance models capacity as workers times per-worker concurrency, uses six sessions per worker as a worked example, and has workers reject new sessions outright once CPU or memory crosses 90 percent, returning a health-check failure instead of accepting a job they cannot serve. The ten-per-gigabyte rule and the ten-default-concurrent number are the same observation from two directions: a session wants on the order of a hundred megabytes of working set, you leave headroom, and you land near ten.
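Those published numbers combine into a small admission rule. The sketch below is a reading of the behaviour described above, not Browserless code: `fleet_capacity` and `admit` are hypothetical names, the return values are labels invented for illustration, and treating "crosses 90 percent" as greater-than-or-equal is an assumption.

```python
def fleet_capacity(workers: int, per_worker: int) -> int:
    """Capacity modelled as workers times per-worker concurrency."""
    return workers * per_worker


def admit(running: int, queued: int, cpu: float, mem: float,
          concurrent: int = 10, queue_limit: int = 10, threshold: float = 0.90) -> str:
    """Decide what happens to a new session request on one worker.

    cpu and mem are utilisation fractions between 0.0 and 1.0.
    """
    if cpu >= threshold or mem >= threshold:
        return "unhealthy"   # fail the health check instead of accepting the job
    if running < concurrent:
        return "run"         # a session slot is free
    if queued < queue_limit:
        return "queue"       # wait for a slot
    return "429"             # queue full: shed load rather than thrash
```

Four workers at six sessions each gives `fleet_capacity(4, 6)`, which is 24 concurrent sessions before anything queues.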

*HTTP client versus headless browser. Memory: 1–10 MB against 300–500 MB. Time: 0.1–0.5 s against 1–5 s. Capacity per GB: hundreds of connections against roughly 10 sessions. The gap is one to two orders of magnitude on every axis that costs money.

The managed-platform pricing makes the tax legible in currency. In July 2025 Cloudflare put a number on its Browser Rendering product: 0.09 dollars per browser hour beyond the included allowance, plus 2.00 dollars per additional concurrent browser per month, billed on the monthly average of daily peak concurrency. The paid plan includes ten browser hours and ten concurrent browsers, with the per-account concurrency ceiling raised to 120, and a default browser timeout of 60 seconds extendable to a ten-minute keep-alive. Read that as a unit cost. A browser hour at nine cents is cheap until you multiply it by a fleet that needs to be warm continuously, and the concurrency charge is the part that bites a steady high-throughput crawl, because steady high throughput is exactly the daily-peak pattern that pricing is designed to capture. The platform is not overcharging. It is pricing the resource honestly, and the honest price of a browser is high enough that it pushes you to ask whether you needed one.
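Treated as a unit cost, those prices turn into a one-function estimate. This is a sketch of the pricing as summarised above: the function name is hypothetical, and any rounding, proration, or plan fee on the actual invoice is not modelled.

```python
def monthly_browser_cost(browser_hours: float, daily_peak_concurrency: list[int],
                         included_hours: float = 10, included_concurrent: float = 10,
                         per_hour: float = 0.09, per_concurrent: float = 2.00) -> float:
    """Estimate the monthly overage in dollars.

    Hours are charged beyond the included allowance; concurrency is charged on
    the monthly average of daily peaks beyond the included browsers.
    """
    extra_hours = max(browser_hours - included_hours, 0)
    average_peak = sum(daily_peak_concurrency) / len(daily_peak_concurrency)
    extra_concurrent = max(average_peak - included_concurrent, 0)
    return extra_hours * per_hour + extra_concurrent * per_concurrent
```

Under these assumptions, twenty browsers kept warm around the clock for thirty days is 14,400 browser hours and a steady daily peak of twenty, which comes to roughly 1,315 dollars, almost all of it from the hours.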

When the HTTP client wins, which is more often than it feels

The HTTP-client side of the comparison is almost embarrassingly favourable. An async client on a single core, swapping sequential requests for asyncio.gather over aiohttp, routinely turns a hundred sequential fetches that took thirty seconds into roughly one second, a ballpark thirty-fold speedup, because network latency dominates and a sequential scraper sits idle for the overwhelming majority of every second waiting on round-trips. For I/O-bound work at fifty concurrent connections, async is commonly twenty to fifty times faster than sequential. The memory cost per connection is single-digit megabytes. You can run thousands of in-flight requests on one box that would hold a few dozen browsers.
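The pattern in question is short. In this sketch `fetch_all` is a hypothetical helper that takes any async fetch function and caps the number in flight, and `crawl_with_aiohttp` wires it to aiohttp, which is a third-party package assumed to be installed; the limit of fifty mirrors the figure quoted above and is not a recommendation.

```python
import asyncio


async def fetch_all(urls, fetch, limit=50):
    """Run fetch(url) for every URL concurrently, with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:
            return await fetch(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(url) for url in urls))


async def crawl_with_aiohttp(urls, limit=50):
    """Fetch response bodies over one shared aiohttp session."""
    import aiohttp  # third-party; imported here so fetch_all works without it

    async with aiohttp.ClientSession() as session:
        async def fetch(url):
            async with session.get(url) as response:
                return await response.text()

        return await fetch_all(urls, fetch, limit)
```

The speedup comes entirely from overlapping the waits. Nothing here makes any single request faster.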

The catch, and it is the only catch that matters, is that the HTTP client gets you the bytes the server sends, not the bytes a browser would construct. If the data you want is in the initial HTML, or in a JSON endpoint the page calls, the client wins on every axis and it is not close. If the data only exists after a megabyte of JavaScript runs, mutates the DOM, and assembles the view client-side, then a raw client gets you an empty shell and you have a real decision to make. The honest first move is not to reach for the browser. It is to open the network tab and find out whether the page is fetching its data from an endpoint you could call directly. A great many “JavaScript-heavy” sites are a thin React shell over a clean JSON API, and once you find that API the browser disappears from your architecture. We have written about that hunt at length in “handling JavaScript-rendered content without a browser”, and about the broader decision tree in “parsing at scale: browser vs HTTP client”. The summary is that the browser is a fallback, not a default, and the discipline of treating it that way is worth real money at scale.

There is a second reason teams reach for browsers that has nothing to do with rendering: they believe a real browser is harder to detect. Sometimes true, often not, and rarely true in the way people hope. A headless browser carries its own tells, from the HeadlessChrome user-agent token to the Chrome DevTools Protocol surface to a dozen runtime inconsistencies that fingerprinting scripts probe, and a poorly configured browser is more detectable than a well-configured HTTP client with a coherent TLS fingerprint. If your reason for paying the browser tax is anti-bot evasion rather than rendering, you are often paying for a worse outcome at a higher price; the detection surface of automation frameworks is its own deep topic, covered in “Playwright vs Puppeteer vs Selenium: the detection surface” and in the TLS fingerprinting work. The clean mental separation is to decide rendering on rendering grounds and detection on detection grounds, and to notice that they usually point in opposite directions on cost.

A hybrid that pays the tax once

The architecture that survives contact with production is rarely all-browser or all-client. It is a tiered fetch. The crawler tries the cheap path first: an HTTP request, and a check on whether the response actually contains the target data. Only when that check fails does it escalate the URL to a browser worker, render it once, and ideally capture whatever token, cookie, or endpoint the render revealed so the next request for a similar page can go back to the cheap path. The browser becomes a fallback queue feeding off the HTTP crawler, not the front door, and the size of that queue is your real browser cost rather than the size of the whole job.
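The control flow of a tiered fetch is small enough to show whole. In this sketch every name is hypothetical: `http_fetch` and `browser_fetch` are async callables you supply, and `has_target_data` is whatever per-site check tells you the cheap response is usable.

```python
import asyncio


async def tiered_fetch(url, http_fetch, browser_fetch, has_target_data):
    """Try the cheap path first and escalate to a browser only when it fails.

    Returns (html, tier), where tier is "http" or "browser", so the caller can
    measure how large the browser queue really is.
    """
    html = await http_fetch(url)
    if has_target_data(html):
        return html, "http"
    # The cheap response was an empty shell: pay for one render.
    rendered = await browser_fetch(url)
    return rendered, "browser"
```

Counting the "browser" results over a crawl gives the escalation rate, which is the number the rest of the capacity planning should be sized against.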

This is where the rest of the crawler’s economics fold in. A browser-rendered page that revealed a session cookie can hand that cookie to cheap HTTP requests for the rest of the session, which is the whole argument of “session and cookie management across a proxy fleet” and connects directly to whether you run sticky or rotating sessions. Caching and conditional requests keep you from re-rendering pages that have not changed, the subject of “caching and incremental recrawl”, and that matters disproportionately for browser work because a render you avoid is the single most expensive operation you can skip. The full cost model, where browser CPU sits alongside proxy spend and solve cost against your success rate, is the subject of “the economics of a scraping operation”. The browser line item is usually the largest, which is exactly why the tiered fetch tries so hard to keep that line short.

The tiered approach also gives you the right place to put a concurrency limiter. Browser workers are the scarce resource, so the queue feeding them is where backpressure belongs: a bounded pool, a queue that sheds load rather than thrashing, and a launch stagger so a burst does not trip every worker’s health check at once. None of that is exotic. It is the same token-bucket and backpressure machinery you would build for any rate-limited resource, applied to the most expensive worker in the system rather than to the target’s rate limit, and the mechanics carry over directly from “rate limiting yourself”.
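A bounded queue with staggered workers can be built from asyncio primitives alone. The sketch below uses hypothetical names throughout (`BrowserQueue`, `submit`, `run`); the five-second default stagger echoes the launch guidance quoted earlier, and `handle` stands in for whatever actually renders the page.

```python
import asyncio


class BrowserQueue:
    """Bounded job queue feeding a fixed set of browser workers."""

    def __init__(self, workers=3, queue_limit=10, stagger_s=5.0):
        # Create this inside a running event loop.
        self._queue = asyncio.Queue(maxsize=queue_limit)
        self._workers = workers
        self._stagger_s = stagger_s

    def submit(self, job) -> bool:
        """Enqueue a job, or return False to shed load when the queue is full."""
        try:
            self._queue.put_nowait(job)
            return True
        except asyncio.QueueFull:
            return False

    async def _worker(self, index, handle):
        # Stagger starts so workers do not all pay the launch cost at once.
        await asyncio.sleep(index * self._stagger_s)
        while True:
            job = await self._queue.get()
            try:
                await handle(job)
            except Exception:
                pass  # a real worker would log the failure and carry on
            finally:
                self._queue.task_done()

    async def run(self, handle):
        """Process everything currently queued, then stop the workers."""
        tasks = [asyncio.create_task(self._worker(i, handle)) for i in range(self._workers)]
        await self._queue.join()
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```

A rejected `submit` is the signal to push back on the HTTP tier or retry later, which keeps the pressure on the queue and off the browsers.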

What the tax actually buys

A browser is the most expensive way to fetch a page, and the cost is not incidental. It is the multi-process architecture, the per-site renderer fork, the launch spike, the shared-memory dance, the reaping discipline, and the ten-per-gigabyte ceiling that all of those add up to. Every one of those costs is the desktop browser being a good desktop browser, faithfully reproduced on a server where almost none of what they buy is useful to a crawler. You are renting a security model and a stability model and an interactivity model, and using exactly one feature out of the three: the ability to run the page’s JavaScript and read the result.

So the discipline writes itself. Pay the tax only on the pages that force you to. Find the JSON endpoint before you spawn a renderer. Pool and recycle the browsers you cannot avoid, so you pay the launch cost amortised rather than per request. Give the container its shared memory and a real init, or watch it die in the two ways every team eventually watches it die. And size the fleet against memory, because memory is what runs out first and the OOM killer does not send a warning. The teams that run browsers cheaply are not the ones with a clever flag. They are the ones who arranged their architecture so the browser almost never runs, and when it does, it runs once and hands its winnings to something cheaper. The cheapest browser render is the one you replaced with an HTTP request you only had to figure out a single time.

Sources & further reading

The Chromium Authors (2024), Process Model and Site Isolation — the canonical description of SiteInstance allocation, the site definition (scheme + eTLD+1), the soft process limit, and the ~2 GB RAM threshold below which Android disables Site Isolation.
The Chromium Authors, Multi-process Architecture — the design document for the browser/renderer/GPU process split and why each renderer is an isolated unit.
Chrome for Developers (2023), Chrome’s Headless mode — the unified new headless mode shipped in Chrome 112, sharing the full browser codebase rather than the old alternate implementation.
Chrome for Developers (2024), Removing --headless=old from Chrome — Chrome 132 drops old headless; the legacy lean build survives only as the standalone chrome-headless-shell binary.
Browserless (2018), Observations from running 2 million headless sessions — the ~10 concurrent requests per GB rule, the “unpredictable, hungry” memory behaviour, and the cascade where one crashing page takes the browser down.
Browserless, Performance and Capacity — capacity as workers × per-worker concurrency, the 90% CPU/memory rejection threshold, and the 5–10 second launch stagger because launch is the most CPU-intensive phase.
Browserless, Built-in Queueing System — default CONCURRENT=10, QUEUED=10, and the HTTP 429 returned when the queue is full.
Cloudflare (2025), Introducing pricing for the Browser Rendering API — $0.09 per browser hour — the per-browser-hour and per-concurrent-browser charges, included allowances, and the 20 August 2025 billing start.
Cloudflare, Browser Rendering limits — 3 concurrent browsers on Free, 120 on Paid, the 60-second default timeout, and the keep-alive ceiling.
Last9 (2024), How to configure Docker’s shared memory size (/dev/shm) — the 64 MB default cap, why Chrome exhausts it, and the --shm-size versus --disable-dev-shm-usage trade.
WebScraping.AI, Performance implications of using Puppeteer for web scraping — the per-page (50–100 MB, 1–5 s) versus per-request (1–10 MB, 0.1–0.5 s) comparison that frames the whole tax.
Decodo (2026), HTTPX vs Requests vs AIOHTTP — async HTTP throughput benchmarks showing the order-of-magnitude speedup over sequential fetches when latency dominates.