
Browser-based proof-of-work: how Anubis gates crawlers with hash puzzles

Copyright: MIT

In early 2025 a wave of FOSS infrastructure started greeting visitors with an anime jackal and a spinning progress bar that said it was “making sure you’re not a bot.” GNOME’s GitLab did it. The Linux kernel mailing list archive did it. FFmpeg, Wine, the Arch wiki, UNESCO, Duke University’s digital archives. The common thread was a small Go reverse proxy called Anubis, and the thing it makes your browser do before it serves the page is the oldest anti-abuse trick there is: solve a hash puzzle. Find a number that, appended to a server-issued challenge, makes the SHA-256 of the whole thing start with a run of zeros. A real browser does it in about a second. Then it gets a cookie and stops being bothered.

The question worth being precise about is what that actually buys you. Anubis was not built to stop a targeted attacker. It was built because AI training crawlers were melting open-source git servers, ignoring robots.txt, rotating through residential IP ranges, and hammering every diff view and blame page on the site. The puzzle is aimed squarely at that workload. This post is a single-tool deep dive into Anubis: how the challenge is constructed and verified, why a string in the User-Agent is the whole gate, what the JWT cookie carries, how the design changed between the deterministic v1 approach and the server-stored challenge it issues now, and the sharp, well-documented limits that mean it slows scrapers without walling them out. For the broader case of proof-of-work as an anti-bot primitive across Kasada, hCaptcha, and mCaptcha, see the companion post on the proof-of-work renaissance; here the lens stays on Anubis.

Where it came from

Anubis exists because of one specific bad week. Xe Iaso, the developer who wrote it, runs a self-hosted Gitea instance, and in late 2024 an Amazon crawler started pulling it apart, working around the robots.txt rules meant to keep automated clients off the expensive endpoints. Git web frontends are a worst case for crawlers: every commit has a diff page, every file has a blame view, every branch multiplies the link graph, and rendering each one costs real CPU on the origin. A crawler that follows every link does the equivalent of a denial-of-service without meaning to. Iaso’s reaction was to write a gateway that makes the crawler pay something before it gets through, released publicly on 19 January 2025 under the MIT license. The project is now developed under the Techaro banner, with most commits still authored by Iaso.

The name is the joke that became the brand. Anubis is the Egyptian god who weighs the souls of the dead against a feather; the project “weighs the soul of incoming HTTP requests,” and a request that fails the weighing does not pass into the afterlife of your origin server. The mascot is an anime jackal-girl drawn in the project’s house style, and that mascot turned into a minor adoption story of its own. Duke University ran a pilot and reported that the interface, “particularly its anime girl mascot image,” did not match the professional styling they wanted, and that Anubis offered no convenient way to swap it out; they patched it locally, and the project’s maintainer offers rebranding as a paid feature. The detail is funny but it points at something real about the project’s posture: this is FOSS infrastructure software with a single dominant author and a small commercial arm bolted to the side, not a polished enterprise product.

What it competes with is the entire commercial anti-bot industry, the Cloudflare and DataDome and Akamai tier that most of these projects cannot afford or do not want. Anubis is what you reach for when you run a git server on a budget, you are getting hammered by crawlers, and you would rather inconvenience a fraction of your human visitors than route all of your traffic through a third party. That trade-off is the whole pitch, and whether it is a good one depends entirely on the details of how the gate works.

The gate is a string in the User-Agent

Before any hashing happens, Anubis has to decide whether to challenge a request at all, and the default rule for that is almost comically blunt. It challenges any request whose User-Agent header contains the substring "Mozilla". That is the gate.

It sounds too crude to work, and it half is, but the logic behind it is sound for the specific threat. Nearly every modern browser sends a User-Agent that begins with Mozilla/5.0, a fossil from the 1990s browser wars that every engine has carried forward for compatibility. Crawlers that want to look like browsers, which is most of them, copy that string verbatim so they blend into normal traffic. Meanwhile the well-behaved automated clients you actually want to let through, like git’s own HTTP client, RSS readers, and package managers, generally do not put Mozilla in their User-Agent. So the heuristic splits traffic into “claims to be a browser” and “doesn’t,” and only the first group gets a puzzle. A crawler can dodge the challenge entirely by dropping Mozilla from its User-Agent, but then it is announcing itself as a bot to every other piece of logging and rate-limiting on the site, which is exactly the disclosure it was trying to avoid.

On top of the User-Agent test, the default policy carves out a set of low-harm paths so it does not break the open web’s plumbing. Requests to /.well-known, /robots.txt, and /favicon.ico skip the challenge, as do paths that look like feeds, ending in .rss, .xml, or .atom. The reasoning is that you want crawlers to be able to read your robots.txt (the whole point is to be a good citizen about the rules) and you do not want to break feed readers that legitimately poll for updates.
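The default decision can be sketched in a few lines. This is an illustration under stated assumptions, not Anubis's code: the function names, the exempt-path tuples, and the check order are hypothetical, and the valid-cookie check is left out so the sketch stays focused on the User-Agent and path rules.

```python
# Hypothetical sketch of the default "challenge or not" decision.
# Names, exempt-path tuples, and check order are illustrative assumptions;
# the valid-cookie check is omitted.

EXEMPT_PREFIXES = ("/.well-known",)            # whole subtree is exempt
EXEMPT_PATHS = ("/robots.txt", "/favicon.ico")
FEED_SUFFIXES = (".rss", ".xml", ".atom")      # feed readers poll these


def is_exempt_path(path: str) -> bool:
    """True for low-harm paths that never get a puzzle."""
    return (
        path.startswith(EXEMPT_PREFIXES)
        or path in EXEMPT_PATHS
        or path.endswith(FEED_SUFFIXES)
    )


def should_challenge(user_agent: str, path: str) -> bool:
    """Challenge only browser-shaped requests to non-exempt paths."""
    if "Mozilla" not in user_agent:
        return False  # git clients, RSS readers, package managers pass through
    return not is_exempt_path(path)


if __name__ == "__main__":
    firefox = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"
    print(should_challenge(firefox, "/repo/commit/abc123"))  # True
    print(should_challenge(firefox, "/robots.txt"))          # False
    print(should_challenge("git/2.43.0", "/repo/info/refs")) # False
```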

[Figure: Anubis challenge-presentation logic (default policy): User-Agent check, exempt-path check, cookie check, then the proof-of-work challenge] *The gate is mostly the Mozilla heuristic and a handful of path exemptions. Only a request that claims to be a browser, lacks a valid cookie, and is not on an exempt path actually gets a puzzle.*

This bluntness is the most criticized part of the design and the most defended. Critics point out it both over- and under-blocks: a privacy-minded human with JavaScript disabled gets walled out, while a crawler that simply stops claiming to be a browser sails through. Defenders point out that the goal was never a perfect classifier. It was to raise the cost of the dominant, lazy crawler pattern, which is exactly “spoof a browser UA, follow every link, run no JavaScript,” and against that specific pattern a Mozilla check plus a JavaScript-gated puzzle is brutally effective at almost zero configuration cost.

Constructing the challenge

Once Anubis decides to present a challenge, it generates one and stores it. The implementation here changed in an important way between the original 2025 design and what the current code does, and it is worth being precise about both because a lot of secondhand writeups describe only the first.

The original design was deterministic and stateless. The challenge string was a SHA-256 over a bundle of request metadata: the Accept-Encoding, Accept-Language, X-Real-Ip, and User-Agent headers, the current UTC time rounded to the nearest week, and a fingerprint of the server’s Ed25519 private key. Because every input was either present in the request or known to the server, the server never had to remember anything; it could recompute the expected challenge for any incoming request and check the client’s answer against it. The week-rounded timestamp gave the challenge a natural one-week lifetime, and binding the key fingerprint in meant challenges from one Anubis instance were not valid against another.
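A minimal sketch of that stateless scheme follows. It assumes a particular field order and a comma-joined framing, both hypothetical, and it reads "rounded to the nearest week" literally as rounding Unix time to the nearest multiple of seven days. The property it shows is the important one: the server can recompute the same challenge later without having stored anything.

```python
import hashlib
from datetime import datetime, timedelta, timezone

WEEK_SECONDS = 7 * 24 * 3600


def rounded_week(now: datetime) -> int:
    """Round a UTC time to the nearest week boundary, as Unix seconds."""
    return round(now.timestamp() / WEEK_SECONDS) * WEEK_SECONDS


def v1_challenge(headers: dict, now: datetime, key_fingerprint: str) -> str:
    """Derive the challenge from request headers plus server-known values.

    The field order and comma-joined framing are assumptions for illustration.
    """
    parts = [
        headers.get("Accept-Encoding", ""),
        headers.get("Accept-Language", ""),
        headers.get("X-Real-Ip", ""),
        headers.get("User-Agent", ""),
        str(rounded_week(now)),
        key_fingerprint,
    ]
    return hashlib.sha256(",".join(parts).encode()).hexdigest()


if __name__ == "__main__":
    hdrs = {"User-Agent": "Mozilla/5.0", "X-Real-Ip": "203.0.113.7"}
    t = datetime(2025, 1, 20, 12, tzinfo=timezone.utc)
    # The server recomputes on submission; nothing was stored in between.
    print(v1_challenge(hdrs, t, "fp") == v1_challenge(hdrs, t + timedelta(hours=1), "fp"))
```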

The current code took a different path. Rather than deriving the challenge deterministically from headers, Anubis now generates a challenge object with its own identity and keeps it server-side for a short window. In the source, each new challenge gets a UUIDv7 ID, a Method naming the challenge algorithm, 64 bytes of random data hex-encoded into a RandomData field, an IssuedAt timestamp, the Difficulty, a hash of the matched policy rule, and a small Metadata map carrying the User-Agent and X-Real-Ip of the requester. That object is written into Anubis’s store keyed by challenge:<id> with a thirty-minute expiry. The data the browser actually hashes against is the challenge’s random string, not a digest of the request headers. The shift from “recompute the challenge from the request” to “issue a random challenge and remember it” trades a little server-side state for a cleaner separation between issuing and validating, and it is the kind of detail that secondhand explanations of Anubis usually miss because they are working from the original blog post rather than the current tree.
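The issue-and-remember pattern can be sketched with an in-memory store that expires keys. This is not Anubis's store or types: the class and function names are hypothetical, the method label is made up, and `uuid.uuid4()` stands in for the UUIDv7 the real code uses, because `uuid.uuid7` only exists in very recent Python versions. The field list mirrors the one described above.

```python
import secrets
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, Optional

CHALLENGE_TTL_SECONDS = 30 * 60  # thirty-minute expiry


@dataclass
class Challenge:
    id: str
    method: str
    random_data: str
    issued_at: float
    difficulty: int
    policy_rule_hash: str
    metadata: dict = field(default_factory=dict)


class ChallengeStore:
    """In-memory stand-in for a key-value store with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._items: dict = {}
        self.now = clock

    def put(self, key: str, value: Challenge, ttl: float) -> None:
        self._items[key] = (value, self.now() + ttl)

    def get(self, key: str) -> Optional[Challenge]:
        entry = self._items.get(key)
        if entry is None or self.now() >= entry[1]:
            self._items.pop(key, None)  # drop expired entries lazily
            return None
        return entry[0]


def issue_challenge(store: ChallengeStore, difficulty: int, rule_hash: str,
                    user_agent: str, real_ip: str) -> Challenge:
    ch = Challenge(
        id=str(uuid.uuid4()),               # stand-in for the real UUIDv7
        method="proof-of-work",             # hypothetical method label
        random_data=secrets.token_hex(64),  # 64 random bytes, hex-encoded
        issued_at=store.now(),
        difficulty=difficulty,
        policy_rule_hash=rule_hash,
        metadata={"User-Agent": user_agent, "X-Real-Ip": real_ip},
    )
    store.put(f"challenge:{ch.id}", ch, CHALLENGE_TTL_SECONDS)
    return ch
```

Validation then becomes a lookup by ID: if `store.get(f"challenge:{id}")` returns nothing, the challenge either never existed or has aged out.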

[Figure: challenge construction in v1 (2025, deterministic and stateless) vs. the current server-stored challenge object] *The challenge construction changed between the original deterministic design and the current code. Most writeups describe only the left box.*

The browser receives the challenge, the difficulty, and a small bundle of JavaScript. From there the work moves entirely to the client.

Solving it in the browser

The puzzle the browser solves is a partial hash preimage, the same shape as Adam Back’s Hashcash from 1997. Take the challenge string, append a nonce, compute the SHA-256, and check whether the digest starts with the required number of zeros. If not, increment the nonce and try again. There is no shortcut; a cryptographic hash gives you no way to run backward from “I want a digest starting with five zeros” to the nonce that produces it, so the only strategy is to keep guessing. Each guess has a fixed, small probability of landing, so the expected number of guesses grows exponentially with the difficulty: because Anubis counts difficulty in hex digits, it is sixteen raised to the difficulty, not two.
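Treating each SHA-256 output as uniformly random, a guess succeeds with probability p when the difficulty d is counted in leading hex zeros, and the number of guesses until the first success is geometrically distributed:

```latex
p = 16^{-d} = 2^{-4d}, \qquad
\mathbb{E}[\text{attempts}] = \sum_{k \ge 1} k\,(1-p)^{k-1}\,p = \frac{1}{p} = 16^{d}
```

At d = 5 that is 16^5, a little over a million expected attempts.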

The detail that trips people up is what “difficulty” counts. In Anubis it is leading zero hex digits (nibbles), not zero bits and not zero bytes. The worker reads this off the raw hash bytes directly. It computes requiredZeroBytes = Math.floor(difficulty / 2) full zero bytes, and if the difficulty is odd it additionally requires the high nibble of the next byte to be zero. So difficulty 5 means two full zero bytes plus a zero high-nibble in the third, which is five leading zeros in the hex string. Each added difficulty level multiplies the expected work by sixteen, so the cost curve is steep: difficulty 4 is around 65 thousand expected hashes, difficulty 5 around a million, difficulty 6 around sixteen million, which is where the published “several minutes on a typical browser” figure comes from.
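The nibble check translates directly into code. The sketch below mirrors the byte-level logic described above (`difficulty // 2` full zero bytes, plus a zero high nibble when the difficulty is odd) and wraps it in the plain Hashcash loop. It assumes the hashed input is the challenge string followed by the decimal nonce; that framing is an illustration, not a statement about Anubis's exact encoding.

```python
import hashlib


def meets_difficulty(digest: bytes, difficulty: int) -> bool:
    """True if the digest starts with `difficulty` zero hex digits (nibbles)."""
    required_zero_bytes = difficulty // 2  # two nibbles per byte
    if any(b != 0 for b in digest[:required_zero_bytes]):
        return False
    if difficulty % 2 == 1:
        # Odd difficulty: the high nibble of the next byte must also be zero.
        return digest[required_zero_bytes] >> 4 == 0
    return True


def solve(challenge: str, difficulty: int):
    """Hashcash-style search: try nonces until the digest clears the bar.

    Assumes the hashed input is the challenge followed by the decimal nonce.
    """
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
        if meets_difficulty(digest, difficulty):
            return nonce, digest.hex()
        nonce += 1


if __name__ == "__main__":
    print(solve("example-challenge", 4))
```

Working on raw bytes avoids hex-encoding every candidate digest; the result is the same as checking that the hex string starts with `difficulty` zeros.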

There is a genuine documentation conflict on the default difficulty that is worth flagging rather than papering over. The design doc, the client-side JavaScript, and earlier descriptions of the project all use 5 leading zeros, while the Go source defines DefaultDifficulty = 4. The honest read is that the effective default has drifted between 4 and 5 across versions and configuration layers, so treat any single number as version-specific. Operators set difficulty in policy regardless, and the practical range the project documents runs from 1 (near-instant) to 6 (minutes).

[Figure: expected SHA-256 attempts vs. difficulty in leading hex zeros: ~4k at 3, ~65k at 4, ~1M at 5, ~16M at 6; the highlighted bar marks the default the design doc and JS use] *Each difficulty level multiplies expected work by sixteen. The steepness is the whole reason a difficulty bump is the operator's first lever when bots adapt.*

The solving code is built to keep this from freezing the page. Anubis spawns a pool of Web Workers, defaulting to half the machine’s logical cores (navigator.hardwareConcurrency / 2, floored to at least one), so the hashing runs off the main thread and the UI keeps animating. Each worker strides through the nonce space by its thread count so they do not redo each other’s work, one worker starting at nonce 0, the next at 1, and so on, all incrementing by the number of threads. For the hashing itself it prefers the native crypto.subtle.digest("SHA-256", ...) from the Web Crypto API when the page is a secure context, and falls back to a pure-JavaScript SHA-256 otherwise. There is a specific carve-out: on Firefox and the Goanna-engine forks it uses the pure-JS implementation regardless, because the overhead of the async Web Crypto promise machinery on those engines made the native path slower in practice for this tight loop. When a worker finds a hash with enough leading zeros it posts the winning hash and nonce back, the others are terminated, and the client submits the answer.
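The strided split of the nonce space is easy to see in a small sketch. It uses Python threads in place of Web Workers, so it shows the partitioning and the early stop, not a real speedup (Python threads contend for the interpreter lock). The function names are hypothetical, the shared `found` event stands in for terminating the other workers, and the challenge-plus-decimal-nonce framing is again an assumption.

```python
import hashlib
import os
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Optional, Tuple


def default_thread_count() -> int:
    """Half the logical cores, floored, but at least one worker."""
    return max(1, (os.cpu_count() or 1) // 2)


def solve_strided(challenge: str, difficulty: int,
                  threads: Optional[int] = None) -> Tuple[int, str]:
    threads = threads or default_thread_count()
    target = "0" * difficulty
    found = threading.Event()
    lock = threading.Lock()
    result: dict = {}

    def worker(index: int) -> None:
        nonce = index                          # worker i starts at nonce i
        while not found.is_set():              # stop once any worker wins
            digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
            if digest.startswith(target):
                with lock:
                    if not found.is_set():     # keep only the first winner
                        result["nonce"], result["hash"] = nonce, digest
                        found.set()
                return
            nonce += threads                   # stride so workers never overlap

    with ThreadPoolExecutor(max_workers=threads) as pool:
        for i in range(threads):
            pool.submit(worker, i)
    return result["nonce"], result["hash"]
```

Because worker i only ever tries nonces congruent to i modulo the thread count, no two workers hash the same candidate, and no coordination is needed until one of them wins.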

The requirement to run all of this is the actual filter. The challenge needs ES6 modules, Web Workers, the Fetch API, and Web Crypto, the modern-browser baseline. A crawler that is just an HTTP client with a browser-shaped User-Agent has none of that. It fetches the challenge page, sees some script and a progress bar, and has no JavaScript engine to run them, so it never produces an answer and never gets the cookie. That is the real mechanism. The proof-of-work math is almost beside the point; the gate is “can you execute a modern browser’s worth of JavaScript,” and the hashing is what makes executing it cost something. The cross-link worth following here is the broader question of JavaScript-rendered content without a browser, because Anubis is essentially a bet that a large class of scrapers will not pay the headless-browser tax to clear it.

When the client submits a valid solution, Anubis re-checks it server-side, and on success it issues an HTTP cookie named techaro.lol-anubis-auth containing a signed JSON Web Token. (The source also defines a base cookie name techaro.lol-anubis and a separate techaro.lol-anubis-cookie-verification cookie used to confirm the client can store cookies at all; the auth token the browser carries afterward is the -auth one.) The JWT carries a small set of claims: challenge, the challenge string the answer was derived from; nonce, the winning iteration number; response, the hash that passed; iat, when the token was issued; nbf, a “not before” set one minute prior to issuance; and exp, the expiry, one week out by default. The token holds enough to let the server re-verify everything from the signature alone, without trusting the client’s word for any of it.
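A sketch of the claim set and the server-side re-check follows. Signing is deliberately left out to keep it dependency-free: a real token would be signed (Anubis uses Ed25519) and the signature verified before any claim is trusted. The helper names and the challenge-plus-decimal-nonce framing are assumptions; the claim names and timings follow the list above.

```python
import hashlib
import time

WEEK_SECONDS = 7 * 24 * 3600


def build_claims(challenge: str, nonce: int, response: str, now: float) -> dict:
    """Claims for the auth JWT, before signing."""
    iat = int(now)
    return {
        "challenge": challenge,
        "nonce": nonce,
        "response": response,
        "iat": iat,
        "nbf": iat - 60,            # backdated one minute
        "exp": iat + WEEK_SECONDS,  # one-week default lifetime
    }


def verify_solution(claims: dict, difficulty: int) -> bool:
    """Recompute the hash instead of trusting the client's `response`."""
    data = f"{claims['challenge']}{claims['nonce']}".encode()  # assumed framing
    digest = hashlib.sha256(data).hexdigest()
    return digest == claims["response"] and digest.startswith("0" * difficulty)


if __name__ == "__main__":
    ch = "example-challenge"
    nonce = next(n for n in range(10**6)
                 if hashlib.sha256(f"{ch}{n}".encode()).hexdigest().startswith("00"))
    resp = hashlib.sha256(f"{ch}{nonce}".encode()).hexdigest()
    print(verify_solution(build_claims(ch, nonce, resp, time.time()), 2))  # True
```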

The signature is Ed25519. Anubis generates a fresh Ed25519 keypair on startup and signs every JWT with it, which is clean and fast but has a sharp operational edge worth knowing: there is no built-in way to share that keypair across instances, so a horizontally scaled deployment behind a load balancer will hand out tokens that only validate on the node that issued them, unless you pin sessions or configure a shared secret. The maintainers note this as a known limit slated for future work. The nbf backdating by a minute is a small but real touch, papering over clock skew between the issuing server and a validating replica so a token does not get rejected as “from the future.”
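The value of backdating nbf shows up in a two-line time check. In this hypothetical example a replica whose clock runs 30 seconds behind the issuer validates a token it just received; the signature step is omitted, and it assumes the replica can verify the signature at all, which per the paragraph above requires it to share the issuer's key. The helper names are illustrative; the window follows standard JWT semantics (not valid before nbf, not valid at or after exp).

```python
WEEK_SECONDS = 7 * 24 * 3600


def token_times(issued_at: int, backdate: int = 60) -> dict:
    """iat/nbf/exp as the issuing node would set them."""
    return {"iat": issued_at, "nbf": issued_at - backdate,
            "exp": issued_at + WEEK_SECONDS}


def time_window_ok(claims: dict, now: int) -> bool:
    """JWT semantics: reject before nbf and at or after exp."""
    return claims["nbf"] <= now < claims["exp"]


if __name__ == "__main__":
    issued = 1_700_000_000
    replica_now = issued - 30  # a replica whose clock runs 30 s behind
    print(time_window_ok(token_times(issued), replica_now))              # True
    print(time_window_ok(token_times(issued, backdate=0), replica_now))  # False
```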

[Figure: the techaro.lol-anubis-auth cookie holds a signed JWT with the claims challenge, nonce, response, iat, nbf (iat minus 1 min), and exp (iat plus 1 week), signed with an Ed25519 keypair generated fresh on each server start] *The JWT carries everything needed to re-verify the solution from the signature alone. The week-long expiry is why a human solves the puzzle roughly once a week, not once a page.*

The week-long expiry is what keeps the human cost tolerable. You pay the puzzle once, get a token good for a week, and browse normally until it lapses. The project also re-screens probabilistically: even with a valid cookie, a configurable fraction of requests gets sent back through proof-of-work, so a stolen or replayed token does not buy unlimited free passage. That random re-challenge is the cookie-economy analogue of what commercial vendors do with rotating tokens; the DataDome cookie lifecycle post covers the far more elaborate version of the same idea, where token issuance, rotation, and validation are a whole subsystem rather than a one-week JWT.
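Probabilistic re-screening amounts to a weighted coin flip on every request that already carries a valid cookie. In the sketch below, `rescreen_fraction` is a hypothetical parameter name for the configurable fraction, and a seeded `random.Random` is passed in only so the behaviour is reproducible.

```python
import random


def needs_challenge(has_valid_cookie: bool, rescreen_fraction: float,
                    rng: random.Random) -> bool:
    """No cookie: always challenge. Valid cookie: challenge a random fraction."""
    if not has_valid_cookie:
        return True
    return rng.random() < rescreen_fraction


if __name__ == "__main__":
    rng = random.Random(42)
    hits = sum(needs_challenge(True, 0.1, rng) for _ in range(10_000))
    print(hits / 10_000)  # roughly 0.1
```

The effect on a replayed token is cumulative: with a fraction f, the chance that n requests all slip through unchallenged is (1 - f)^n, which shrinks quickly for a high-volume scraper.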

More than one kind of challenge

Calling Anubis “the proof-of-work proxy” is now a slight simplification, because proof-of-work is one of several challenge methods it can serve, selected per policy rule. The Method field in the challenge object names which one.

The default and best-known is the proof-of-work challenge described above. Alongside it the project added a metarefresh challenge, introduced around version 1.20, for administrators who did not want to wall out users with JavaScript disabled. Instead of hashing, it uses an HTML <meta http-equiv="refresh"> tag to bounce the browser to a pass endpoint after a short delay, no script required. It is weaker (anything that follows a meta-refresh clears it) and the project ships it disabled by default, but it exists for operators who would rather lose some bot-stopping power than break no-JS browsers. There is also a Preact-rendered interactive path. The point is that the architecture became a small challenge framework with proof-of-work as the flagship rather than the only option.
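The metarefresh idea fits in one function that renders a page whose only "challenge" is a timed redirect. Everything specific here is hypothetical: the pass path, the `id` and `redir` query parameter names, and the two-second delay. The escaping matters because the redirect URL sits inside an HTML attribute.

```python
import html
from urllib.parse import urlencode


def metarefresh_page(pass_path: str, challenge_id: str, redirect: str,
                     delay_seconds: int = 2) -> str:
    """Build a no-JavaScript challenge page that bounces to a pass endpoint."""
    target = f"{pass_path}?{urlencode({'id': challenge_id, 'redir': redirect})}"
    return (
        "<!doctype html>\n"
        f'<meta http-equiv="refresh" content="{delay_seconds};url={html.escape(target)}">\n'
        "<p>Checking your browser...</p>\n"
    )


if __name__ == "__main__":
    print(metarefresh_page("/pass", "abc", "/repo"))
```

The sketch also makes the weakness concrete: any client that parses the meta tag and follows the URL clears it, with no computation at all.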

The same version that brought metarefresh also brought a weighting system, which is the part of Anubis that has quietly moved it toward looking like a lightweight WAF. Instead of a binary challenge-or-not decision, a request accumulates a weight from policy rules: a User-Agent containing “bot” adds weight, a trusted session cookie subtracts it, and the final score decides whether the request passes freely, gets a standard challenge, or gets a harder one. That scoring posture is the same shape, at a much smaller scale, as what commercial bot managers do; the Cloudflare Bot Management 1-to-99 score is the heavyweight version of “turn a pile of signals into one number and threshold it.” Anubis’s maintainer has been explicit that the longer-term goal is to grow it into “a web application firewall that can potentially survive the AI bubble bursting,” and has floated adding fingerprinting signals like JA4 TLS fingerprints and a custom HTTP request fingerprint. If those land, the proof-of-work puzzle becomes one signal among several rather than the entire gate, which is the direction every serious anti-bot system eventually travels.
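The weighting idea can be sketched as "sum the weights of matching rules, then threshold." This is not Anubis's policy format: the rule names, weights, and thresholds below are hypothetical values chosen only to show how a pile of signals collapses into one of three outcomes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str
    matches: Callable[[Dict[str, str]], bool]
    weight: int


# Hypothetical rules and weights, for illustration only.
RULES: List[Rule] = [
    Rule("claims-browser", lambda r: "Mozilla" in r.get("User-Agent", ""), 10),
    Rule("ua-contains-bot", lambda r: "bot" in r.get("User-Agent", "").lower(), 10),
    Rule("trusted-session", lambda r: r.get("session") == "trusted", -20),
]


def score(request: Dict[str, str], rules: List[Rule] = RULES) -> int:
    """Sum the weights of every rule the request matches."""
    return sum(rule.weight for rule in rules if rule.matches(request))


def decide(total: int, challenge_at: int = 10, harder_at: int = 20) -> str:
    """Map the accumulated weight onto pass / challenge / harder challenge."""
    if total >= harder_at:
        return "harder-challenge"
    if total >= challenge_at:
        return "challenge"
    return "pass"


if __name__ == "__main__":
    print(decide(score({"User-Agent": "Mozilla/5.0 SomeBot/1.0"})))  # harder-challenge
```

Adding a signal such as a TLS fingerprint would just mean adding another `Rule`; the thresholding step does not change.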

What it actually stops, and what it doesn’t

The honest assessment of Anubis is that it works well against exactly the threat it was built for and poorly against anything that bothers to adapt, and the project’s own users have lived out both halves of that in public.

The success case is real and large. Anubis stops the lazy, high-volume crawler: an HTTP client that spoofs a browser User-Agent, follows every link, and runs no JavaScript. Against that pattern it is close to total, because the crawler cannot execute the challenge script and never gets past the gate. For a FOSS git server being flattened by exactly that traffic, dropping Anubis in front cut the load dramatically and at almost no configuration cost, which is why adoption spread so fast across GNOME, the kernel list archive, FFmpeg, Wine, and the rest. The win is not that the puzzle is hard. The win is that running a real browser per request is expensive, and most crawlers were not willing to pay it.

The failure case arrived on schedule. In August 2025 Codeberg, the FOSS code-hosting cooperative, reported that AI scrapers had “learned how to solve the Anubis challenges” and that the resulting traffic produced “a period of extreme slowness,” some of it from networks Codeberg attributed to Huawei. Once a scraper is willing to run a headless browser, or better, a purpose-built native solver, the puzzle stops being a wall and becomes a small tax. Codeberg did not rip Anubis out; they kept it and started layering other tools like Iocaine and go-away on top, which is the realistic posture. The gate still filters the cheap bots even after the expensive ones learn to pass.

The sharpest technical critique came from security researcher Tavis Ormandy, and it is worth taking seriously because the math is on his side. His argument is that the proof-of-work cost to an attacker is negligible: a native solver in a couple dozen lines of C clears the challenge faster than the JavaScript a human’s browser is forced to run, and the aggregate compute cost stays trivial for an AI company until millions of sites deploy Anubis, well into “a single cent per month” territory against budgets in the eight figures. He demonstrated solving the challenges for roughly eleven thousand Anubis deployments in about six minutes on a free Google Cloud VM. The proof-of-work, viewed strictly as a cost imposed on a determined attacker, does not hold up, and writing a native solver is precisely the kind of work that is easy once you understand the loop, which is why this post describes the mechanism and not a turnkey solver.
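Dividing the demonstration's own figures gives a sense of scale. Treating "roughly eleven thousand" and "about six minutes" as exact, which is only good enough for a back-of-envelope estimate, the throughput of that single free VM is:

```latex
\frac{11\,000\ \text{challenges}}{6 \times 60\ \text{s}} \approx 30.6\ \text{challenges per second}
```

That is about thirty solved gates per second from one free-tier machine.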

The rebuttal does not dispute the math. It disputes what the math is measuring. The defenders’ point is that the puzzle was never the load-bearing wall; the JavaScript-execution requirement is. The cost that actually matters to a scraper is not the hashing, it is being forced to run a full browser engine to clear the gate at all, the same headless-browser tax that makes browser-based crawling an order of magnitude more expensive than firing raw HTTP. Against the crawlers that refuse to pay that tax, which was most of them when Anubis shipped, the puzzle works precisely because it is cheap for the defender to demand and impossible to fake without a JavaScript engine. Ormandy is right that a motivated attacker walks through it. The maintainers are right that most attackers in early 2025 were not motivated, just voracious, and a gate that filters the voracious-but-lazy majority is worth deploying even though it does not stop the determined few. Both things are true, and the disagreement is really about what problem you think you are solving.

There is also a cost on the defender’s own side that the critique highlights and that is genuinely uncomfortable. The puzzle does not just tax bots; it taxes humans, and it taxes them unevenly. A visitor on a recent laptop never notices the second of hashing. A visitor on an old phone, a low-end device, or a battery-conscious browser can wait noticeably longer and burn measurable power doing arithmetic whose only purpose is to prove they are not a robot. The Free Software Foundation went as far as arguing that shipping mandatory proof-of-work amounts to “pressuring users into running malware,” a contested claim given the solver code is itself free software, but the underlying discomfort is fair. Anubis externalizes a small, regressive compute cost onto every human visitor, heaviest on the people with the weakest hardware, to deter bots that can shrug the same cost off entirely. That asymmetry runs the wrong way, and it is the strongest argument against treating proof-of-work as anything more than a stopgap.
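The regressive part is simple arithmetic: expected wall-clock time is the expected number of attempts divided by the device's hash rate R. The hash rates below are hypothetical round numbers chosen to show a tenfold-slower device; they are not measurements of any particular browser or phone.

```latex
t(d, R) = \frac{16^{d}}{R}, \qquad
t\!\left(4,\ 10^{5}\ \text{H/s}\right) \approx 0.66\ \text{s}, \qquad
t\!\left(4,\ 10^{4}\ \text{H/s}\right) \approx 6.6\ \text{s}
```

The same difficulty setting that costs one visitor well under a second costs another several seconds, and the gap scales linearly with the hardware difference.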

What Anubis is, in the end

Anubis is a load-shedding tool that wears the costume of an authentication challenge. Strip away the jackal and the soul-weighing and the hashing, and the thing it does is refuse to render an expensive page for any client that will not run a modern browser’s worth of JavaScript first. The proof-of-work is the toll booth that makes that refusal cost the client something; the real barrier is the requirement to show up in a real browser at all. Read that way, the Ormandy critique and the maintainers’ defense stop contradicting each other. The hashing is weak as a cost function and strong as a liveness test, and Anubis was always relying on the second property more than the first.

What it bought the FOSS world was time, and time at a price almost any project could afford. For a stretch of 2025 a few hundred lines of Go and a clever User-Agent check did what a year earlier had seemed to require a commercial contract with an anti-bot vendor, and it did it for the exact victims, like underfunded open-source infrastructure, who could least afford that contract. That the wall is now being climbed does not erase the months it held, and the project’s drift toward weighting, fingerprinting, and a WAF-shaped future suggests its author understood from the start that a single hash puzzle was never going to be the whole story. The most telling number in the whole saga is the one Codeberg reported: the bots did not give up when they hit Anubis. They learned to solve it. A defense that forces your adversary to spend engineering effort writing a solver has already changed the economics in your favor, even when the solver eventually ships.

