Arkose Labs FunCaptcha internals: the game challenge and risk-based enforcement

Most CAPTCHAs ask you to prove you are human by doing something a computer is supposed to be bad at: reading warped text, picking out the traffic lights, transcribing a smear of digits. FunCaptcha asks you to rotate an animal until it stands upright, or to pick the dice that add up to a number, or to slide a puzzle piece into a gap. The puzzle is not the interesting part. The interesting part is that by the time you see it, the system has already decided how hard your puzzle should be, how many rounds you will play, and whether you were ever going to be allowed through at all.

That decision is the whole product. FunCaptcha (the Arkose Labs challenge, now sold under the name Arkose MatchKey) is a risk engine with a game bolted to the front of it. The game exists to be expensive for whoever the engine has already flagged. This post walks through how that works: why Arkose chose interactive 3D games over text, what the encrypted fingerprint payload collects before the first frame renders, the endpoint flow from session setup to server-side verification, and the per-session proof-of-work trick that keeps a solved answer from being replayed. Where the internal layout is not publicly documented, I will say so rather than guess.

Why games instead of text

The case against text CAPTCHA was settled by machines, not by Arkose. By the mid-2010s, a single generic solver could read many different distorted-text designs at once, and Google’s own digit recogniser was reading the hardest reCAPTCHA text variant at around 99 percent accuracy. Text recognition is a solved computer-vision problem. A challenge whose difficulty rests on optical character recognition is a challenge that an off-the-shelf model clears, which means the only people it inconveniences are humans with bad eyesight and slow connections.

Arkose Labs started life in 2013 as FunCaptcha, out of a Brisbane Startup Weekend, founded by Kevin Gosschalk and Matthew Ford. Gosschalk’s background was in interactive machine-vision and game development, and the original pitch was narrow: replace ugly text boxes with small image puzzles people did not mind doing. The company later rebuilt that idea into a risk platform, but the design instinct carried through. The challenge is a game because a game gives the defender control over three things text never did.

First, the answer space is parameterised. A text CAPTCHA has one correct string. A rotation game has a correct angle, an axis, an object, a background, a number of distractors, and a number of rounds, and the defender can dial each of those independently. Second, the game is interactive over time, so it produces a stream of input events (cursor paths, click timing, drag velocity, the rhythm of the arrow presses) rather than a single final answer. That stream is itself a signal. Third, and this is the part that matters most, a 3D rendered object with novel art is a fresh classification problem on every variant. A solver trained on “rotate the owl” does not transfer for free to “rotate the seahorse made of clouds.” The defender can mint new art faster than an attacker can label and train against it.

Arkose leans hard on that last point. The current MatchKey game type is described by the company as a single challenge format with over 1,250 variants, with new ones added on a rolling basis. The number itself is a marketing figure, but the mechanism behind it is real: if every variant is a distinct visual task, then a solver has to generalise across the whole distribution rather than overfit one design, and the cost of staying current is continuous rather than one-time.

*The defender can vary every dimension of a game challenge independently, and each new art variant is a fresh classification task. A text CAPTCHA has one axis and OCR closed it.*

What the system measures before you see a puzzle

Open a page protected by Arkose and the first thing that happens is not a challenge. It is a fingerprint. The client script loads from a per-customer host of the form <company>-api.arkoselabs.com/v2/<public-key>/api.js, and the integration splits cleanly into two components: detection, which runs invisibly and produces a token, and enforcement, which renders the visual challenge in a modal when detection decides one is warranted. Many sessions only ever touch the detection path. They get a token without seeing a game at all, which is the point. The challenge is the exception, not the default.

The detection component’s job is to collect a browser fingerprint and ship it to Arkose’s backend. That payload is the bda field, for “browser data,” and it is the densest part of the whole system. The structure has been reverse-engineered in public detail, most thoroughly in the AzureFlow fingerprint documentation, and the broad shape is well agreed even though Arkose does not publish it. The root object carries a handful of fields: api_type (the static string js), f (a MurmurHash3-128 hex digest of the raw signal values joined with ;), n (a base64-encoded UNIX timestamp in seconds), wh (a window hash), enhanced_fp (a large nested object), fe (a flat array of fingerprint signals), ife_hash, and jsbd.
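To make the field list concrete, here is a minimal Python sketch of the root object as described above. The field names come from the public reverse-engineering; the container types, the placeholder strings, and the choice to base64-encode the timestamp as a decimal string are my assumptions, and the helper names are hypothetical. MurmurHash3-128 is not in the Python standard library, so the sketch only builds the input to the f digest and leaves the digest itself as a placeholder.

```python
import base64


def encode_n(unix_seconds: int) -> str:
    """Base64-encode a UNIX timestamp in seconds, as the n field is described.

    Assumption: the timestamp is encoded as its decimal string form.
    """
    return base64.b64encode(str(unix_seconds).encode("ascii")).decode("ascii")


def join_signals(values) -> str:
    """Join raw signal values with ';', the input the f digest is computed over."""
    return ";".join(str(v) for v in values)


def build_bda_skeleton(unix_seconds: int) -> dict:
    """Return the root fields with placeholders where the real values would go."""
    return {
        "api_type": "js",                      # static string
        "f": "<MurmurHash3-128 hex digest>",   # placeholder, not computed here
        "n": encode_n(unix_seconds),
        "wh": "<window hash>",                 # placeholder
        "enhanced_fp": {},                     # large nested object in practice
        "fe": [],                              # flat array of fingerprint signals
        "ife_hash": "<hash>",                  # placeholder
        "jsbd": {},                            # small roll-up object
    }
```

The skeleton is only a map of where things live. The real values change across script versions, which is why none are hard-coded here.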

The fe array is the classic fingerprint surface, and the abbreviated keys are easy to read once you know them. DNT is the Do Not Track flag. L is the primary locale from navigator.language. D is screen.colorDepth. PR is devicePixelRatio. There are flags for whether session storage, local storage, and IndexedDB are available, a CPU-class signal, a canvas fingerprint, a list of detected fonts, the platform string, the timezone offset, the touch-support profile, and a count of plugins. None of this is novel on its own. Every fingerprinting library on the web reads the same DOM and navigator surface. What matters is that Arkose hashes the joined values into f and computes wh from the property names actually present on the window object and along its prototype chain, which makes it a cheap and effective tripwire for an automation framework that has injected globals or patched prototypes. If your window carries a property that a clean browser would not, the window hash drifts, and the drift is measurable.
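The drift is easy to demonstrate with a toy model. The sketch below hashes an ordered list of property names and shows that one injected global changes the digest. SHA-256 is a stand-in I chose because it ships with Python; it is not the hash Arkose uses for wh. The property lists and the injected name are made up for illustration.

```python
import hashlib


def window_hash(property_names) -> str:
    """Hash an ordered list of window property names.

    Stand-in only: SHA-256 replaces whatever hash the real script uses,
    and the '|' separator is an assumption.
    """
    joined = "|".join(property_names)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# A made-up baseline for a clean browser window.
clean_window = ["window", "self", "document", "location", "navigator"]

# The same window after an automation framework injects one global.
instrumented_window = clean_window + ["__hypothetical_automation_global"]

baseline = window_hash(clean_window)
drifted = window_hash(instrumented_window)
```

Comparing baseline and drifted is all the server needs to do. It does not have to know which property was added, only that the digest no longer matches the population of clean browsers reporting the same User-Agent.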

*The reverse-engineered shape of the bda payload. Field names are from public documentation of observed traffic; Arkose does not publish the layout, and it changes over time.*

The enhanced_fp object goes wider and deeper. It carries WebGL vendor and renderer strings and render timings, codec support, media-query results, audio and math fingerprints, device-orientation and motion-sensor readings where available, speech-synthesis voices, and a set of explicit automation-detection probes. This is where the system looks for the specific tells of a headless or instrumented browser: a WebGL renderer that says SwiftShader instead of a real GPU, a navigator.webdriver that is set, a permission state that no human browser reports, a screen geometry that does not fit any shipping device. The jsbd field rolls up a smaller set of the same: history.length, whether cookies are enabled, the document title, and webdriver status. The point of collecting both broad device identity and narrow automation flags is that the two answer different questions. Device identity answers “have I seen this machine before, and does it cluster with known-bad ones,” and the automation flags answer “is this even a real browser.”
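A rough sketch of how the narrow automation question might be asked of a collected fingerprint. The dictionary keys here are hypothetical names of my own, not the real enhanced_fp keys, and the three checks are just the tells named above. A real risk engine weighs many more signals and does not publish its rules.

```python
def automation_tells(fp: dict) -> list:
    """Return human-readable reasons a fingerprint looks instrumented.

    All keys (webgl_renderer, webdriver, screen_width, screen_height)
    are hypothetical names chosen for this sketch.
    """
    tells = []

    # A software rasteriser where a real GPU string would normally appear.
    renderer = str(fp.get("webgl_renderer", ""))
    if "swiftshader" in renderer.lower():
        tells.append("software WebGL renderer (SwiftShader)")

    # navigator.webdriver is set by WebDriver-controlled browsers.
    if fp.get("webdriver") is True:
        tells.append("navigator.webdriver is set")

    # Zero or negative dimensions do not fit any shipping device.
    width = fp.get("screen_width", 0)
    height = fp.get("screen_height", 0)
    if width <= 0 or height <= 0:
        tells.append("implausible screen geometry")

    return tells
```

An empty list does not mean the session is human. It only means this handful of probes found nothing, which is why the device-identity question is asked separately.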

If you have read the Crawlex pieces on other vendors, this will feel familiar. The signal menu is close to what Akamai’s sensor_data payload collects, and the automation-probe philosophy is the same one Kasada’s anti-instrumentation builds its whole detection around. What differs between vendors is less the raw signals and more what they do with the verdict. Arkose’s answer is the game.

The encrypted envelope and the time-synced key

The bda object does not travel as plaintext JSON. It is encrypted client-side and sent as ciphertext, and the encryption scheme is the part that makes the payload annoying to forge. Based on public reverse-engineering, the JSON is encrypted with AES in CBC mode and wrapped in the OpenSSL-style envelope familiar from CryptoJS: a small object with ct (the base64 ciphertext), iv (the initialisation vector), and s (the salt). The exact key-derivation steps are not officially documented; what follows is inferred from observed traffic and community reverse-engineering, and it has shifted across versions, so treat it as a mechanism sketch rather than a spec.

The key is not a constant baked into the script. It is derived from the browser’s User-Agent string combined with a time value, and that time value is the lever. Arkose’s server returns a header on the settings request, x-ark-esync-value, which carries a server-side timestamp. The client uses that value (rather than its own clock) as part of the key material. The effect is a loose time-synchronisation between client and server: a bda blob encrypted with the wrong epoch decrypts to garbage on the server. A captured-and-replayed payload from an hour ago will not key correctly. And because the key folds in the User-Agent, a payload generated under one UA but presented with another is inconsistent by construction. None of this is unbreakable. It is a forger’s tax. You cannot mint a valid bda by hand-writing JSON; you have to reproduce the key derivation, the UA, and the server time window together, and Arkose can rotate the derivation in a script update whenever the public reproductions catch up.
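The shape of that derivation can be sketched with the standard library alone. CryptoJS's passphrase mode derives its key and IV with OpenSSL's EVP_BytesToKey (MD5, one iteration), so the sketch uses that; whether Arkose's current script does exactly this is an assumption. The way the User-Agent and time value are combined, the rounding of the time value into a bucket, and the bucket size are all hypothetical. The AES-CBC step itself is omitted because Python has no AES in its standard library.

```python
import hashlib

# Hypothetical bucket size. The real rounding, if any, is not documented.
ASSUMED_WINDOW_SECONDS = 21600


def key_material(user_agent: str, server_time: int,
                 window: int = ASSUMED_WINDOW_SECONDS) -> bytes:
    """Combine the User-Agent with a server time rounded down to a bucket.

    Assumption: simple concatenation of the UA and the bucketed timestamp.
    """
    bucket = server_time - (server_time % window)
    return (user_agent + str(bucket)).encode("utf-8")


def evp_bytes_to_key(passphrase: bytes, salt: bytes,
                     key_len: int = 32, iv_len: int = 16):
    """OpenSSL EVP_BytesToKey with MD5 and one iteration.

    Each block is MD5(previous_block + passphrase + salt); blocks are
    concatenated until there are enough bytes for the key and the IV.
    """
    derived = b""
    block = b""
    while len(derived) < key_len + iv_len:
        block = hashlib.md5(block + passphrase + salt).digest()
        derived += block
    return derived[:key_len], derived[key_len:key_len + iv_len]
```

Two timestamps inside the same bucket yield the same key, and a timestamp in the next bucket or a different User-Agent yields a different one. That is the whole replay argument in miniature: the ciphertext is only meaningful to a server that reconstructs the same UA and the same time window.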

This is the same design philosophy as Kasada’s KPSDK token and Imperva’s reese84 sensor: the value on the wire is cheap to observe and expensive to synthesise, and the whole defensive bet is that the gap between observing and synthesising stays wide enough to matter. The encryption is not protecting a secret. It is raising the cost of producing a payload that the server will accept as coming from a real, current, un-tampered browser.

The token flow: gt2, gfct, and verify

Once the fingerprint is encrypted, the flow moves through a sequence of endpoints under the /fc/ path on the Arkose API host. The naming is terse and consistent across the public reverse-engineering.

The session begins with a POST to /fc/gt2/public_key/<public-key>. This is the setup call. The encrypted bda rides along as a form field, together with the site URL, the public key, and assorted client metadata. The server’s response to gt2 is where risk-based enforcement first becomes visible. If the session looks clean, the response can carry a token and the interaction ends there with no game. If the session warrants a challenge, the response indicates which game type and how hard, and the client proceeds to fetch challenge data.

Fetching the challenge is a call to /fc/gfct/ (get FunCaptcha challenge, by the obvious reading). The response carries the fields you need to render and answer a round: a session_token (observed in the form 65917d170fd50ba78.9179887501, an id and a numeric suffix), a challengeID, a challengeURL pointing at the image or game asset, and a dapib_url. That last field is the clever one, and it gets its own section below. The answer to each round is then submitted back to the API, and a multi-round challenge loops the answer step until the required number of waves is done.
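As a toy model of the control flow only, the branch and the wave loop look roughly like this. Nothing here talks to a network, and every name (SetupResult, waves, run_flow, answer_wave) is hypothetical. The real responses carry far more state than a token and a count.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SetupResult:
    """Hypothetical summary of what the setup call decided."""
    token: str
    waves: int  # 0 means the session passed without a game


def run_flow(setup: SetupResult,
             answer_wave: Callable[[int], bool]) -> Optional[str]:
    """Return the token if the session completes, or None if a wave fails.

    answer_wave stands in for rendering one round and submitting its answer.
    """
    if setup.waves == 0:
        # Clean session: no challenge is fetched at all.
        return setup.token

    for wave in range(setup.waves):
        if not answer_wave(wave):
            return None  # a wrong answer ends this attempt
    return setup.token
```

The useful thing to notice is that the number of waves is an input to the client, not something it chooses. The loop bound was set on the server before the first round was drawn.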

When the user finishes (or detection passed without a game), the client’s onCompleted callback fires with a response object containing a token. That token is the only thing the protected site receives. The site forwards it to its own backend, which calls the Arkose Verify API (the current major version is v4) server-side with the site’s secret key, and gets back a verdict: was this a real, completed session, and what was its risk. The protected site never sees the fingerprint, never sees the risk scoring, and never makes the trust decision itself. It receives a token and an API verdict. The whole security boundary lives on Arkose’s servers, which is the standard CAPTCHA-as-a-service shape and the same trust split you see in reCAPTCHA v2’s token lifecycle.
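On the protected site's backend, the verify step amounts to one authenticated POST and a fail-closed check on the answer. The sketch below only builds the request and interprets a response dictionary. The verify URL is passed in rather than hard-coded, and the body field names (private_key, session_token) and the response field (session_details.solved) are assumptions to check against Arkose's Verify API documentation before relying on them.

```python
import json
import urllib.request


def build_verify_request(verify_url: str, private_key: str,
                         session_token: str) -> urllib.request.Request:
    """Build, but do not send, a server-side verify request.

    Assumption: a JSON body with private_key and session_token fields.
    """
    body = json.dumps({
        "private_key": private_key,
        "session_token": session_token,
    }).encode("utf-8")
    return urllib.request.Request(
        verify_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_trusted(verdict: dict) -> bool:
    """Fail closed: trust only an explicit positive verdict.

    Assumption: the verdict exposes session_details.solved as a boolean.
    """
    details = verdict.get("session_details")
    if not isinstance(details, dict):
        return False
    return details.get("solved") is True
```

Sending the request would be a call to urllib.request.urlopen with a timeout. Failing closed matters more than the transport: a missing or malformed verdict should be treated the same as a failed one.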

*The path from fingerprint to verdict. The risk decision at gt2 determines whether a game appears at all and, if it does, how hard it is.*

Risk-based difficulty: the game adapts to the verdict

The single most important thing to understand about FunCaptcha is that the difficulty is an output of the risk engine, not a fixed property of the widget. The same site, the same game type, will hand a clean residential session a one-round rotation puzzle and hand a flagged datacentre session a multi-round gauntlet, and the two are running the identical integration. Difficulty scales with suspicion.

Concretely, the levers the engine pulls include the number of rounds (a clean session might face one wave, a suspicious one several in a row), the visual difficulty of each round (more distractors, more ambiguous art, tighter rotation tolerances), the time pressure (Arkose can swap a stalled puzzle for a new one if it sits too long, with community observation putting the practical timeout around fifteen seconds), and at the extreme, whether to issue a challenge that is functionally unwinnable for automation or simply to refuse and return a failing verdict. A session whose fingerprint screams headless Chrome does not get a fair game. It gets an expensive one, repeatedly, or it gets nothing.
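Those levers can be pictured as a function from a risk score to a challenge plan. The sketch below is purely illustrative. The thresholds, the round and distractor counts, and the idea of a single scalar score between 0 and 1 are all invented; only the fifteen-second timeout echoes the community observation above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChallengePlan:
    rounds: int           # number of waves to play
    distractors: int      # wrong options shown per round
    timeout_seconds: int  # how long a round may sit before being swapped
    allow: bool           # False means refuse outright


def plan_for(risk: float) -> ChallengePlan:
    """Map a hypothetical risk score in [0, 1] to a challenge plan."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be between 0 and 1")
    if risk < 0.2:
        return ChallengePlan(rounds=0, distractors=0, timeout_seconds=0, allow=True)
    if risk < 0.5:
        return ChallengePlan(rounds=1, distractors=3, timeout_seconds=15, allow=True)
    if risk < 0.9:
        return ChallengePlan(rounds=5, distractors=6, timeout_seconds=15, allow=True)
    # Extreme suspicion: no fair game is offered.
    return ChallengePlan(rounds=0, distractors=0, timeout_seconds=0, allow=False)
```

The same integration code sits in front of all four outcomes. What changes is only the plan the server hands back.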

This is the same insight that drives reCAPTCHA v3’s scoring and Cloudflare’s bot score, but Arkose expresses it differently. Where a score-based system returns a number and lets the site decide a threshold, Arkose makes the difficulty itself the enforcement. The cost is paid in human-or-machine effort at solve time, not in a backend policy rule. For a legitimate user the experience degrades gracefully: a slightly harder puzzle, maybe a second round. For an attacker running at scale, the same gradient is a cliff, because every additional round multiplies across millions of attempts and the per-solve cost of a commercial solving service is the constraint that actually bounds the operation.

That economic argument is the explicit design goal. Arkose frames its whole pitch around making attacks unprofitable rather than impossible, on the theory that a sufficiently determined and well-funded attacker can solve any individual challenge but cannot afford to solve enough of them at the unit economics the difficulty enforces. The 1,250-plus variant count, the per-session art, and the risk-scaled rounds all feed the same lever: drive up the marginal cost of a solved session until the attack stops paying for itself. It is a deterrence model, and it lives or dies on whether the cost it imposes on bad traffic stays above the value an attacker extracts.
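The arithmetic behind that claim is short enough to write down. The sketch below treats an attack as a number of attempts, each costing a per-round solving fee times the number of rounds, and each succeeding with some probability and paying some value. Every number in the usage note is invented for illustration; none comes from Arkose or from any solving service's price list.

```python
def attack_profit(attempts: int, rounds_per_attempt: int, cost_per_round: float,
                  success_rate: float, value_per_success: float) -> float:
    """Revenue minus solving cost for a batch of attempts."""
    cost = attempts * rounds_per_attempt * cost_per_round
    revenue = attempts * success_rate * value_per_success
    return revenue - cost


def break_even_rounds(cost_per_round: float, success_rate: float,
                      value_per_success: float) -> float:
    """Rounds per attempt at which profit is exactly zero."""
    return success_rate * value_per_success / cost_per_round
```

With made-up inputs of a million attempts, a solving fee of 0.002 per round, a 50 percent success rate, and 0.01 of value per success, one round per attempt leaves the attacker about 3,000 ahead and five rounds leaves them about 5,000 behind. Break-even sits at 2.5 rounds. The defender does not need to make the puzzle unsolvable, only to push the round count past that line for flagged traffic.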

The dapib script: proof of work against replay

There is one more mechanism worth pulling apart, because it is the cleverest piece of anti-replay engineering in the system. When the client fetches challenge data from gfct, the response can include a dapib_url. This is a URL to a small JavaScript file that is unique to that session. Public documentation of the flow notes roughly 1,200 such files generated per day that rotate and expire. The file is not the challenge. It is a verifier.

The mechanism, as documented from observed behaviour, works like this. The per-session script runs inside an iframe and reaches up to the parent context, reading an object at window.parent.ae. That object exposes the answer the client has computed (the plain guess, as key/value pairs alongside the session token) and a callback named dapibReceive. The script takes each guess entry and transforms it, deriving an obfuscated proof for each one and packaging the result into the same ct/iv/s encrypted envelope used elsewhere, alongside biometric and timing data. The output of running the answer through the session’s own script is the tguess value. The plain answer is the guess. Both are submitted.

The point is that the server can issue a different verifier script for every session, so the transformation from guess to tguess is not a fixed function an attacker can implement once. It is per-session code that has to be fetched, executed in a believable browser context, and run against the specific answer. A solver that knows the correct answer to the visual puzzle still cannot submit it without also executing the session’s dapib script to produce the matching tguess, and that script is short-lived and unique. It is a proof-of-work step whose work is “run the code we just sent you, in a real browser, right now.” Reusing a known answer across sessions does not work because each session’s verifier is different. The exact transformation inside any given dapib file is deliberately obfuscated and changes per session, so there is no stable specification to point at; what is stable is the shape of the trick.
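The shape of the trick can be modelled without knowing what any real dapib file does. In the sketch below each session gets its own secret, and an HMAC keyed with that secret plays the part of the per-session script. To be loud about it: HMAC is a stand-in I picked for illustration. The real verifier is obfuscated JavaScript executed in the browser, and its transformation is not publicly specified.

```python
import hashlib
import hmac
import secrets


class Session:
    """Toy server-side session with its own verifier."""

    def __init__(self) -> None:
        self.session_token = secrets.token_hex(8)
        # Stand-in for the unique per-session script.
        self._secret = secrets.token_bytes(32)

    def run_verifier(self, guess: str) -> str:
        """Stand-in for executing this session's script on a plain guess."""
        return hmac.new(self._secret, guess.encode("utf-8"),
                        hashlib.sha256).hexdigest()

    def accepts(self, guess: str, tguess: str) -> bool:
        """Accept only a proof produced by this session's own verifier."""
        return hmac.compare_digest(self.run_verifier(guess), tguess)
```

Create two sessions, compute a tguess for the same correct guess in the first, and the second rejects it. The answer was right both times. What failed was the proof, because it was produced by the wrong session's code.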

*The dapib script is a per-session verifier. Knowing the answer is not enough; you must run the session's own code to produce the proof that travels with it.*

Not every session gets a dapib_url. It appears to be deployed selectively, which fits the risk-based pattern: spend the extra anti-replay machinery on the sessions that have already drawn suspicion, and keep the clean path cheap. That selectivity is itself a tell about how the engine thinks. Defences are not applied uniformly. They are metered out in proportion to the suspicion the session’s fingerprint has already earned.

The server-side and network layer

Everything above is the browser-facing flow. Arkose also offers an Edge API, a lightweight server-side path that accepts the request’s IP, TLS fingerprint, HTTP headers, and JA3/JA4 signals and scores risk before any JavaScript runs at all. This matters because it closes a gap that a pure client-side fingerprint leaves open. A request can carry a perfectly forged bda and still betray itself at the transport layer if its TLS ClientHello fingerprint does not match the browser its User-Agent claims to be. A Python client announcing itself as Chrome 124 has a JA4 that no real Chrome produces, and the Edge layer can catch that before the game logic is ever reached.
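A minimal sketch of that consistency check, assuming the server keeps a table from observed TLS fingerprints to the client family that produces them. The table entries are placeholders, not real JA4 values, and the User-Agent parsing is deliberately crude. Both function names are hypothetical.

```python
# Placeholder fingerprints. Real JA4 strings are not reproduced here.
KNOWN_TLS_FAMILIES = {
    "placeholder-ja4-chrome": "chrome",
    "placeholder-ja4-firefox": "firefox",
    "placeholder-ja4-python-client": "python",
}


def claimed_family(user_agent: str) -> str:
    """Very rough guess at the browser family a User-Agent claims to be."""
    ua = user_agent.lower()
    if "chrome/" in ua:
        return "chrome"
    if "firefox/" in ua:
        return "firefox"
    return "unknown"


def transport_mismatch(user_agent: str, ja4: str) -> bool:
    """True when a known TLS fingerprint contradicts the claimed browser.

    Unknown fingerprints return False; this sketch only flags contradictions.
    """
    observed = KNOWN_TLS_FAMILIES.get(ja4)
    if observed is None:
        return False
    return observed != claimed_family(user_agent)
```

A script that sets a Chrome User-Agent but connects with a Python TLS stack trips the check without any JavaScript having run. A production system would score the mismatch rather than return a boolean, and would have to keep its fingerprint table current as browsers update.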

The combination is the familiar defence-in-depth stack. The network layer fingerprints the connection. The JavaScript layer fingerprints the runtime and packages it into the encrypted bda. The risk engine fuses both into a verdict. The game enforces that verdict by making the session pay in proportion to its suspicion. And the dapib proof-of-work binds a solved answer to the specific session that solved it. Each layer is defeatable in isolation by someone with enough time. The bet is that defeating all of them, simultaneously, at scale, and keeping the reproductions current as Arkose rotates them, costs more than the attack returns.

The enforcement UI itself has moved over time. Arkose retired Enforcement Challenge UI 1.0 on 1 September 2024 and migrated remaining customers to UI 2.0, a universal container that loads game types dynamically and uses fewer nested iframes than the older design. The current default game, internally Game 4, is MatchKey. Older formats like the Game 3 tile game still exist in the container, and Game 1 is deprecated. The accessibility work in UI 2.0 (better screen-reader handling, clearer focus management, audio challenge formats) is not incidental to a system that gates access to real services. A challenge that locks out users who rely on assistive technology is a legal and reputational liability, and the audio path exists to keep the challenge solvable for people who cannot do the visual game. That same audio path is, of course, a parallel attack surface, which is why the audio challenges are “listen and decide” questions rather than transcription tasks; transcription is as solved as text recognition.

Closing

FunCaptcha reads, on the surface, like a friendlier CAPTCHA. Rotate the animal, slide the piece, no squinting at warped letters. The friendliness is real for the clean traffic that never sees a game, and that is most traffic. But the game is not the security. The security is the decision made before the game renders: the encrypted fingerprint, the time-synced key, the risk verdict at gt2, the rounds metered out in proportion to suspicion, and the per-session dapib script that refuses to let a known answer be replayed. The puzzle you see is just the bill, and the engine decided the amount before you arrived.

What that design buys Arkose is a defence whose cost to the attacker scales continuously rather than stepping. There is no single token to forge, no fixed function to reimplement, no one art design to label and train against. There is a distribution to generalise, an encryption to keep current, a per-session proof to execute, and a difficulty curve that steepens exactly where the fingerprint looks worst. The honest limit of this model is also its premise: a well-funded attacker can solve any one challenge, and commercial solving services do, at a price. The whole system is a wager that the price stays high enough. As generative models get cheaper at the visual tasks the games are built on, that wager gets re-underwritten on every variant Arkose ships, and the variant count is the tell that the company knows it. You do not need 1,250 versions of a puzzle that machines cannot solve. You need them for a puzzle that machines can solve, just not cheaply enough, yet.

Sources & further reading

AzureFlow (2024), ArkoseLabs FunCaptcha fingerprint documentation — reverse-engineered reference for the bda payload, the fe array keys, enhanced_fp, the f/n/wh fields, and the gt2 endpoint.
Arkose Labs (2024), Client-Side Instructions (Standard Setup) — official integration docs covering the api.js script, setConfig, onCompleted, and the detection vs enforcement components.
Arkose Labs (2024), Arkose Enforcement Challenge UI 2.0 — the universal game container, Game 4 / MatchKey, reduced iframe nesting, and audio challenge support.
Arkose Labs (2024), Welcome to Arkose Labs — overview of the detection/enforcement model, the Verify API v4, and the Edge API accepting IP, TLS, and JA3/JA4 signals.
unfuncaptcha (2024), tguess: generate guess and tguess components — documents the dapib_url, the per-session verifier script, window.parent.ae, dapibReceive, and the ct/iv/s envelope.
unfuncaptcha (2024), bda: view, edit, and repackage Arkose fingerprints — repackaging tool whose interface confirms the bda fingerprint structure and the gt2 setup flow.
Arkose Labs (2023), Arkose MatchKey Challenges — vendor description of MatchKey, the 1,250+ variant claim, and the deterrence-by-cost model.
Jin, Huang, Duan, Zhao, Liao, Zhou (2023), How Secure is Your Website? A Comprehensive Investigation on CAPTCHA Providers and Solving Services — academic survey finding most current CAPTCHAs vulnerable to both human and automated solving services.
Gosschalk, K. (2018), PayPal Makes Strategic Investment in Arkose Labs — founder’s note on the PayPal investment, useful for company history and dates.
Startup Daily (2019), Brisbane-founded Arkose Labs secures strategic investment from PayPal — origin as FunCaptcha from a Brisbane Startup Weekend, founders Kevin Gosschalk and Matthew Ford.
Arkose Labs (2023), Combating Generative AI Bots — the vendor’s framing of why interactive challenges and per-attempt cost are meant to resist generative-AI solvers.