
reCAPTCHA v2's bframe challenge: image grids, risk analysis, and the token lifecycle

Copyright: MIT

Open the network panel on a page with the “I’m not a robot” checkbox and click it. If the click resolves on its own, you saw two requests and a green tick. If it does not, a second iframe slides up with a grid of street photos and an instruction to find the traffic lights. That second iframe has a name. In Google’s own source it is the bframe, served from /recaptcha/api2/bframe, and it is where almost all of reCAPTCHA v2’s interesting behaviour lives. The checkbox is the cheap part. The bframe is the test.

What does the bframe actually decide, and on what evidence? The honest answer is that the checkbox decided most of it before the grid ever appeared. By the time you are picking out crosswalks, reCAPTCHA has already scored you on cookies, timing, and a payload computed by an obfuscated client, and the image grid is less a fresh test of your eyes than a way to make a suspicious session pay a cost. This post walks the v2 flow in the order the browser runs it, names the endpoints and tokens that carry the state, and is careful to separate what Google documents from what is only known through reverse engineering and academic measurement.

The sections below follow the request order. First, where v2 came from and what “No CAPTCHA” meant. Then the two iframes, anchor and bframe, and the three endpoints that move between them: anchor, reload, and userverify. Then the image grid itself, selection versus click-based, and why the grids got harder. Then the obfuscated client that runs before any of this and feeds the risk engine. Then the g-recaptcha-response token, its single use and its two-minute life. A closing section on what the measurement literature says the whole apparatus is now worth.

2014: what “No CAPTCHA” actually changed

reCAPTCHA started as Luis von Ahn’s project to turn CAPTCHA-solving into useful labour, digitising scanned books one warped word at a time. Google bought it in September 2009. For years the test was distorted text, and for years that text got more distorted as optical character recognition caught up, until the words were nearly as hard for humans as for machines. That arms race had an end. Google shut down the original reCAPTCHA v1 API on March 31, 2018, by which point text was a solved problem on both sides.

The turn came on December 3, 2014, when Google announced what it called “No CAPTCHA reCAPTCHA.” The pitch was a single checkbox. Vinay Shet, then the reCAPTCHA product manager, framed it around a backend Google had built the year before. The phrasing in the announcement is worth keeping because it describes the whole design in one line: an “Advanced Risk Analysis backend” that “actively considers a user’s entire engagement with the CAPTCHA,” before, during, and after the interaction, to decide whether the user is human. The checkbox was never the test. It was a trigger that let Google watch the approach, the click, and the aftermath, and the fallback puzzle only appeared when that observation came back inconclusive. Snapchat, WordPress, and Humble Bundle were among the first to ship it.

That design is still the v2 you meet today. A checkbox that frequently passes you with no puzzle at all, backed by a scoring system you never see, with the image grid held in reserve for sessions the score does not trust.

The two iframes: anchor and bframe

A v2 widget is two nested iframes, both served from www.google.com (or the recaptcha.net mirror, which exists for regions where the main domain is blocked). Knowing which iframe does what is the whole map.

The first is the anchor. It loads from /recaptcha/api2/anchor and renders the visible checkbox, the reCAPTCHA logo, and the privacy and terms links. The query string on that request already carries the site’s public key as k, the page origin as co (base64-encoded), the widget size and language as size and hl, and a version stamp v that pins the exact client build. The anchor response embeds a token, conventionally called the c token, which is the session handle that every later request quotes back. It is the thread the whole flow hangs on.
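To make the query string concrete, here is a minimal Python sketch that pulls those parameters out of an anchor URL copied from the network panel. The helper names are mine. The co decoding rests on an assumption drawn from observed traffic rather than anything Google documents: URL-safe base64 with a dot standing in for the usual = padding.

```python
import base64
from urllib.parse import urlsplit, parse_qs


def decode_co(co: str) -> str:
    """Decode the `co` parameter back to the page origin.

    Assumption: URL-safe base64 with '.' in place of '=' padding,
    as seen in observed traffic. This is not a documented format.
    """
    padded = co.replace(".", "=")
    padded += "=" * (-len(padded) % 4)  # tolerate padding that was stripped
    return base64.urlsafe_b64decode(padded).decode("utf-8")


def parse_anchor_url(url: str) -> dict:
    """Pull the panel-visible parameters out of an anchor request URL."""
    query = parse_qs(urlsplit(url).query)

    def first(name: str):
        return query.get(name, [None])[0]

    co = first("co")
    return {
        "site_key": first("k"),
        "origin": decode_co(co) if co else None,
        "language": first("hl"),
        "size": first("size"),
        "client_version": first("v"),
    }


if __name__ == "__main__":
    # Build a sample URL with placeholder values, then parse it back.
    co = base64.urlsafe_b64encode(b"https://example.com:443").decode().replace("=", ".")
    sample = (
        "https://www.google.com/recaptcha/api2/anchor"
        f"?k=SITE_KEY&co={co}&hl=en&size=normal&v=BUILD"
    )
    print(parse_anchor_url(sample))
```

The site key, build stamp, and origin in the sample are placeholders. Real values come from whatever page you are inspecting.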

The second iframe is the bframe, loaded from /recaptcha/api2/bframe with the same k and a matching co. This is the challenge surface. When a session needs a puzzle, the image grid renders inside the bframe, not the anchor. The two frames talk to each other and to the host page through postMessage, because same-origin script access across the google.com boundary is not available to the host. That messaging layer is part of why the widget is awkward to drive from outside a real browser: the host page never touches the challenge DOM directly, it only sees the eventual g-recaptcha-response value land in a hidden textarea.

Between those two iframes sit three endpoints, the anchor load and two POSTs, and the sequence among them is the cleanest way to read a session.

[Figure: the anchor frame (the checkbox) and the bframe (the image grid). /api2/anchor → c token; POST /api2/reload → challenge; POST /api2/userverify → token. If the score is low, reload returns a grid.]

*The c token issued by the anchor request is the session handle. Reload asks for a challenge and userverify submits the answer. A frictionless pass skips the grid entirely and never paints the bframe.*

anchor, reload, userverify

The flow has three named stops. The anchor request renders the checkbox and returns the c session token. When you click, the client computes a payload (more on that below) and POSTs to /recaptcha/api2/reload, quoting the c token and a reason field that encodes why the reload fired. The reload response is the decision point. If the risk engine is satisfied, reload can hand back a fresh response token and the widget goes green with no grid. If it is not satisfied, reload returns challenge metadata: the challenge type, the instruction string, and the image payload that the bframe will render.

Once the user has clicked tiles and pressed verify, the client POSTs to /recaptcha/api2/userverify, again quoting c, with the tile selections encoded in a response field. The userverify response carries the final g-recaptcha-response token, the value the host page actually needs. The widget pushes that string into a hidden <textarea> named g-recaptcha-response and, if the integrator wired one up, invokes the JavaScript data-callback.
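The three stops read naturally as a small state machine. The sketch below is a toy model of that sequence in Python and makes no network calls. The class and function names, the boolean inputs, and the placeholder token are all hypothetical, chosen only to show which transitions the flow allows.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Stage(Enum):
    ANCHOR_LOADED = auto()    # checkbox rendered, c token issued
    CHALLENGE_SHOWN = auto()  # reload returned a grid
    PASSED = auto()           # a response token exists


@dataclass
class Session:
    c_token: str
    stage: Stage = Stage.ANCHOR_LOADED
    response_token: Optional[str] = None


def reload(session: Session, risk_satisfied: bool) -> Session:
    """Decision point: pass outright, or hand back a challenge."""
    if session.stage is not Stage.ANCHOR_LOADED:
        raise ValueError("this model only reloads from a freshly loaded anchor")
    if risk_satisfied:
        session.stage = Stage.PASSED
        session.response_token = "placeholder-token"  # stands in for the real string
    else:
        session.stage = Stage.CHALLENGE_SHOWN
    return session


def userverify(session: Session, answer_accepted: bool) -> Session:
    """Submit tile selections; only valid once a challenge is on screen."""
    if session.stage is not Stage.CHALLENGE_SHOWN:
        raise ValueError("userverify only follows a rendered challenge")
    if answer_accepted:
        session.stage = Stage.PASSED
        session.response_token = "placeholder-token"
    return session


if __name__ == "__main__":
    trusted = reload(Session(c_token="c-1"), risk_satisfied=True)
    print(trusted.stage)  # frictionless pass, no grid

    doubted = reload(Session(c_token="c-2"), risk_satisfied=False)
    print(doubted.stage)  # grid rendered in the bframe
    print(userverify(doubted, answer_accepted=True).stage)
```

A rejected answer leaves the session in the challenge state, which is the simplest way to model being handed another grid.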

A caution on the field names inside these POST bodies. The endpoint paths, the k, co, v, and hl query parameters, and the g-recaptcha-response output are all observable in any browser’s network panel and stable across years. The internal POST field names beyond the obvious c and reason, and the exact byte layout of the encoded response and telemetry blobs, are not published by Google. What circulates in community reverse-engineering notes (fields at fixed indices in a serialised array, an encrypted fingerprint slot, a telemetry slot) is inferred from observed traffic, and Google rotates the client build often enough that any specific index is a snapshot, not a contract. Where this post names a field, the named ones are the documented or panel-visible ones; the rest are described by role, not by a number I cannot stand behind.

The image grid, and why it got harder

When the bframe does render a puzzle, the markup is plain. The challenge image sits in an HTML table, one <img> per cell, all cells pulling slices of a single source image so the grid looks continuous. The instruction lives in an element identified by rc-imageselect-instructions, the tiles by rc-imageselect-tile, and the submit button by recaptcha-verify-button. The checkbox in the anchor frame is identified by recaptcha-anchor. Those identifiers have been stable for years and are the reason scripted solvers can locate the pieces without guessing.

There are two challenge shapes, and the difference matters for cost. The older one is selection-based: a static grid, usually 3-by-3, where you tick every tile containing the target object and submit once. The newer one is click-based: you tick a matching tile, it fades, and a fresh image loads in that same cell, and you keep going until no tile matches. Click-based challenges take measurably longer because each click triggers an image regeneration round-trip, and that delay is the point. A grid you can solve in one shot is cheap to automate; a grid that makes you wait between every selection is not. The 4-by-4 grids, with a single image carved into sixteen cells, are the harder selection variant.

[Figure: the two grid types. Selection-based (one shot): tick all matches, submit once. Click-based (fade and refill): click a match, it fades, a new tile loads; repeat until none remain.]

*Selection-based grids resolve in one submit. Click-based grids force a network round-trip per click, which is the cost mechanism: solving stays slow even when the vision problem is easy.*
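A back-of-envelope cost model shows why the refill delay dominates. The two functions below are a sketch with parameter names of my own. The numbers in the usage lines are illustrative, not measurements from any study.

```python
def selection_time(think_s: float, rtt_s: float) -> float:
    """Selection-based: one pass over the grid, then one submit round-trip."""
    return think_s + rtt_s


def click_based_time(clicks: int, think_per_click_s: float,
                     refill_s: float, rtt_s: float) -> float:
    """Click-based: every click waits for a replacement tile, then one submit."""
    return clicks * (think_per_click_s + refill_s) + rtt_s


if __name__ == "__main__":
    # Illustrative values only: 4 s to scan a static grid, 0.5 s per round-trip,
    # five matching tiles at 1 s of thought and 2 s of fade-and-refill each.
    print(selection_time(think_s=4.0, rtt_s=0.5))
    print(click_based_time(clicks=5, think_per_click_s=1.0, refill_s=2.0, rtt_s=0.5))
```

Under these assumed values the click-based grid costs several times as much, and most of that is waiting, which a faster classifier does not shorten.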

The object categories are narrow. A 2020 measurement that collected over ten thousand live challenges found only nineteen distinct target classes, with the top five (bus, traffic light, crosswalk, car, and fire hydrant) making up more than three-quarters of everything served. That narrowness is exactly what made the grids tractable for machine vision. The same study reported breaking the hardest current challenges with an object-detection pipeline at an 83.25 percent success rate, around twenty seconds per challenge including network delay, using a YOLOv3 detector trained on COCO classes plus a custom set for the categories COCO does not cover, such as crosswalk and chimney. The headline finding is not that a model can see a traffic light. It is that a CAPTCHA whose entire difficulty rests on a small, fixed set of everyday objects has no defensible floor once detectors are cheap.
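The reported figures translate into an expected cost per solved challenge. The sketch below assumes that attempts are independent and that each one costs the same twenty seconds, which is a simplification of mine rather than a claim from the paper.

```python
def expected_attempts(success_rate: float) -> float:
    """Mean number of tries until the first success (geometric distribution)."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / success_rate


def expected_seconds(success_rate: float, seconds_per_attempt: float) -> float:
    """Mean wall-clock time to one success, assuming equal-cost attempts."""
    return expected_attempts(success_rate) * seconds_per_attempt


if __name__ == "__main__":
    print(round(expected_attempts(0.8325), 2))       # about 1.2 attempts
    print(round(expected_seconds(0.8325, 20.0), 1))  # about 24 seconds
```

Under those assumptions a solver at 83.25 percent needs about 1.2 attempts per pass, or roughly 24 seconds.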

Google’s response was to lean on everything around the image rather than the image itself. The grids pull from noisier, more cluttered scenes than the clean single-object tiles of the early years, which hurts off-the-shelf classifiers. The challenge images carry anti-recognition perturbation in some cases. And the difficulty is tuned to the session: a client the risk engine already distrusts gets handed the harder challenges. That last lever is the important one, because it means the grid is not a fixed test applied uniformly. It is a dial, and where the dial sits was decided before the bframe drew a single tile.

What runs before the click

The reason the checkbox can pass you without a puzzle, and the reason it can also decide to hand you the nastiest 4-by-4 it has, is a client-side payload computed in the background. This is the part of v2 that is genuinely hard to read, and the part where I will be most careful to separate documented behaviour from inference.

What Google states publicly is sparse: there is an advanced risk-analysis engine, it considers the user’s engagement before, during, and after the interaction, and it uses signals from the browser environment and from Google’s own cookies. The independent reverse-engineering record fills in the shape. As far back as the 2014 launch, researchers found that the No CAPTCHA client was not ordinary JavaScript. Google shipped a virtual machine written in JavaScript, with its own bytecode language, and an interpreter to run it. The payload the bytecode produces (resource-timing entries, performance measurements of the VM’s own execution, enumeration of browser objects and their quirks, canvas and rendering probes) gets encrypted and POSTed to the challenge endpoint. The single most cited early write-up of this is the InsideReCaptcha reverse engineering, which documented the dual base64 parameters in the anchor callback, the VM entry points, and the XTEA-based block encryption applied to parts of the payload.
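For readers who have not met XTEA, here is the textbook cipher on a single 64-bit block, in Python. This is the public algorithm only. The key, the big-endian byte order, and the round count are my choices for the sketch, and nothing here reproduces how the reCAPTCHA client derives keys or chains blocks.

```python
import struct

MASK = 0xFFFFFFFF   # keep arithmetic in 32 bits
DELTA = 0x9E3779B9  # XTEA's round constant


def xtea_encrypt_block(block: bytes, key: bytes, rounds: int = 32) -> bytes:
    """Encrypt one 8-byte block with a 16-byte key (big-endian words assumed)."""
    v0, v1 = struct.unpack(">2I", block)
    k = struct.unpack(">4I", key)
    total = 0
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + k[total & 3]))) & MASK
        total = (total + DELTA) & MASK
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + k[(total >> 11) & 3]))) & MASK
    return struct.pack(">2I", v0, v1)


def xtea_decrypt_block(block: bytes, key: bytes, rounds: int = 32) -> bytes:
    """Invert xtea_encrypt_block by running the rounds backwards."""
    v0, v1 = struct.unpack(">2I", block)
    k = struct.unpack(">4I", key)
    total = (DELTA * rounds) & MASK
    for _ in range(rounds):
        v1 = (v1 - ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + k[(total >> 11) & 3]))) & MASK
        total = (total - DELTA) & MASK
        v0 = (v0 - ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + k[total & 3]))) & MASK
    return struct.pack(">2I", v0, v1)


if __name__ == "__main__":
    key = bytes(range(16))   # demo key, not a real one
    plaintext = b"8 bytes!"
    ciphertext = xtea_encrypt_block(plaintext, key)
    assert xtea_decrypt_block(ciphertext, key) == plaintext
    print(ciphertext.hex())
```

The cipher is small enough to ship inside a script, which is presumably part of its appeal for a client that has to carry its own crypto.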

The mechanism matters more than any one field, so I will describe the mechanism and stop there. A VM-plus-bytecode design means the meaningful logic is data, not code: you can pretty-print the interpreter, but the thing it interprets only acquires meaning at runtime, which raises the cost of statically understanding what is collected. The same pattern shows up across the modern anti-bot field, from Kasada’s KPSDK and its custom interpreter to Akamai’s sensor_data payload, and the motivation is identical: force anyone retooling against the client to re-derive the bytecode semantics on every rotation. reCAPTCHA was one of the earliest mass-deployed examples.
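A toy interpreter makes the "logic is data" point concrete. The stack machine below is a sketch in Python with an opcode set I invented. It bears no relation to Google's actual bytecode and only shows that one unchanged interpreter can do different things depending on the program it is fed.

```python
from typing import List, Tuple

Instruction = Tuple[str, int]  # (opcode, operand); operand is ignored by non-PUSH ops


def run(program: List[Instruction]) -> int:
    """Execute a program on a tiny stack machine and return the top of stack."""
    stack: List[int] = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "XOR":
            b, a = stack.pop(), stack.pop()
            stack.append(a ^ b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()


if __name__ == "__main__":
    # Same interpreter, two programs, two behaviours.
    print(run([("PUSH", 6), ("PUSH", 7), ("MUL", 0)]))  # 42
    print(run([("PUSH", 6), ("PUSH", 7), ("XOR", 0)]))  # 1
```

Reading run() tells you almost nothing about what either program computes. Rename the opcodes on every release and a static reader has to start over, which is the rotation cost described above.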

Two signals carry disproportionate weight, and both are documented enough to state plainly. The first is Google’s own cookies. If you are signed into a Google account, or simply carry a NID-style Google cookie with a history attached, the checkbox passes far more often with no grid. The 2020 study put it directly: a user with no history of Google services is assigned relatively difficult challenges. The second is everything about how the click arrived, the cursor path and timing into the checkbox, which the risk engine reads as part of the “before” phase Shet described in 2014. Neither signal is a CAPTCHA in the classic sense. The grid is the fallback for when these come back thin.

[Figure: inputs to the risk engine, which scores the session: Google cookies / sign-in, cursor path and timing, VM telemetry payload, browser object probes, resource / perf timing. Outcomes: pass with no grid, or hand a grid with difficulty set by score.]

*The grid is a fallback, not the main event. By the time it renders, the session has already been scored on cookies, timing, and an encrypted client payload, and that score sets which challenge you get.*

The same logic, taken to its end, became reCAPTCHA v3: drop the checkbox entirely and just return the score. If you want the scoring side rather than the challenge side, that is the subject of reCAPTCHA v3 scoring. v2’s distinguishing feature is that it keeps a visible, interactive fallback when the score is inconclusive, where v3 hands the number to the site and leaves the decision there.

The g-recaptcha-response token and its short life

Everything above exists to produce one string. When a session resolves, whether by a frictionless pass at reload or by a solved grid at userverify, the flow yields the g-recaptcha-response token, and the widget drops it into the hidden textarea of that name. The integrator can also read it via the grecaptcha.getResponse() JavaScript call or receive it as the argument to a data-callback. Until the challenge resolves, that field is empty, which is the cleanest programmatic signal that a session has not yet passed.

The token is a bearer credential with two hard limits, both stated in Google’s verification documentation: each response token is valid for two minutes and can be verified only once, to prevent replay attacks. The site’s backend redeems it by POSTing to https://www.google.com/recaptcha/api/siteverify with two required fields, secret (the private key, which never touches the browser) and response (the token), plus an optional remoteip. The siteverify reply is a small JSON object: a boolean success, a challenge_ts timestamp in ISO 8601, the hostname the token was solved on, and an error-codes array when something is wrong.

[Figure: userverify issues the token → g-recaptcha-response in a hidden textarea → siteverify (secret + response → success). Valid 120 seconds · single use · bound to hostname.]

*One string, two hard limits. The token lives 120 seconds, redeems once at siteverify, and the reply names the hostname it was solved on so a site can reject tokens minted elsewhere.*
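On the server side the redemption is a single form POST. Here is a minimal Python sketch using only the standard library. The function names and the injectable post argument are mine, added so the logic can be exercised without touching the network. The endpoint and the secret, response, and remoteip fields are the documented ones.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional

SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def _post_form(url: str, body: bytes) -> bytes:
    """Default transport: a form-encoded POST via urllib."""
    request = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(request, timeout=10) as reply:
        return reply.read()


def siteverify(secret: str, token: str, remote_ip: Optional[str] = None,
               post: Callable[[str, bytes], bytes] = _post_form) -> dict:
    """Redeem a g-recaptcha-response token and return the parsed JSON reply."""
    if not token:
        # An empty field means the widget never resolved, so skip the call.
        # The error code mirrors the one Google documents for a missing response.
        return {"success": False, "error-codes": ["missing-input-response"]}
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip
    body = urllib.parse.urlencode(fields).encode("ascii")
    return json.loads(post(SITEVERIFY_URL, body))


if __name__ == "__main__":
    # Offline demonstration with a stand-in transport and a canned reply.
    canned = b'{"success": true, "challenge_ts": "2024-01-01T00:00:00Z", "hostname": "example.com"}'
    print(siteverify("SECRET", "TOKEN", post=lambda url, body: canned))
```

Call it once per form submission, with the token taken from that submission. A second call with the same token is expected to fail, because the token is single use.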

Those two limits are the core of the token’s server-side defence, and they shape how the credential gets abused and defended. The two-minute window means a token has to be redeemed almost immediately, which is why third-party solving services are built around fast hand-off: solve the grid, return the string, and the buyer must POST it to their own backend before the clock runs out. The single-use rule kills naive replay. The hostname in the siteverify reply is a quieter third check, because it lets a site confirm the token was minted for its own origin rather than farmed on a domain the attacker controls, though sites that ignore that field give up the protection. A common production bug is treating the token as if it persists, caching the form’s value and reusing it, which works in testing and fails the moment two minutes elapse between page load and submit.
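The hostname check is a few lines once the siteverify reply is parsed. In this Python sketch the function name and the reason strings are mine, and the reply is assumed to be the decoded JSON object described above. The timeout-or-duplicate code in the usage lines is the error Google documents for an expired or already-redeemed token.

```python
from typing import Tuple


def accept_reply(reply: dict, expected_hostname: str) -> Tuple[bool, str]:
    """Decide whether a parsed siteverify reply should let the request through."""
    if reply.get("success") is not True:
        codes = reply.get("error-codes") or ["unknown"]
        return False, "siteverify rejected the token: " + ", ".join(codes)
    if reply.get("hostname") != expected_hostname:
        # The token is genuine but was solved on some other origin.
        return False, f"token was solved on {reply.get('hostname')!r}, not {expected_hostname!r}"
    return True, "ok"


if __name__ == "__main__":
    print(accept_reply({"success": True, "hostname": "example.com"}, "example.com"))
    print(accept_reply({"success": True, "hostname": "farm.test"}, "example.com"))
    # A cached or reused token comes back like this:
    print(accept_reply({"success": False, "error-codes": ["timeout-or-duplicate"]}, "example.com"))
```

Returning a reason alongside the boolean makes the cached-token bug easier to see in logs, because it shows up as a run of timeout-or-duplicate rejections rather than anonymous failures.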

What the token does not encode, as far as the public record goes, is a human-readable verdict. v2’s siteverify gives you a boolean, not a score. The risk signal that decided whether you saw a grid stays on Google’s side; the site learns only that the session, by whatever path, ended in a pass. The Enterprise and v3 products expose a numeric score, but the classic v2 checkbox keeps that judgement private and hands the relying site a yes or no.

What the apparatus is now worth

The uncomfortable result from the measurement literature is that the visible part of v2 stopped being a meaningful barrier some time ago. The 2020 object-detection work cracked the hardest live grids at better than four in five attempts in roughly twenty seconds, which is competitive with paying a human. A 2023 real-world study at UC Irvine, run across more than 3,600 distinct users over thirteen months, came to a blunter conclusion: the image challenge mostly annoys humans while doing little to stop automation, and the authors argued the technology should be deprecated. That study is also the source of the widely repeated estimate that humanity has poured on the order of hundreds of millions of hours into solving these grids, time the paper reframes as the real product, with the security story as the wrapper.

If the grid is beatable and the humans hate it, what is v2 still doing on so many pages? It is doing the thing the 2014 announcement actually described. The checkbox was always a front for a risk score computed from cookies, timing, and an obfuscated client payload, and that scoring layer (not the traffic-light grid) is what filters the easy bulk of automated traffic before any image is drawn. The grid is the dunk tank for sessions the score already doubts. That design has aged better than the puzzle it is famous for, which is why the same shape now shows up under every modern anti-bot brand, and why the history of CAPTCHA reads less like a sequence of harder puzzles than like a slow migration of the real decision off the screen and into a payload you never see.

The piece worth holding onto is the inversion. A senior engineer debugging a v2 integration will spend their time on the visible grid, because that is what renders and what users complain about. The grid is the least informative part of the system. The token is two minutes of single-use bearer credential and nothing more. The decision that mattered happened in the reload response, on evidence the browser handed over before the first tile loaded, and on a cookie you have been carrying since the last time you were signed into Google.


Sources & further reading

  • Google for Developers (2024), Verifying the user’s response — the siteverify endpoint, required secret/response fields, JSON reply shape, and the statement that each token is valid for two minutes and verifiable once.
  • Google for Developers (2024), Choosing the type of reCAPTCHA — official descriptions of the v2 checkbox, invisible badge, and how challenges are decided.
  • Vinay Shet / Google, via 9to5Google (Dec 3 2014), Google kills CAPTCHAs with new one-step validation — the No CAPTCHA announcement and the “before, during, and after” risk-analysis quote.
  • Hossen, Tu, Rabby, Islam, Cao & Hei, University of Louisiana at Lafayette (RAID 2020), An Object Detection based Solver for Google’s Image reCAPTCHA v2 — the 83.25% solver, the R×C grid model, selection vs click-based types, the rc-imageselect-tile / recaptcha-anchor / recaptcha-verify-button identifiers, and the nineteen object categories.
  • Searles, Prapty & Tsudik, UC Irvine (arXiv 2311.10911, 2023), Dazed & Confused: A Large-Scale Real-World User Study of reCAPTCHAv2 — 13-month, 3,600-user study concluding v2 should be deprecated; source of the hours-of-human-labour estimate.
  • neuroradiology (2014), InsideReCaptcha — early reverse engineering of the No CAPTCHA client: the anchor callback’s base64 parameters, the JavaScript VM and bytecode, and XTEA-encrypted payload sections.
  • trebolese (2023), extra-parameters-recaptcha — community notes on detecting v2/v3/Enterprise and the k, co, pageAction, and invisible/enterprise flags from network requests.
  • Bock, Patel, Hughey & Levin, University of Maryland (USENIX WOOT 2017), unCaptcha: A Low-Resource Defeat of reCaptcha’s Audio Challenge — defeating the audio fallback at 85% and Google’s countermeasures.
  • Wikipedia (2026), reCAPTCHA — version history, the Sep 2009 Google acquisition, and the March 31 2018 shutdown of the v1 text API.
  • Friendly Captcha (updated Apr 2026), reCAPTCHA v2 vs v3: Effective Bot Protection? — current-state comparison, the _GRECAPTCHA and NID cookies, and the 2023 CNIL consent fines.
