Where does the reCAPTCHA v3 token come from and how does my server turn it into a score?

The browser loads the API with the site key, and reCAPTCHA begins observing the page immediately. When the protected action happens, calling grecaptcha.execute returns an opaque, encrypted token that does not carry the score in the clear. Your backend POSTs that token and your secret key to the siteverify endpoint, and Google replies with JSON containing the float score along with success, action, challenge_ts, and hostname. Tokens expire two minutes after they are issued.

Why does the same human get a high reCAPTCHA v3 score in signed-in Chrome but a low one over Tor?

For a fast session with little behavior to observe, the score is dominated by reputation rather than what the user did on the page. A browser signed into a Google account is the best-known browser there is, carrying account history and a cross-site cookie trail that signals a real, long-lived human. A fresh browser on Tor gives the model almost nothing to anchor on, so it scores cautiously. Independent 2019 testing reported Tor sessions scoring around 0.3 and proxies and VPNs dragging the score down similarly.

What signals actually feed the reCAPTCHA v3 score?

Google never published the feature list, but the signals fall into three buckets. On-page behavior covers mouse path and cadence, scroll speed and stops, click and keystroke timing, and action context. The browser environment covers user-agent coherence, fingerprint surface, screen and viewport properties, and automation or headless markers. Network and identity reputation covers IP history, datacenter or proxy detection, request rate, the _GRECAPTCHA cookie, and Google account state. Testing suggests the reputation bucket weighs most on a thin session.

Why does my free-tier reCAPTCHA v3 distribution only show scores like 0.1, 0.3, 0.7, and 0.9?

Without a billing account attached, the score is quantized to four values: 0.1, 0.3, 0.7, or 0.9. The continuous-looking float is really a coarse four-bucket classification at the free tier. With billing enabled it resolves to an 11-level scale across the full 0.0 to 1.0 range. The common 0.5 threshold works because it falls cleanly in the gap in the middle of those four buckets, not because 0.5 is a calibrated cutoff.

What do the reCAPTCHA Enterprise reason codes tell you about a score?

When an assessment runs under a billing-enabled Google Cloud project, the response can include named reason codes that hint at why the score landed where it did. AUTOMATION fires when interaction matches an automated agent, UNEXPECTED_ENVIRONMENT when the event comes from an environment Google considers illegitimate, TOO_MUCH_TRAFFIC for unusually high volume from a source, and UNEXPECTED_USAGE_PATTERNS when interaction differs from expected patterns. LOW_CONFIDENCE_SCORE signals that too little site traffic was received to generate quality analysis.

reCAPTCHA v3 scoring: how the 0.0 to 1.0 score is computed and what feeds it

A reCAPTCHA v2 checkbox asks you to do something. A reCAPTCHA v3 page asks you nothing, watches, and hands your backend a single number between 0.0 and 1.0. The number is supposed to mean “how likely is it that this request came from a human.” The whole product is built around that number. So the obvious question, the one every engineer who has wired up a v3 site key eventually asks, is: where does the number come from? What did Google measure on the page to produce 0.1 instead of 0.9, and why does the same form submission from the same human sometimes score well and sometimes get blocked?

The honest answer is that Google does not tell you, and has never told you, the formula. What it tells you is the interface: a JavaScript call, an action string, a verification endpoint, and a float. Everything between the float and the page is a proprietary risk model trained on Google’s traffic. But the interface leaks a lot, the documentation admits to more than people remember, and independent researchers have probed the edges of the model hard enough that the shape of it is no longer a mystery. This post traces that shape.

The sections below walk through the v3 flow as the browser and your server actually see it, then the signals Google admits feed the score, then the part nobody documents: how much of the score is behavior on the page versus reputation Google already held about the visitor. After that come the reason codes the Enterprise tier exposes, the well-documented bias toward signed-in Google users, the privacy and legal fallout, and finally where v3 sits in 2026, now that the classic product is being folded into reCAPTCHA Enterprise.

The flow: from grecaptcha.execute to a float

Start with the wire. A v3 site loads the API with a render parameter carrying the site key: https://www.google.com/recaptcha/api.js?render=SITE_KEY. That script does two things: it registers the site key, and it begins collecting. From the moment the script runs, reCAPTCHA is observing the page, not waiting for a submit button.
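
A minimal sketch of building that script URL, assuming a runtime with the standard URL API (any modern browser, or Node 18+). recaptchaScriptUrl is a hypothetical helper name and SITE_KEY is a placeholder for your real v3 site key:

```javascript
// Hypothetical helper: builds the api.js URL with the render parameter set to the site key.
function recaptchaScriptUrl(siteKey) {
  const url = new URL('https://www.google.com/recaptcha/api.js');
  url.searchParams.set('render', siteKey);
  return url.toString();
}

// In a browser the URL would be used to inject the script (not executed here):
//   const s = document.createElement('script');
//   s.src = recaptchaScriptUrl('SITE_KEY');
//   document.head.appendChild(s);
```

Loading the script early is the point: collection starts as soon as it runs, long before any token is requested.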

When the user does something you want to protect, you call grecaptcha.execute with the site key and an action, and you get a token back through a promise:

```javascript
grecaptcha.ready(function () {
  grecaptcha.execute('SITE_KEY', { action: 'login' })
    .then(function (token) {
      // ship token to your backend with the form
    });
});
```

Google’s own guidance is blunt about timing. Call execute when the user takes the action, not on page load, because the token has a short life. reCAPTCHA tokens expire two minutes after they are issued. The token is an opaque, encrypted blob. It is not the score. It is a sealed receipt that says “reCAPTCHA observed a session and here is what it concluded, signed so only Google can read it back.”
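
In practice the timing advice means minting the token inside the submit handler. The sketch below wraps the ready/execute pair from the snippet above in a Promise. tokenForAction is a hypothetical helper, and passing the grecaptcha object in as a parameter (rather than reading the global) is an assumption made so the helper can be exercised with a stub:

```javascript
// Hypothetical helper: resolves with a fresh token for one action.
// `api` is the global grecaptcha object in a browser; taking it as a parameter is an
// assumption made so the helper can be exercised with a stub.
function tokenForAction(api, siteKey, action) {
  return new Promise(function (resolve, reject) {
    api.ready(function () {
      api.execute(siteKey, { action: action }).then(resolve, reject);
    });
  });
}

// Browser usage, with a hypothetical form id: the token is minted inside the submit
// handler, so it is seconds old when it reaches the backend.
//   document.getElementById('login-form').addEventListener('submit', async function (e) {
//     e.preventDefault();
//     const token = await tokenForAction(grecaptcha, 'SITE_KEY', 'login');
//     // attach token to the form data and send it to the backend
//   });
```

Fetching the token on page load and holding it until submit is the pattern to avoid: a user who reads the form for more than two minutes would submit an expired token.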

Your backend then exchanges that token for the verdict. You POST the token and your secret key to https://www.google.com/recaptcha/api/siteverify, and Google answers with JSON. The fields are stable and worth naming exactly, because a surprising number of integrations check the wrong one. success is a boolean that tells you whether the token was valid and well-formed, not whether the user was human. score is the float from 0.0 to 1.0. action echoes back the action string that was set when execute ran. challenge_ts is the ISO timestamp of the challenge. hostname is the site where it was solved. And error-codes is an optional array that appears when something went wrong.

*The token never carries the score in the clear. The browser gets a sealed blob; only the server-side siteverify exchange turns it into a float your code can read.*
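
A sketch of that exchange on a Node 18+ backend, assuming the global fetch and URLSearchParams. The endpoint and the secret and response form fields are the documented siteverify contract. The function name siteverify is hypothetical, and fetchImpl is injectable only so it can be exercised without network access:

```javascript
// Hypothetical wrapper around the documented siteverify exchange (Node 18+ assumed).
// fetchImpl defaults to the global fetch and is injectable only for testing.
async function siteverify(token, secret, fetchImpl = fetch) {
  // secret and response are posted as application/x-www-form-urlencoded fields.
  const body = new URLSearchParams({ secret: secret, response: token });
  const res = await fetchImpl('https://www.google.com/recaptcha/api/siteverify', {
    method: 'POST',
    body: body,
  });
  if (!res.ok) {
    throw new Error('siteverify HTTP ' + res.status);
  }
  // Parsed JSON carries success, score, action, challenge_ts, hostname,
  // and, when something went wrong, an 'error-codes' array.
  return res.json();
}
```

Because error-codes contains a hyphen, it has to be read as result['error-codes'] rather than with dot notation. The secret must stay on the server, which is the reason this call cannot be made from the browser.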

Two checks matter here, and people skip both. First, verify that the action in the response equals the action you set. The token is bound to the action at execute time, so a token minted on a low-stakes page cannot be replayed against your login action without the mismatch showing. Second, treat the score as advice, not a decision. Google’s documentation is explicit that there is no built-in block: you get a number and you decide. The suggested starting point is a threshold of 0.5, with the recommendation to first run reCAPTCHA in a no-action mode and look at your real traffic distribution in the admin console before you pick a cutoff.
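
Both checks fit in a few lines once the siteverify JSON is parsed. This is a sketch rather than a drop-in policy: checkVerdict and its return shape are hypothetical, and 0.5 is only Google's suggested starting threshold:

```javascript
// Hypothetical helper applying the two checks to a parsed siteverify response.
// 0.5 is Google's suggested starting threshold, not a calibrated value.
function checkVerdict(result, expectedAction, threshold = 0.5) {
  if (!result.success) {
    // success: false means the token itself was bad, e.g. expired or already used.
    return { pass: false, reason: 'invalid-token', errors: result['error-codes'] || [] };
  }
  if (result.action !== expectedAction) {
    // A token minted for another action (say 'homepage') replayed against 'login'.
    return { pass: false, reason: 'action-mismatch' };
  }
  // The score is advice; this comparison is your policy, not Google's verdict.
  return { pass: result.score >= threshold, reason: 'score', score: result.score };
}
```

Google's reference lists timeout-or-duplicate as the error code for a token that is too old or has already been verified, which is how the two-minute expiry surfaces on the server.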

That last point is the first real clue about how the score works. Google tells you the score will not be meaningful until reCAPTCHA has seen your traffic. The FAQ says it plainly: scores may not be accurate because v3 relies on seeing real traffic, and v3 is for site owners who want more data about their traffic. The score is not a fixed function of one request. It is a model that calibrates against the population of requests hitting your specific site.
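
Because the model calibrates against your own traffic, it helps to keep your own record of what it returns during the no-action phase. A minimal sketch of an observe-only recorder that counts scores per action without enforcing anything; ScoreLog is a hypothetical in-memory helper, and a real deployment would persist the counts:

```javascript
// Hypothetical in-memory recorder for a no-action (observe-only) rollout.
class ScoreLog {
  constructor() {
    this.byAction = new Map(); // action -> Map(score bucket -> count)
  }

  record(action, score) {
    // One decimal place covers both the four-value and the 11-level scales.
    const bucket = score.toFixed(1);
    if (!this.byAction.has(action)) this.byAction.set(action, new Map());
    const counts = this.byAction.get(action);
    counts.set(bucket, (counts.get(bucket) || 0) + 1);
  }

  histogram(action) {
    const counts = this.byAction.get(action) || new Map();
    const sorted = [...counts.entries()].sort((a, b) => Number(a[0]) - Number(b[0]));
    return Object.fromEntries(sorted);
  }
}
```

Calling record from the verification path while always letting the request through gives you the per-action distribution to pick a cutoff from, without locking anyone out in the meantime.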

The action tag, and why it is not cosmetic

The action string looks like a logging label. It is more than that. When Google introduced v3 in October 2018, the action was presented as a way to tag the key steps of a user journey so reCAPTCHA could run its risk analysis in context. Context is the operative word. The same visitor doing the same thing scores differently depending on whether the action is homepage or login or checkout, because the model has learned what normal traffic to each of those actions looks like on your site.

The rules on the action string are narrow. It may contain only alphanumeric characters, slashes, and underscores. And it must not be user-specific, which rules out the tempting idea of stuffing a user ID or email into it. The reason is partly privacy and partly that a per-user action defeats the point: the model groups by action to learn a per-action baseline, and a unique action per user gives it a population of one.
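
The character rule is easy to enforce with a regular expression; the user-specific rule is not, because a string like login_user_42 is perfectly well formed. A sketch that pairs the regex with a fixed allowlist, on the assumption that declaring every action up front is the simplest way to keep identities out of them. The action names are hypothetical:

```javascript
// Documented rule: alphanumerics, slashes, and underscores only.
const ACTION_PATTERN = /^[A-Za-z0-9_\/]+$/;
// The "not user-specific" rule cannot be expressed as a regex; a fixed allowlist
// (hypothetical names) enforces it by rejecting anything that was not declared up front.
const KNOWN_ACTIONS = new Set(['homepage', 'login', 'signup', 'shop/checkout']);

function isValidAction(action) {
  return ACTION_PATTERN.test(action) && KNOWN_ACTIONS.has(action);
}
```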

Setting distinct actions also unlocks the admin console breakdown. Google surfaces a score distribution for your top actions, so you can see that login skews bot-heavy while homepage looks clean, and set per-action thresholds accordingly. An engineer who wires every page to a single generic action throws that resolution away and gets one blurry distribution for the whole site.
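
Per-action thresholds can be a small lookup table with a fallback. In the sketch below the numbers are placeholders chosen to mirror the example (stricter on login, looser on homepage), not recommendations, and thresholdFor and passes are hypothetical names:

```javascript
// Placeholder cutoffs, to be replaced with values read off each action's distribution.
const THRESHOLDS = { homepage: 0.3, login: 0.7, checkout: 0.5 };

function thresholdFor(action, table = THRESHOLDS, fallback = 0.5) {
  // hasOwnProperty guards against inherited keys such as 'toString'.
  return Object.prototype.hasOwnProperty.call(table, action) ? table[action] : fallback;
}

function passes(action, score) {
  return score >= thresholdFor(action);
}
```

Keeping the table in configuration rather than code makes it cheap to revisit as the admin console distribution shifts.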

What feeds the score: the signals Google admits

Now the substance. Google has never published the feature list, but between the official docs, the 2014 lineage, and a decade of vendor and researcher analysis, the signal categories are well established. They fall into three buckets, and the relative weight between them is the whole story.

The first bucket is on-page behavior. Mouse movement, the cadence and path of the cursor, scroll behavior including speed and where it stops, click timing, keystroke timing, and the general shape of how a session unfolds before the protected action fires. This is the bucket people assume dominates, because it is the visible part, the thing a human does and a naive script does not. The reinforcement-learning study that probed v3 in 2019 framed exactly this: the authors modeled the page as a grid and trained an agent to move and click the way a human would to lift the score.

The second bucket is the browser and device environment. The user agent and its consistency, the rendering and feature surface that amounts to a browser fingerprint, screen and viewport properties, and signals that betray automation frameworks or headless runtimes. This is the same family of checks that every modern anti-bot vendor runs, and reCAPTCHA’s lineage in it goes back to 2014’s “No CAPTCHA reCAPTCHA,” where Advanced Risk Analysis began judging the browser environment before deciding whether to show a challenge at all. If you have read the Crawlex write-ups on Kasada’s anti-instrumentation or DataDome’s first-request signals, this bucket will look familiar; the fingerprinting playbook is broadly shared across the industry.

The third bucket is the one that matters most and gets discussed least: network and identity reputation. The visitor’s IP and its history, whether it belongs to a datacenter or a residential range, whether it is a known proxy or exit node, the request rate and pattern from that source, and, critically, what Google already knows about the browser from its own cookies and the rest of its product surface.

*The reputation bucket is the one Google documents least and the one independent testing found weighs most when the page has little behavior to judge.*

The reCAPTCHA cookie has a name worth knowing. The FAQ confirms reCAPTCHA sets a cookie called _GRECAPTCHA when it executes, for the stated purpose of providing its risk analysis. That cookie is the per-browser thread of identity that lets the model connect this visit to prior visits. And because the script is served from Google’s own domains, it also sits next to whatever Google cookies the browser already carries from being signed into a Google account, which is where the most uncomfortable finding about v3 lives.

The part nobody documents: reputation over behavior

Here is the claim that separates v3 from how most people imagine it. For a session with little behavior to observe, the score is dominated by who Google already thinks you are, not by what you did on the page. The behavior bucket matters at the margin and on sessions that linger. But for the common case of a fast form submit, the reputation bucket carries the verdict.

The cleanest public evidence is the 2019 study by Ismail Akrout, Amal Feriani, and Mohamed Akrout, who probed v3 directly while building a reinforcement-learning agent to lift their score. The exact internal feature layout is not public; what follows is inferred from their observed traffic and from Google’s own documentation. They reported that visiting a v3-protected page through Tor produced a low score, around 0.3, and that using a proxy or VPN dragged the score down similarly. The same automation scored higher when run from a browser signed into a Google account than from the same browser with no account attached. IP reputation and account state moved the number more than the mouse did.

This is consistent with what Google itself says about needing to see real traffic before scores stabilize. A brand-new browser, on a fresh IP, with no _GRECAPTCHA history and no Google session, gives the model almost nothing to anchor on, so it leans cautious and scores low. The model is not catching a bot in that case so much as failing to recognize a friend. The two outcomes look identical from your backend, a low float, but the cause is different, and the difference is the source of v3’s notorious false-positive rate on privacy-conscious users.

It is worth being precise about what is documented versus inferred. Google documents the score range, the threshold guidance, the action mechanic, the _GRECAPTCHA cookie, and that the model trains on site traffic. Google does not document feature weights, the exact behavioral features, or how much account state moves the score. The reputation-dominates conclusion is an inference from third-party testing plus the documented “needs to see real traffic” behavior, and it should be read as that. It is well-supported, but it is not a number Google published.

Reason codes: the closest thing to an explanation

Standalone v3 hands you a bare float. The Enterprise tier, which is where Google has been steering everyone, exposes more. When an assessment runs under a billing-enabled Google Cloud project, the response can carry named reason codes that gesture at why the score came out where it did. These are the nearest thing to a documented breakdown of the model’s thinking.

The documented reasons include AUTOMATION, which fires when the interaction matches the behavior of an automated agent. UNEXPECTED_ENVIRONMENT fires when the event originated from an environment Google considers illegitimate. TOO_MUCH_TRAFFIC flags traffic volume from the source that is higher than normal. UNEXPECTED_USAGE_PATTERNS fires when the interaction differs significantly from expected patterns. And LOW_CONFIDENCE_SCORE is the tell for the cold-session problem: too little traffic was received from the site to generate quality risk analysis. That last one is Google admitting in a label what the researchers observed in the wild.
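
Reason codes are most useful as routing inputs rather than as a second score. A sketch of one possible routing policy, assuming the field layout of the Enterprise createAssessment response (tokenProperties.valid and action, riskAnalysis.score and reasons). The function name, the outcome labels, and the mapping from codes to outcomes are all hypothetical:

```javascript
// Hypothetical routing policy over a reCAPTCHA Enterprise assessment.
// Field names follow the createAssessment response: tokenProperties.valid/action
// and riskAnalysis.score/reasons. The outcomes are your own labels.
function routeAssessment(assessment, expectedAction, threshold = 0.5) {
  const token = assessment.tokenProperties || {};
  const risk = assessment.riskAnalysis || {};
  const reasons = new Set(risk.reasons || []);

  if (!token.valid) return 'reject';                    // malformed, expired, or reused token
  if (token.action !== expectedAction) return 'reject'; // token minted for another action
  if (reasons.has('AUTOMATION') || reasons.has('UNEXPECTED_ENVIRONMENT')) return 'challenge';
  if (reasons.has('TOO_MUCH_TRAFFIC')) return 'rate_limit';
  // Too little site traffic for quality analysis: lean on a score-independent control.
  if (reasons.has('LOW_CONFIDENCE_SCORE')) return 'fallback';
  // Anything else, including UNEXPECTED_USAGE_PATTERNS, falls through to the score.
  return risk.score >= threshold ? 'allow' : 'challenge';
}
```

Checking LOW_CONFIDENCE_SCORE separately is the design point: it marks the cold-session case, where a low float says more about what Google has not yet seen than about the visitor.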

The score granularity also changes with billing. Before a billing account is attached, the Enterprise documentation notes that the score takes one of four values, 0.1, 0.3, 0.7, or 0.9. With billing enabled it resolves to an 11-level scale across the full 0.0 to 1.0 range. So the “continuous” score that everyone treats as a smooth probability is, at least at the free tier, a coarse four-bucket classification dressed up as a float.

*The continuous-looking float is quantized. Without billing the model only ever returns 0.1, 0.3, 0.7, or 0.9, which is why so many sites cluster their thresholds at 0.5.*

That quantization explains a common observation. Site owners who set a 0.5 threshold on a free key and watch their distribution see every score land on 0.1, 0.3, 0.7, or 0.9, with nothing in between. The 0.5 line cleanly splits the four buckets into a pass and a fail. It is not that 0.5 is a magic calibrated cutoff; it is that 0.5 sits in the gap in the middle of a four-point scale.
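
The arithmetic behind that observation is short enough to check directly. On the four free-tier values, every threshold strictly above 0.3 and at or below 0.7 produces the same pass/fail split; split is a hypothetical helper used only for the illustration:

```javascript
// The four values a score can take before billing is enabled.
const FREE_TIER_SCORES = [0.1, 0.3, 0.7, 0.9];

// true = pass, for each free-tier score at the given threshold.
function split(threshold, scores = FREE_TIER_SCORES) {
  return scores.map((s) => s >= threshold);
}

// Every threshold in (0.3, 0.7] gives [false, false, true, true].
for (const t of [0.31, 0.4, 0.5, 0.6, 0.7]) {
  console.log(t, split(t));
}
```

Moving the threshold only changes behavior when it crosses 0.3 or 0.7, which is why fine-tuning a free-tier cutoff between those values has no effect.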

The Google-account bias and its consequences

The finding that signed-in Google users score better is not a fringe result. It was reported widely after the 2019 study, and it follows directly from the reputation-dominates structure. If the model’s strongest signal on a thin session is what Google already knows about the browser, then a browser carrying a logged-in Google session is the best-known browser there is. Google has the account history, the cross-site cookie trail, and the device continuity to be confident this is a real, long-lived human. A fresh browser on Tor has none of that, so it gets the cautious score.

The structural consequence is a tilt. A system meant to tell humans from bots ends up partly telling Google-legible humans from Google-illegible ones. A real person who runs a hardened browser, blocks third-party cookies, uses a VPN, or simply refuses to sign into Google is treated as more bot-like than an actual script running inside a warmed-up, signed-in Chrome profile. The defensive lesson for anyone reading the score is that a low v3 score is not proof of automation. It is proof of low Google-legibility, which correlates with automation but also sweeps up a meaningful slice of privacy-conscious humans. Treating the float as a hard gate, with no fallback, locks those people out.

This is why Google’s own guidance pushes you toward soft enforcement. Take the action in the background, do not hard-block, run a second factor when the score is low rather than refusing the request outright. The same reasoning runs through the reCAPTCHA v2 bframe challenge: when risk analysis is uncertain, fall back to a challenge instead of a verdict. v3 removed the visible challenge but kept the uncertainty, and pushed the decision of what to do about it onto you.
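
Soft enforcement reduces to one rule: a low score buys a step-up check, never a bare refusal. A minimal sketch, in which enforce, the 'step_up' outcome, and the stepUpPassed flag are hypothetical hooks into whatever second factor you run (a v2 challenge, an emailed code, or similar):

```javascript
// Hypothetical soft-enforcement gate: a low score routes to a step-up check
// instead of a hard block.
function enforce(score, threshold, stepUpPassed = false) {
  if (score >= threshold) return 'allow';
  // Low score: likely automation, or a human Google cannot recognize. Do not block outright.
  return stepUpPassed ? 'allow' : 'step_up';
}
```

The privacy-conscious human who scored 0.3 behind a VPN clears the second factor and gets in; a script that cannot clear it stays out.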

Privacy and the legal fallout

The mechanism that makes v3 work is the same mechanism that makes it a privacy problem. To score a session against Google’s knowledge of the browser, reCAPTCHA has to collect from the browser and cross-reference Google’s own data. That means cookies, device and browser information, and behavioral telemetry shipped to a US company, frequently without the visitor being told.

European regulators have treated this as exactly what it looks like. In March 2023, France’s CNIL fined the e-scooter company Cityscoot €125,000, and part of that decision turned on reCAPTCHA. The CNIL’s reasoning was that reCAPTCHA works by collecting hardware and software information from the user’s device and sharing it with Google for analysis, and that deploying it on account creation, login, and password recovery without obtaining consent breached Article 82 of the French Data Protection Act. The reCAPTCHA-specific portion of the fine was €25,000, with the remainder tied to geolocation and processor-contract failures. The legal theory generalizes: if reCAPTCHA reads and writes device information for risk analysis, and that is not strictly necessary to deliver a service the user requested, it needs consent under the ePrivacy regime, and the invisible nature of v3 makes obtaining meaningful consent harder, not easier.

The criticism is not only legal. The same tracking that produces the score is the tracking that civil-liberties researchers object to: a persistent cookie and a cross-context data trail building a risk profile of a browser as it moves around the web. The defenders of v3 point out that the data exists to fight fraud. The critics point out that “fight fraud” and “profile every visitor” are, mechanically, the same data collection, and that a site owner who drops the v3 snippet onto a contact form has quietly enrolled every visitor into it. Both things are true at once, which is what makes the privacy debate around v3 sharper than the one around a visible checkbox a user can see and decide about.

Where v3 sits in 2026

The product around the score has changed even though the score has not. Google has been consolidating classic reCAPTCHA, the v2 and v3 you get from the legacy admin console, into reCAPTCHA Enterprise running on Google Cloud. Through 2025 the message to site owners was that reCAPTCHA keys must be moved to a Google Cloud project, with automatic opt-in migration starting in Q4 2025 and manual migration strongly recommended before the end of that year. The legacy admin console stays readable for historical data, but new key management moves to the Cloud console. The pricing also shifted, from effectively free-by-default toward a usage-based model with paid tiers for higher volume.

For an engineer, the practical effect is that the bare float is increasingly the entry-level view of a richer assessment. The reason codes, the 11-level granularity, the WAF integration that lets the verdict act at the network edge through Cloud Armor and partner CDNs, all of that lives on the Enterprise side. The 0.0 to 1.0 score people have integrated against for years is still there, still computed the same opaque way, but it is now the thin edge of a platform Google would prefer you pay for. The frictionless promise of 2018, no puzzles, just a number, held. What changed underneath is who the number is really about.

If there is one thing to carry away, it is that the v3 score is less a measurement of the request in front of you and more a lookup of everything Google already knew about the browser that made it. The mouse cadence and the scroll pattern are real inputs, and on a long enough session they matter. But on the fast path, the one most forms actually take, the float is dominated by IP reputation and Google-account legibility, which is why the same human scores 0.9 in a signed-in Chrome and 0.3 behind Tor. The number answers “how well does Google know you,” and only secondarily “are you a human.” For most traffic those two questions have the same answer. The gap between them is where the false positives, the privacy complaints, and the regulatory fines all live.

Sources & further reading

Google (2018), Introducing reCAPTCHA v3: the new way to stop bots — the original announcement defining the 0 to 1 score and the action tag concept.
Google for Developers, reCAPTCHA v3 developer guide — the grecaptcha.execute flow, the action rules, the siteverify response fields, the two-minute token expiry, and the 0.5 threshold guidance.
Google for Developers, reCAPTCHA FAQ — confirms the _GRECAPTCHA cookie and that v3 scores rely on seeing real site traffic.
Google Cloud, Interpret assessments for websites — the named reason codes (AUTOMATION, UNEXPECTED_ENVIRONMENT, TOO_MUCH_TRAFFIC, UNEXPECTED_USAGE_PATTERNS, LOW_CONFIDENCE_SCORE), the 4-value free tier, and the 11-level scale.
Google Cloud, Action names — the constraints on action strings and why they must not be user-specific.
Ismail Akrout, Amal Feriani, Mohamed Akrout (2019), Hacking Google reCAPTCHA v3 using Reinforcement Learning — the study that probed v3 scoring, reporting the Tor and Google-account effects on the score.
Thomas Claburn / The Register (2019), Google’s reCAPTCHA favors — you guessed it — Google — coverage of the signed-in-account advantage and the proxy and VPN score penalty.
Google (2014), Are you a robot? Introducing “No CAPTCHA reCAPTCHA” — the Advanced Risk Analysis lineage that v3’s invisible scoring grew out of.
EDPB (2023), French SA fines Cityscoot €125,000 — the regulatory decision touching reCAPTCHA consent.
Hunton Andrews Kurth (2023), CNIL issues €125,000 fine against e-scooter rental company — analysis of the reCAPTCHA consent finding and the €25,000 Article 82 portion.
Friendly Captcha, reCAPTCHA v3 guide — independent summary of the signal categories and the privacy criticism.
Friendly Captcha, Google reCAPTCHA migration 2026 — the timeline for folding classic reCAPTCHA into reCAPTCHA Enterprise on Google Cloud.