How account-takeover detection works: velocity, device, and impossible-travel signals
A login with the correct password is the hardest fraud to catch, because nothing about it is technically wrong. The credentials match. The account exists. The session that follows is, byte for byte, a valid authenticated session. The attacker is not exploiting a bug; they are using the front door with a key they bought. This is account takeover, and it is the reason a login form needs more than a password check behind it. The detection problem is not “is this password right.” It is “is the person holding this password the person who owns the account,” and the password alone cannot answer that.
So detection systems answer it sideways, with signals the password does not carry. Where did this login come from, and is that plausible given where the last one came from? Is this the device this account always uses, or a new one? Is this one login among millions hitting the same endpoint with the same shape? None of these is conclusive on its own. A real user travels, buys a new laptop, switches to a VPN for work. The whole craft of account-takeover detection is combining weak, individually ambiguous signals into a risk score that is right often enough to act on, while wrong rarely enough that legitimate users are not constantly locked out. This post walks the main signal families in turn. We start with credential-stuffing velocity, the volumetric tell that precedes most takeovers. Then device-fingerprint continuity, the question of whether this is the same machine. Then impossible travel and geovelocity, the geographic plausibility check and its sizable false-positive problem. Then how those scores feed risk-based step-up authentication, and why NIST is explicit that a fraud signal is not an authentication factor. We close on what passkeys do to the whole model.
The supply chain behind the login
Account takeover does not begin at your login form. It begins at someone else’s breach. The fuel is a credential dump: a file of email-and-password pairs lifted from a site that got compromised, or scraped off a machine by infostealer malware. The 2025 generation of those tools has names, and they recur in incident reports: Lumma, RedLine, StealC, Acreed. They scrape browser password vaults, saved cookies, and autofill data off an infected machine, and the harvested material flows into combo lists, deduplicated and sorted by service or geography, sold fresh because fresh lists validate at higher rates than stale ones.
The attack works because people reuse passwords. Verizon’s 2025 Data Breach Investigations Report puts a number on it that is worth memorizing: in the median case, only 49 percent of a person’s passwords across services are distinct. Roughly half are repeats. So a breach of any one site is a probabilistic partial breach of every other site that shares users with it. The attacker takes a combo list of a million pairs, replays each one against your login endpoint, and even a hit rate as low as 0.1 percent yields a thousand validated accounts. That replay at scale is credential stuffing, and it is the volumetric event that detection systems see first. The same DBIR analysis of single-sign-on provider logs found that credential stuffing accounted for a median 19 percent of all authentication attempts. Roughly one in five logins hitting an enterprise SSO was an attack.
The detail that matters for detection is that this is run with tooling built for it. OpenBullet and its forks (SilverBullet, OpenBullet 2) load the combo list, rotate through a pool of residential proxies, retry on failure, solve or farm out CAPTCHAs, and log the hits, all driven by a per-target “config” file that encodes the specific login flow. Configs trade on the same markets as the combo lists and the proxy subscriptions. The economics are covered in depth in credential stuffing mechanics and the credential-stuffing toolchain; what matters here is the shape it leaves on the wire.
Velocity: the volumetric tell
Velocity is the simplest signal and the first to fire. A credential-stuffing run is, definitionally, a lot of login attempts in a short window. The detector watches rates, and it watches several at once, because attackers spread their traffic to dodge the obvious ones.
The naive rate is failed logins per IP per minute. An attacker running flat-out from one address trips it instantly, which is exactly why nobody runs flat-out from one address anymore. Residential proxy pools let a campaign spread a million attempts across tens of thousands of IPs, so each individual address looks almost idle. The counter-move is to watch rates that survive IP rotation. Counting failed attempts per account, regardless of source IP, catches a single account being hammered. The global success ratio (successful logins as a share of all attempts) across the whole endpoint catches the campaign in aggregate: normal traffic holds a stable success ratio, and a stuffing run floods the endpoint with mismatched pairs, so the ratio craters even as volume spikes. A rising count of attempts against accounts that do not exist is a strong tell, because a combo list assembled from other sites' breaches will contain many addresses that were never customers here, and a legitimate user almost never types an email that has no account.
*No single counter is enough. Per-IP rates are defeated by proxy rotation; the per-account, no-such-account, and global-ratio counters survive it. The campaign shows up as a collapse in the aggregate success ratio even when every individual IP looks idle.*
Velocity, then, is not one threshold but a small panel of them, scored together. And the attackers know it, which produces the “low and slow” variant: one attempt per IP per hour, paced to stay under every per-source rate limit. That defeats velocity counters by design, and it is the reason velocity cannot be the only line. A run slow enough to evade rate-watching still has to come from somewhere, on some device, and that is where the other signal families earn their place. The defensive rate-limiting algorithms themselves (token bucket, sliding window, GCRA) are their own subject, treated in rate-limiting algorithms for defense.
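A minimal Python sketch of that panel, assuming a single process and an in-memory store: each counter is a deque of timestamps trimmed to a trailing window, and the panel reports which counters fired rather than issuing one verdict. The class name `VelocityPanel` and every threshold are hypothetical placeholders, not values from any vendor; real deployments tune them per endpoint and keep the counters in a shared store.

```python
from collections import defaultdict, deque

# Every threshold below is a hypothetical placeholder; real values are tuned per site.
WINDOW_SECONDS = 300.0        # trailing window shared by all counters
PER_IP_LIMIT = 20             # failures from one IP
PER_ACCOUNT_LIMIT = 5         # failures against one account, from any IP
NO_SUCH_ACCOUNT_LIMIT = 50    # attempts naming accounts that do not exist
MIN_TOTAL_FOR_RATIO = 100     # don't judge the success ratio on a handful of logins
MIN_SUCCESS_RATIO = 0.5       # assumed baseline; production systems learn it


class VelocityPanel:
    """Sliding-window counters that are read together, not one threshold."""

    def __init__(self, window: float = WINDOW_SECONDS):
        self.window = window
        self.fail_by_ip = defaultdict(deque)
        self.fail_by_account = defaultdict(deque)
        self.no_such_account = deque()
        self.successes = deque()
        self.failures = deque()

    def _trim(self, q: deque, now: float) -> int:
        # Drop timestamps that have aged out of the window; return what remains.
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q)

    def record(self, now: float, ip: str, account: str,
               account_exists: bool, succeeded: bool) -> None:
        if succeeded:
            self.successes.append(now)
            return
        self.failures.append(now)
        self.fail_by_ip[ip].append(now)
        self.fail_by_account[account].append(now)
        if not account_exists:
            self.no_such_account.append(now)

    def alerts(self, now: float, ip: str, account: str) -> list:
        fired = []
        if self._trim(self.fail_by_ip[ip], now) > PER_IP_LIMIT:
            fired.append("per_ip")
        if self._trim(self.fail_by_account[account], now) > PER_ACCOUNT_LIMIT:
            fired.append("per_account")
        if self._trim(self.no_such_account, now) > NO_SUCH_ACCOUNT_LIMIT:
            fired.append("no_such_account")
        ok = self._trim(self.successes, now)
        bad = self._trim(self.failures, now)
        total = ok + bad
        if total >= MIN_TOTAL_FOR_RATIO and ok / total < MIN_SUCCESS_RATIO:
            fired.append("global_success_ratio")
        return fired
```

Fed a thousand single failures from a thousand different IPs, mostly against addresses that were never registered, the per-IP and per-account counters stay quiet while the no-such-account and success-ratio counters fire, which is the proxy-rotation pattern described above. The list of fired counters is meant to be one input to a score, not a block decision.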
Device-fingerprint continuity: is this the same machine
If velocity asks “how many,” device fingerprinting asks “who.” The idea is to derive a stable identifier for the device behind a login from attributes the browser exposes, then check whether the account’s current login comes from a device it has used before. A recognized device lowers risk. A brand-new device on an account that always logs in from the same laptop raises it.
The raw material is the broad fingerprinting surface that anti-bot systems already collect. User-agent and the client-hint headers. Screen resolution, color depth, device-pixel-ratio. Timezone and locale, which double as a geographic cross-check. The canvas and WebGL renderer signatures. Installed-font enumeration. Hardware concurrency and device memory. Audio-stack quirks. OWASP’s credential-stuffing guidance names fingerprintjs2 as a client-side library for collecting these, and the open-source-versus-commercial split in what these libraries actually measure is the subject of FingerprintJS internals. Each attribute is low-entropy on its own; combined, they narrow a device to a small population, sometimes to one.
There is a hard caveat OWASP states plainly, and it shapes how the signal can be used: all of this is provided by the client, so all of it can be spoofed. An attacker running an anti-detect browser (Undetectable, Hidemium, and the like) presents a different, plausible fingerprint per session, defeating naive “have I seen this exact device” matching. That is why fingerprint continuity is rarely used to block. OWASP is explicit: it is not practical to simply block attempts that do not match an existing fingerprint, because legitimate users get new devices and clear state constantly. The fingerprint is a risk input, not a gate. A match buys trust; a mismatch buys scrutiny, usually a step-up challenge rather than a hard denial.
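As a sketch of “a match buys trust, a mismatch buys scrutiny,” assuming the attributes have already been collected client-side: hash a canonical encoding of them and look the hash up in the account's device history. The attribute names and both function names are hypothetical. Exact hashing is the crudest possible matcher, since a single browser update changes the ID, and production systems typically tolerate partial matches; the shape of the decision is the point, and it deliberately has no `block` branch.

```python
import hashlib
import json


def fingerprint_id(attributes: dict) -> str:
    """Stable ID: SHA-256 over a canonical (key-sorted, compact) JSON encoding."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def device_verdict(known_device_ids: set, attributes: dict) -> str:
    """A risk input, not a gate: the result is never 'block'."""
    if fingerprint_id(attributes) in known_device_ids:
        return "trusted"       # continuity with a device this account has used
    return "scrutinize"        # new, changed, or spoofed device: feed a step-up
```

Because the client supplies every attribute, an anti-detect browser can land on either branch at will; that is why the output only nudges the risk score.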
The more powerful version of the signal layers a second, harder-to-spoof fingerprint underneath the JavaScript one: the connection fingerprint. OWASP gives the canonical example. If the user-agent header and the device fingerprint say “mobile browser,” but the TLS and HTTP/2 fingerprint of the connection says “Python script,” the request is almost certainly forged, because those two layers are produced by different parts of the stack and an attacker has to get both right simultaneously. TLS fingerprinting (JA3 through JA4) and HTTP/2 fingerprinting are detailed in TLS fingerprinting and HTTP/2 fingerprinting. The value for account-takeover detection is that the connection layer is far harder to make consistent with a spoofed application layer than either is to fake alone, so the mismatch between them is a high-confidence tell that JavaScript fingerprinting alone cannot produce.
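The contradiction check can be sketched as two classifications and a comparison: what family of client the user-agent claims to be, and what family the connection fingerprint says it is. The table keys below are placeholder labels, not real JA3 or JA4 strings, and the user-agent parse is deliberately crude; both are assumptions for illustration. Returning `None` for an unrecognized handshake keeps the check from guessing.

```python
from typing import Optional

# Hypothetical lookup from an observed TLS/HTTP2 fingerprint label to the client
# family that produces it. Keys are placeholders, not real JA3/JA4 values; in
# practice such a table is built from labeled traffic.
HANDSHAKE_FAMILY = {
    "tls-fp-chrome-android": "browser",
    "tls-fp-safari-ios": "browser",
    "tls-fp-python-requests": "script",
    "tls-fp-go-net-http": "script",
}


def claimed_family(user_agent: str) -> str:
    # Crude: browsers (and anything imitating one) send a "Mozilla/5.0 ..." UA.
    return "browser" if user_agent.lower().startswith("mozilla/") else "script"


def layer_mismatch(user_agent: str, handshake_fp: str) -> Optional[bool]:
    """True when the application layer and the connection layer disagree.

    Returns None for an unrecognized handshake: no verdict rather than a guess.
    """
    observed = HANDSHAKE_FAMILY.get(handshake_fp)
    if observed is None:
        return None
    return observed != claimed_family(user_agent)
```

A mobile Chrome user-agent arriving over a handshake labeled as Python's HTTP stack returns `True`, the OWASP example in miniature; an honest script that admits to being a script returns `False`, because the two layers agree.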
*A spoofed application layer is cheap; making the connection layer agree with it is not. The detection is the contradiction between what the JavaScript claims and what the TLS handshake reveals, which an attacker must defeat on both layers at once.*
Impossible travel and geovelocity
The geographic check is the one most people have heard of, usually under the name “impossible travel.” Take two consecutive logins for an account. Resolve each IP to a latitude and longitude. Compute the great-circle distance between them (the shortest path over the surface of the earth, the Haversine calculation), divide by the time elapsed between the logins, and you get an implied travel speed. If that speed exceeds what a human could plausibly achieve, the second login is suspicious, because the same person cannot be in both places.
The textbook example: a login from San Francisco at 9:00 a.m. and another from Singapore at 9:45 a.m. is over 8,000 miles in 45 minutes, an implied speed north of 10,000 miles per hour. No one travels that fast, so one of those two sessions is not the account owner. Vendor implementations commonly draw the line around commercial-jet cruising speed, flagging implied speeds above 500 to 600 miles per hour, with the exact threshold tuned to the user base. The signal is intuitive and it catches a specific, valuable case: an attacker logging in from a different continent than the victim, minutes apart.
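The calculation is short enough to show whole. This is a sketch using the Haversine formula with a mean Earth radius of 3,958.8 miles; the coordinates are assumed to come from an IP-geolocation lookup, and the 600 mph ceiling is the top of the vendor range quoted above, not a universal constant. Function names are illustrative.

```python
import math

EARTH_RADIUS_MILES = 3958.8   # mean Earth radius
JET_CEILING_MPH = 600.0       # assumed ceiling: top of the 500-600 mph vendor range


def great_circle_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def implied_speed_mph(login_a: tuple, login_b: tuple) -> float:
    """Each login is (latitude, longitude, unix_seconds)."""
    lat1, lon1, t1 = login_a
    lat2, lon2, t2 = login_b
    miles = great_circle_miles(lat1, lon1, lat2, lon2)
    hours = abs(t2 - t1) / 3600
    if hours == 0:
        # Two places at the same instant is infinitely fast; same place is no travel.
        return math.inf if miles > 0 else 0.0
    return miles / hours


def impossible_travel(login_a: tuple, login_b: tuple,
                      ceiling_mph: float = JET_CEILING_MPH) -> bool:
    return implied_speed_mph(login_a, login_b) > ceiling_mph
```

With approximate city-center coordinates for San Francisco (37.7749, -122.4194) at 9:00 and Singapore (1.3521, 103.8198) 45 minutes later, the distance comes out around 8,440 miles and the implied speed around 11,250 mph, far past the ceiling. New York to Boston two hours apart comes out under 100 mph and passes.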
*Two logins, a distance, and an elapsed time give an implied speed. Above a jet-cruise ceiling, the same human cannot have made both, so one session is not the owner. The whole signal rests on the IP resolving to a true location, which is where it gets fragile.*
Here is the problem, and it is a big one. IP geolocation is not the truth. It is a guess about where an IP lives, and a VPN, a corporate proxy, a mobile carrier’s CGNAT egress, or a cloud relay can place a perfectly legitimate user thousands of miles from their body. A user who finishes a session on home Wi-Fi and reconnects through a corporate VPN whose exit node is in another country has just generated textbook impossible travel without doing anything wrong. Apple’s iCloud Private Relay and similar services do this routinely. So raw impossible travel, used as a hard block, produces a flood of false positives, and a system that locks out real users is worse than useless because people learn to route around it or abandon it.
The mature implementations handle this by refusing to treat the geo jump as conclusive, and by cross-checking it against device continuity. Fingerprint’s approach is instructive: it ties the geo signal to a stable visitor identifier built from device, network, and behavioral attributes that survives the user clearing cookies or switching to incognito. If the same visitor ID appears at both distant locations, that pattern is consistent with one person on a VPN, and the right response is a step-up prompt, not a block. If two different visitor IDs appear in an impossible timeframe, that is a much stronger takeover signal, because it is two distinct devices, not one device behind a relay. The geovelocity number is the trigger; the device-continuity check is what keeps the false-positive rate survivable.
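The decision that cross-check implies fits in a few lines. A sketch, assuming the two visitor IDs come from a persistent device-identification layer and the speed from the implied-speed calculation described above; the function name and the three outcome labels are hypothetical, and this is not Fingerprint's API.

```python
JET_CEILING_MPH = 600.0  # assumed ceiling, as in the 500-600 mph range above


def classify_geo_jump(speed_mph: float, visitor_a: str, visitor_b: str) -> str:
    """Geovelocity is the trigger; device continuity decides how loud it is."""
    if speed_mph <= JET_CEILING_MPH:
        return "allow"         # geography is plausible; nothing to resolve
    if visitor_a == visitor_b:
        return "step_up"       # one device at both ends: likely a VPN or relay
    return "high_risk"         # two distinct devices in an impossible timeframe
```

The design choice is that an impossible speed alone never yields more than a step-up; only the conjunction with a device change escalates.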
Microsoft’s Entra ID Protection is the best-documented production example, and its design choices are revealing. It actually splits the idea into two separate detections. “Impossible travel,” sourced from Microsoft Defender for Cloud Apps, fires on activity from geographically distant locations inside a window shorter than the travel time between them. “Atypical travel,” a distinct detection, identifies two sign-ins from distant locations where at least one is also unusual for that user given past behavior. Both are calculated offline, not in real time, which tells you something: the geographic check is expensive and noisy enough that Microsoft runs it as a batch detection against history rather than blocking the sign-in synchronously. The atypical-travel algorithm explicitly ignores known false-positive sources, including VPNs and locations regularly used by others in the same organization, and it carries an initial learning period of the earliest of 14 days or 10 logins before it will fire on a new user at all. You cannot flag a deviation from a baseline you have not built yet.
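To make the learning-period rule concrete, here is a sketch of the gating it implies. It models the documented behavior, not Microsoft's code: the VPN and organization-location flags are assumed to be computed upstream, and all names are hypothetical.

```python
from dataclasses import dataclass

LEARNING_DAYS = 14
LEARNING_SIGN_INS = 10


@dataclass
class UserHistory:
    days_since_first_sign_in: float
    sign_in_count: int


def baseline_ready(history: UserHistory) -> bool:
    # Learning ends at whichever comes first: 14 days of history or 10 sign-ins.
    return (history.days_since_first_sign_in >= LEARNING_DAYS
            or history.sign_in_count >= LEARNING_SIGN_INS)


def atypical_travel_may_fire(history: UserHistory, unusual_distant_pair: bool,
                             location_used_by_org: bool, known_vpn: bool) -> bool:
    if not baseline_ready(history):
        return False           # no baseline yet, so no deviation to measure
    if location_used_by_org or known_vpn:
        return False           # suppressed: documented false-positive sources
    return unusual_distant_pair
```

A three-day-old account with four sign-ins never fires, however strange its travel; the same account becomes eligible at its tenth sign-in even before day 14.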
That learning-period point generalizes to every behavioral signal in this post, and it is worth pausing on. Per-account models have a cold start. A brand-new account has no history, so there is no “usual” device, location, or time to deviate from. Entra’s “unfamiliar sign-in properties” detection, which fires in real time on a sign-in whose IP, ASN, location, device, browser, or tenant subnet are unfamiliar for that user, also holds new users in a learning mode (minimum five days, dynamic beyond that) before it activates, and can send a user back into learning mode after a long gap of inactivity. The cold-start problem is general to behavioral detection and is treated on its own in the cold-start problem in behavioral biometrics.
Scoring: combining the signals
No serious system acts on any single signal. It collects all of them per login and combines them into a risk score, because each is individually ambiguous and the combination is what carries information. A new device is unremarkable on its own; people buy laptops. A login from a new country is unremarkable; people travel. A login from a new device, in a new country, at an unusual hour, against an account that just appeared in a fresh breach dump, following a spike in failed attempts on that same account, is a different matter entirely. The signals are conditionally informative: each one shifts the probability, and the combination can cross a threshold that no individual signal approaches.
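One common way to make “each signal shifts the probability” precise is to add log likelihood ratios to the prior log-odds, naive-Bayes style. The sketch below does that. The ratios and the 1-in-10,000 base rate are invented for illustration, and the method assumes the signals are conditionally independent, which real signals (a new device and a new country, say) often are not. It is one possible combiner, not the method any particular vendor uses.

```python
import math

# Hypothetical likelihood ratios: P(signal | takeover) / P(signal | owner).
# Real values would be estimated from labeled login history.
LIKELIHOOD_RATIOS = {
    "new_device": 4.0,
    "new_country": 5.0,
    "unusual_hour": 2.0,
    "in_recent_breach_dump": 8.0,
    "failed_attempt_spike": 6.0,
}
PRIOR_TAKEOVER_RATE = 1 / 10_000  # assumed base rate of takeover among logins


def takeover_probability(signals: list) -> float:
    """Posterior P(takeover) from prior log-odds plus one log-LR per signal."""
    log_odds = math.log(PRIOR_TAKEOVER_RATE / (1 - PRIOR_TAKEOVER_RATE))
    for name in signals:
        log_odds += math.log(LIKELIHOOD_RATIOS[name])
    return 1 / (1 + math.exp(-log_odds))
```

With these made-up numbers, any single signal leaves the probability below 0.1 percent, while all five together multiply the odds by 1,920 and push it to about 16 percent. A threshold set at, say, 5 percent is crossed only by the combination.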
This is why mature account-takeover programs are described as risk-scored rather than rule-based. A rule (“block logins from new countries”) generates too many false positives to survive contact with real users. A score weights the new-country signal alongside device continuity, velocity context, IP and ASN reputation, the time-of-day baseline, and the behavioral profile, and only acts when the aggregate crosses a line. Entra exposes this as named risk levels (low, medium, high) per detection, and the leaked-credentials detection is a useful contrast in confidence: it is always rated high, because it represents a verified match between the account’s actual password hash and a credential found in a breach corpus, not a heuristic. Most signals are heuristics. A confirmed breach match is closer to ground truth, and the score reflects that.
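A sketch of the banding step, showing why the leaked-credentials case sits outside it: a verified match short-circuits to high regardless of what the heuristics say. The 0-to-1 heuristic score and the 0.3 and 0.7 cut points are hypothetical; this mirrors the low/medium/high shape Entra exposes, not its actual algorithm.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical cut points on a 0-1 heuristic score.
MEDIUM_AT = 0.3
HIGH_AT = 0.7


def risk_level(heuristic_score: float, leaked_credential_match: bool) -> RiskLevel:
    """A verified breach match overrides the heuristics; otherwise band the score."""
    if leaked_credential_match:
        return RiskLevel.HIGH      # near ground truth, not a heuristic
    if heuristic_score >= HIGH_AT:
        return RiskLevel.HIGH
    if heuristic_score >= MEDIUM_AT:
        return RiskLevel.MEDIUM
    return RiskLevel.LOW
```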
The infrastructure-reputation inputs deserve a note, because attackers spend real money to defeat them. IP and ASN reputation downgrades trust for addresses associated with hosting providers, known proxy networks, Tor exit nodes, and ranges with a history of attack traffic. That is exactly why credential-stuffing operations buy residential proxies: an IP that belongs to a real consumer ISP, ideally in the victim’s own country, sidesteps the datacenter-ASN penalty and the geographic mismatch at once. The arms race over distinguishing residential proxy traffic from genuine residential traffic is its own deep topic, covered in how anti-bot vendors detect residential proxies and ASN reputation. For account-takeover scoring, the relevant fact is that IP reputation is a strong signal against cheap attacks and a weak one against well-funded ones, which is another reason it is one input among many rather than a gate.
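To illustrate why reputation is strong against cheap attacks and weak against funded ones, here is a sketch of a reputation penalty keyed on network category plus recent abuse history. The categories, weights, and cap are all hypothetical, and classifying an IP into a category is assumed to happen upstream.

```python
# Hypothetical penalties by network category; real systems learn these.
NETWORK_PENALTY = {
    "tor_exit": 3.0,
    "hosting": 2.0,        # datacenter ASN
    "known_proxy": 2.0,
    "residential": 0.0,
    "mobile": 0.0,
}
UNKNOWN_PENALTY = 1.0      # unclassified networks get a mild penalty


def ip_reputation_penalty(network_type: str, recent_attack_reports: int) -> float:
    base = NETWORK_PENALTY.get(network_type, UNKNOWN_PENALTY)
    # Capped so that abuse history cannot swamp every other signal.
    history = 0.2 * min(recent_attack_reports, 5)
    return base + history
```

A fresh residential proxy with no abuse history scores 0.0, exactly like a genuine home connection, which is the gap residential-proxy networks are bought to exploit.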
Behavioral signals are the layer that catches what the network and device layers miss, and increasingly they run after authentication, not just at the login moment. Continuous session analytics watch an authenticated session for impossible travel mid-session, sudden privilege escalation, or off-hours API patterns, and can evict a session immediately rather than waiting for the next login. The premise behind continuous evaluation is that the most expensive account takeovers happen after a valid authentication: the attacker has the real password and the real one-time code (phished, or relayed through an adversary-in-the-middle proxy), so the login itself is clean and the only remaining tell is how the session behaves. Mouse dynamics, keystroke rhythm, and touch behavior can flag that the human driving a session is not the account owner even when every credential checked out. That whole field, and the vendors who built it, is the subject of behavioral biometrics in fraud detection.
Risk-based step-up: what the score buys you
A risk score is only useful if something acts on it, and the action is where account-takeover defense meets authentication policy. The pattern is risk-based, or adaptive, authentication: a low-risk login passes silently, and a higher-risk login triggers step-up, an additional verification proportional to the risk. A new device on a known account might get an email-link confirmation. A high-risk score might force full re-authentication, an MFA prompt, or a temporary block pending the user’s confirmation. The friction is applied where the risk is, instead of charging every user the cost of a hard challenge on every login.
There is a precise and easily missed point in the NIST guidance about where these fraud signals sit relative to authentication, and getting it wrong leads to weak designs. NIST SP 800-63B is explicit: indicators of potential fraud may be used before or during authentication, but doing so “does not impact or change the AAL of a transaction or substitute for an authentication factor.” A device fingerprint match, a familiar IP, a clean geovelocity check: none of these is an authentication factor. They are evidence about risk; they are not proof of identity. A system that treats “this is a recognized device, so skip MFA” as equivalent to a second factor has quietly downgraded its own assurance level, because a recognized device can be a stolen or spoofed device. The fraud signal decides whether to demand a factor. It is not itself a factor. That distinction is the whole reason the architecture has two separate layers.
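The separation can be expressed directly in code: the function that computes assurance never receives the fraud signals, and the risk level only decides whether to demand another factor. A sketch, assuming the risk level arrives from a scorer upstream where device recognition, familiar IP, and geovelocity have already been folded in. Counting distinct verified factors is a deliberate simplification of the real AAL rules, and all names are hypothetical.

```python
def assurance_level(factors_verified: set) -> int:
    """Simplified stand-in for AAL: 1 for one factor, 2 for two or more.

    Fraud signals (device match, familiar IP, clean geovelocity) are
    deliberately not parameters: they cannot raise assurance.
    """
    return 2 if len(factors_verified) >= 2 else 1


def step_up(risk: str, factors_verified: set) -> str:
    """The risk level decides what to demand; only factors satisfy the demand."""
    if risk == "high":
        return "full_reauth_or_block"
    if risk == "medium" and assurance_level(factors_verified) < 2:
        return "require_second_factor"
    return "allow"
```

A recognized device can move a login from medium to low risk inside the scorer, and so avoid the prompt, but it can never stand in for the second factor once the prompt has been demanded.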
NIST also pins down the session-lifetime side of step-up, which is where risk scoring meets reauthentication policy. At AAL2, the reauthentication overall timeout should be no more than 24 hours, with an inactivity timeout of no more than 1 hour; within that window, after inactivity, the user may reauthenticate with a single factor plus the session secret. At AAL3 the bounds tighten to a 12-hour overall timeout and a 15-minute inactivity timeout, and reauthentication requires the same full multi-factor cryptographic authentication as the initial login, no shortcuts. The guidance also now requires that verifiers offer at least one phishing-resistant option at AAL2. Those numbers are the ceiling on how long a hijacked session can live before the system forces a fresh proof, and a risk-based system tightens them further when the score is elevated: a session that started clean but began throwing impossible-travel or behavioral anomalies mid-flight gets its timeout cut or gets evicted outright.
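A sketch of a session check built on those ceilings, assuming the session records when it was authenticated and when it was last active. The halving factor for medium risk and the immediate eviction at high risk are hypothetical policy choices standing in for “tighten when the score is elevated”; the AAL2 and AAL3 numbers are the ones quoted above.

```python
# Ceilings as quoted above, in hours.
LIMITS = {
    "AAL2": {"overall": 24.0, "inactivity": 1.0},
    "AAL3": {"overall": 12.0, "inactivity": 0.25},
}
MEDIUM_RISK_FACTOR = 0.5  # hypothetical tightening when the score is elevated


def session_decision(aal: str, hours_since_auth: float, hours_idle: float,
                     risk: str = "low") -> str:
    if risk == "high":
        return "evict"                       # mid-session anomaly: end it now
    overall = LIMITS[aal]["overall"]
    inactivity = LIMITS[aal]["inactivity"]
    if risk == "medium":
        overall *= MEDIUM_RISK_FACTOR
        inactivity *= MEDIUM_RISK_FACTOR
    if hours_since_auth > overall:
        return "full_reauthentication"
    if hours_idle > inactivity:
        # AAL2 allows a lighter reauthentication after idling; AAL3 does not.
        return ("single_factor_plus_session_secret" if aal == "AAL2"
                else "full_reauthentication")
    return "continue"
```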
*The score does not authenticate. It decides how hard to make the user prove identity. Low risk passes; medium demands a real factor; high blocks or forces full re-auth. The factor is the proof; the score is only the trigger.*
The thing to keep straight is that the challenge has to be a real one. A step-up that resends a one-time passcode over SMS to a number the attacker has SIM-swapped, or pushes an MFA approval the victim will tap out of fatigue, is friction without security. Adversary-in-the-middle phishing kits relay the victim’s MFA in real time, which is exactly why NIST pushes phishing-resistant options. The step-up is only as good as the factor behind it, which brings us to the credential that removes the attack surface entirely.
What passkeys do to the model
Every signal in this post exists because passwords are reusable shared secrets that leak. The combo list, the velocity spike, the geovelocity check, the device-continuity score, all of it is machinery to catch an attacker who holds a valid password they should not have. Passkeys remove the premise. A passkey is a public-private key pair bound to the origin and held in the user’s authenticator; the private key never leaves the device and is never transmitted, so there is nothing to phish, nothing to dump from a breach, and nothing to stuff. The credential is scoped to one site by design, which means a breach of one site yields nothing replayable against another. Where a passkey is the credential, the entire credential-stuffing attack has no surface to land on, and the elaborate detection stack above is solving a problem that no longer exists for that login.
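To show what “bound to the origin” means on the server side, here is a partial sketch of the client-data checks in the WebAuthn assertion flow: the browser writes the requesting origin and the server's challenge into `clientDataJSON`, and the server rejects any mismatch. It is deliberately incomplete. A real verifier must also validate the authenticator data and verify the signature against the stored public key, which needs a cryptography library and is omitted here. The function names are hypothetical.

```python
import base64
import hmac
import json


def b64url_decode(data: str) -> bytes:
    # WebAuthn uses unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))


def check_client_data(client_data_json: bytes, expected_challenge: bytes,
                      expected_origin: str) -> bool:
    """Type, challenge, and origin checks only; signature checks omitted."""
    data = json.loads(client_data_json)
    if data.get("type") != "webauthn.get":
        return False
    challenge = b64url_decode(data.get("challenge", ""))
    if not hmac.compare_digest(challenge, expected_challenge):
        return False           # stale or replayed challenge
    return data.get("origin") == expected_origin
```

Client data produced on a look-alike domain records that domain as its origin and fails the check, and nothing in the exchange is a reusable secret that could be stuffed elsewhere.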
That is the direction the standards are pushing. NIST’s requirement that AAL2 verifiers offer a phishing-resistant option, and the broader move toward WebAuthn-based credentials, is the slow structural fix. But it is slow, and it is partial. Billions of existing accounts are still protected by reusable passwords. Account-recovery flows fall back to email and SMS, which are themselves takeover targets and often the softest path in. Adoption will take years, and during those years the detection signals are what hold the line on the accounts that still depend on a secret someone else might already be selling. The realistic posture is not one or the other. It is passkeys closing the surface where they are deployed, and velocity, device, and geovelocity scoring covering everything that still runs on a password, with the score deciding when to demand the stronger factor.
The number that frames the whole problem is the 49 percent. Half of a typical person’s passwords are reused from somewhere else, which means the supply of stuffable credentials refreshes with every breach and will not run dry on its own. As long as that holds, a login with the correct password stays the hardest fraud to catch, and the only honest answer is to stop trusting the password to identify anyone and to read everything around it instead: how fast the attempts arrive, what machine they come from, whether the geography is possible, and whether the human behind the session moves like the person who owns it.
Sources & further reading
- Microsoft (2026), What are risk detections? — Microsoft Entra ID Protection — the full catalogue of sign-in and user risk detections, with the atypical/impossible-travel split, the 14-day/10-login and 5-day learning periods, and the real-time-vs-offline labels.
- Fingerprint (2024), How to detect impossible travel and stop suspicious logins — the great-circle/speed calculation, the ~500-600 mph threshold, the SFO-to-Singapore example, and the visitor-ID continuity cross-check for VPN false positives.
- OWASP (2024), Credential Stuffing Prevention Cheat Sheet — MFA as primary defense, fingerprintjs2, the connection-fingerprint mismatch example (mobile UA vs Python script), and why blocking on fingerprint mismatch is impractical.
- Castle (2024), Credential stuffing attacks: anatomy, detection, and defense — velocity signals (failed-login spikes, success-ratio collapse, no-such-account hits), fingerprint reuse, the OpenBullet/SilverBullet toolchain, and the low-and-slow variant.
- Verizon (2025), 2025 Data Breach Investigations Report — the 49%-distinct-password median, the 19% median credential-stuffing share of SSO authentication attempts, and stolen credentials as a leading breach vector.
- NIST (2024), SP 800-63B: Digital Identity Guidelines, Authentication and Authenticator Management — AAL2/AAL3 reauthentication timeouts, the phishing-resistant-option requirement, and the explicit rule that a fraud indicator is not an authentication factor.
- Darknet.org.uk (2026), Credential stuffing in 2025: how combolists, infostealers and account takeover became an industry — the Lumma/RedLine/StealC/Acreed infostealer families, combolist structure, OpenBullet configs, and residential-proxy rotation.
- Dark Reading (2021), Credential stuffing reaches 193 billion login attempts annually — Akamai’s measured global attempt volume and the year-over-year growth in the financial sector.
- Abnormal (2025), Inside the engine: how behavioral AI deconstructs modern ATO attacks — post-authentication signals: mailbox-rule changes, OAuth grant anomalies, new-device registration, and per-identity baselining via model ensembles.
- Microsoft (2026), Investigate risk with Microsoft Entra ID Protection — the investigation workflow behind the risk detections, including atypical-travel and malicious-IP triage.