
Credential stuffing mechanics: combo lists, the reuse problem, and scale

Copyright: MIT

An attacker holds a file with a million email-and-password lines in it. None of those lines were stolen from you. They came from some unrelated forum that got breached two years ago, dumped on a Telegram channel, and merged into a larger collection. The attacker points that file at your login form and walks away. A few hours later, somewhere between five thousand and twenty thousand of those lines have logged into accounts on your site, and your authentication logs show nothing but successful logins from people typing the correct password. No password was guessed. Nothing was cracked. The credentials were already valid; they just belonged to your users on a site that was never yours to protect.

That is the whole trick, and it is worth sitting with how little of it is technically clever. There is no exploit in the usual sense, no buffer overflow, no injection, no privilege escalation. The vulnerability is a human one, distributed across the entire internet, and it has a name that predates the attack: people reuse passwords. Credential stuffing is the industrialized exploitation of that single fact. This post is about the mechanics of that industry, from the breach that feeds it to the scale that makes a two-percent hit rate worth the trouble.

The sections that follow build up in order. First the root cause, password reuse, with the numbers that quantify it. Then combo lists, the data structure the whole attack runs on, and where they come from. Then the distinction from brute force, which matters more than it sounds because the two attacks fail to different defenses. Then the economics: why a success rate that rounds to zero is still profitable, worked through with real volume figures. Then a walk through the 23andMe breach as a case study in what the math produces. And finally the defenses that actually bite, and the one structural reason the attack will not go away on its own.

The reuse problem is the whole attack

Credential stuffing has exactly one prerequisite, and it is not technical. A user has to have used the same password on the breached site and on the target site. If passwords were unique per service, a breach of site A would leak credentials that work nowhere except site A, and a dump of site A’s logins would be worthless against site B. The attack exists because that is not how people behave.

The 2025 Verizon Data Breach Investigations Report put a number on the behavior by pulling it from infostealer malware logs, which is about as direct a measurement as you can get. Looking at the passwords a single infected user had stored across different services, the median user’s passwords were only 49% distinct from one another. Read that the other way: in the median case, more than half of a person’s passwords are repeats of passwords they use somewhere else. That is not a tail of careless users dragging down an otherwise-careful population. That is the middle of the distribution.
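To make the "percent distinct" idea concrete, here is a minimal sketch of one way such a figure could be computed. It is not the report's methodology, which the text above does not spell out; the `users` data and the `distinct_fraction` helper are invented purely for illustration.

```python
from statistics import median


def distinct_fraction(passwords: list[str]) -> float:
    """Share of a user's stored passwords that are distinct values."""
    if not passwords:
        raise ValueError("need at least one stored password")
    return len(set(passwords)) / len(passwords)


# Hypothetical per-user password stores (made-up data, not from any real log).
users = {
    "alice": ["hunter2", "hunter2", "hunter2", "Tr0ub4dor&3"],  # 2 distinct of 4
    "bob": ["a1", "b2", "c3", "d4"],                            # 4 distinct of 4
    "carol": ["s1", "s1", "s1", "s1", "s2"],                    # 2 distinct of 5
}

# Median across users of the per-user distinct share.
median_distinct = median(distinct_fraction(p) for p in users.values())
print(f"median distinct share: {median_distinct:.0%}")
```

In this toy population one user never reuses a password, yet the median user still has only half of their passwords distinct, which is the sense in which the figure describes the middle of the distribution rather than its tail.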

The same report found that stolen credentials were the initial access vector in 22% of breaches it analyzed, the leading vector for the second year running, and that 88% of attacks against basic web applications involved stolen credentials. When the report’s authors looked at single-sign-on provider logs, credential stuffing accounted for 19% of all authentication attempts on a median day. Roughly a fifth of the login traffic hitting an SSO endpoint, in the median, is someone trying credentials that were stolen elsewhere.

Why people reuse is not mysterious. A working adult has dozens of accounts and a finite memory. Unique strong passwords for all of them require either a password manager, which most people still do not use, or a memorable system, which usually degrades into one base password with small variations. The variation does not help as much as users think; an attacker who has your password from one site can try password1, password2, and password! cheaply. But even without variations, raw verbatim reuse is enough to keep the attack alive, because the 49%-distinct figure says that, for the median user, roughly half of their stored passwords are exact repeats.

There is a second-order effect worth naming, because it widens the blast radius of any single breach. A user’s email address is itself a stable, near-universal identifier. The same address keys their bank, their retailer, their airline, their social accounts. So a breach does not just leak a password, it leaks a password attached to a globally portable username. When the breached site used the email as the login identifier, which most consumer sites do, the attacker does not have to guess who the person is anywhere else. The combo list line is already in the format every other login form expects. That is why the email-password pair, rather than a site-specific username, is the unit of trade: it is immediately replayable against an unbounded set of targets without any reconnaissance.

Reuse is also sticky over time, which matters because combo lists have a long shelf life. A password leaked in a 2019 breach is still useful in 2025 if the user never changed it, and most users never change a password they were not forced to change. Mandatory rotation, the old corporate policy of expiring passwords every ninety days, has been quietly abandoned by NIST and most modern guidance precisely because it pushed users toward predictable, incrementing passwords that were easier, not harder, to guess. The result is that a credential, once leaked, tends to stay valid for years. The attacker is not racing a clock. The combo list ages slowly.

[Figure: One user, one password, reused. A password (hunter2) leaked from a breached forum becomes the combo list line user@x:hunter2, which then works against the same user's bank, email, and retail logins. The attacker never touched the bank, email, or retail site; the password did the travelling. Verizon DBIR 2025: the median user's passwords were only 49% distinct from each other.]

*Reuse is what lets a credential stolen from one site validate on an unrelated one. The combo list is just the transport.*

Combo lists: the data structure of the attack

The file the attacker points at your login form is a combo list. The structure is trivial: one credential pair per line, usually username:password or email:password, sometimes with extra colon-separated fields the attacker ignores. A combo list can hold a few thousand lines or, in the largest collections, billions. It is plain text, it compresses well, and it is traded the way any other commodity data is traded.
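The line format is simple enough to sketch from the defender's side, for example when checking whether your own users appear in a leaked list so you can force resets. The parser below is a minimal illustration, not any tool's actual logic. It assumes the first colon-separated field is the identifier and the second is the password, and it ignores anything after that; `Pair`, `parse_combo_line`, and `load_pairs` are hypothetical names.

```python
from typing import Iterable, NamedTuple, Optional


class Pair(NamedTuple):
    identifier: str  # email or username, lowercased for comparison
    password: str


def parse_combo_line(line: str) -> Optional[Pair]:
    """Parse 'identifier:password[:extra...]'; return None for unusable lines."""
    line = line.strip()
    if ":" not in line:
        return None
    identifier, rest = line.split(":", 1)
    # Assumption: any further colon-separated fields are extras to ignore.
    password = rest.split(":", 1)[0]
    identifier = identifier.strip().lower()
    if not identifier or not password:
        return None
    return Pair(identifier, password)


def load_pairs(lines: Iterable[str]) -> list[Pair]:
    """Normalize and deduplicate pairs, preserving first-seen order."""
    seen: set[Pair] = set()
    pairs: list[Pair] = []
    for line in lines:
        pair = parse_combo_line(line)
        if pair is not None and pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


sample = ["User@Example.com:hunter2", "user@example.com:hunter2:extra", "", "garbage"]
print(load_pairs(sample))
```

One consequence of the assumption above is that a password containing a colon gets truncated at that colon. The format is ambiguous on this point, and a real list may need a different rule.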

Combo lists are assembled from breaches, and the assembly is most of the work. When a site is compromised and its credential database is stolen, the passwords are usually hashed rather than stored in plaintext, so the dump is not immediately usable. The cracking step that turns hashes back into plaintext is a separate operation, and it is where weak hashing (unsalted MD5, fast SHA-1) versus strong hashing (bcrypt, scrypt, Argon2) makes the difference between a dump that cracks overnight and one that mostly resists. Once cracked, the recovered plaintext pairs get normalized, deduplicated, and merged with pairs from other breaches into ever-larger aggregations.
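The gap between weak and strong hashing can be sketched with the standard library alone. The example below is an illustration, not a recommendation of specific parameters. It uses `hashlib.scrypt` with a random per-user salt, and the cost settings (`n=2**14`, `r=8`, `p=1`) are assumed values that should be tuned for real hardware. The unsalted MD5 lines are there only to show why that scheme cracks in bulk.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) using scrypt, a deliberately slow, memory-hard KDF."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# Unsalted fast hash: two users with the same password share one hash,
# so cracking it once cracks it for everyone in the dump.
md5_alice = hashlib.md5(b"hunter2").hexdigest()
md5_bob = hashlib.md5(b"hunter2").hexdigest()
print(md5_alice == md5_bob)  # identical digests

# Salted slow hash: same password, different stored values, and each
# guess costs the cracker a full scrypt computation per user.
salt_a, digest_a = hash_password("hunter2")
salt_b, digest_b = hash_password("hunter2")
print(digest_a == digest_b)  # differs because the salts differ
```

A strong hash slows the cracking step but does not help once a plaintext pair is already circulating, which is why the later defenses focus on reuse itself.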

The canonical example of that aggregation is Collection #1, which surfaced around January 2019 and was catalogued by Troy Hunt, who runs Have I Been Pwned. It was 87 gigabytes across roughly 12,000 files, totalling about 773 million unique email addresses and 21 million unique passwords, assembled from more than 2,000 prior breaches. The passwords were in plaintext; the cracking had already been done. And unlike most such collections, which are sold, Collection #1 was briefly handed around for free, which is the worst case for defenders because it widens the pool of people who can run the attack. Collections #2 through #5 followed and were larger still.

A combo list on its own is inert data. Turning it into an attack needs three more things, and the classic credential-stuffing tools bundle exactly those three inputs. Shape Security’s 2016 writeup of Sentry MBA, which was the dominant tool of that era, described the three-file model plainly: a config file that tells the tool how to navigate the target site’s login (the URL of the login page, the names of the form fields, what a success versus a failure looks like in the response); a proxy file, a list of IP addresses, usually compromised endpoints and botnet nodes, to route requests through so the attempts appear to come from many sources; and the combo list itself, the username-password pairs to test. Later tools, OpenBullet most prominently, generalized the config into a scriptable format but kept the same logical structure.

[Figure: Three inputs feed the tool; the tool produces validated hits. The combo list (user:pass pairs), the config (login URL, field names), and the proxy list (rotating IPs) feed a stuffing tool that replays each pair against the target login form. Model described in Shape Security's 2016 Sentry MBA writeup; OpenBullet generalized the config.]

*The combo list is the payload. The config makes the tool site-specific, and the proxy list spreads the requests so volume does not concentrate on one IP.*

Two details of the tooling matter for understanding defenses later. First, the proxy file is there specifically to defeat IP-based rate limiting; if a hundred thousand login attempts all came from one address, any half-decent defense would block it, so the attempts get spread across thousands of residential and compromised IPs that each carry a low, unremarkable request count. Second, those tools shipped with capabilities aimed squarely at the era’s defenses: Shape’s writeup noted Sentry MBA included optical-character-recognition for solving simple CAPTCHAs and could spoof the User-Agent and Referer headers to look like an ordinary browser. The attack has always co-evolved with the defenses, which is why the defenses that work today are not the ones that worked in 2016.

Why it is not brute force

The two attacks get conflated constantly, and the difference is not pedantic. OWASP classifies credential stuffing as OAT-008 in its Automated Threats catalogue and brute force, in the credential-attack sense, as OAT-007 Credential Cracking. The defining line between them, in OWASP’s own words, is that credential stuffing “does not involve any brute-forcing or guessing of values; instead credentials used in other applications are being tested for validity.”

Brute force, or credential cracking, attacks the password as an unknown. It picks a target account and tries candidate passwords against it: dictionary words, common patterns, character permutations, until something works or the attacker gives up. The search space is the space of possible passwords, which is enormous, and the per-account attempt count is high. That high per-account count is exactly what classic defenses catch. Lock an account after five failed attempts, and a brute-force run against that account dies at attempt six. Rate-limit failed logins per username, and the attack crawls to uselessness.

Credential stuffing inverts the geometry. The password is not unknown; the attacker already has it, paired with a specific username. Each pair gets tried once. There is no guessing loop against a single account, because the attacker is not trying to find one account’s password, they are testing a million known pairs to see which ones happen to be valid here. Per-account, the attack often makes exactly one attempt, which sails under any per-account failure threshold. The volume is real, but it is spread across a million different usernames, not concentrated on one. An account-lockout policy tuned to stop brute force does almost nothing against stuffing, because no individual account sees enough attempts to trip it.

[Figure: two panels. Brute force (OAT-007): many guesses against one account (pass1, pass2, pass3, pass4, ...), caught by account lockout. Credential stuffing (OAT-008): one attempt each against many accounts (user_a:known, user_b:known, user_c:known, user_d:known); lockout never trips, so it needs volume and reuse defenses.]

*Same login form, opposite shape. The defenses tuned for the left picture barely touch the right one.*
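The difference in shape is easy to see in a small simulation of a per-account lockout counter. This is a sketch on synthetic log data, not a model of any particular product. The threshold of five comes from the example above, while `locked_accounts` and the two event lists are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

LOCKOUT_THRESHOLD = 5  # consecutive failures before an account locks


def locked_accounts(
    attempts: Iterable[Tuple[str, bool]], threshold: int = LOCKOUT_THRESHOLD
) -> Set[str]:
    """Replay (username, success) events and return the accounts that lock."""
    failures: Counter = Counter()
    locked: Set[str] = set()
    for username, success in attempts:
        if username in locked:
            continue  # further attempts on a locked account are rejected
        if success:
            failures[username] = 0
        else:
            failures[username] += 1
            if failures[username] >= threshold:
                locked.add(username)
    return locked


# Brute force: fifty wrong guesses against a single account.
brute_force = [("victim", False)] * 50

# Stuffing: one attempt per account across a thousand accounts, 2% valid.
stuffing = [(f"user{i}", i % 50 == 0) for i in range(1000)]

print(locked_accounts(brute_force))               # the one targeted account locks
print(locked_accounts(stuffing))                  # nothing locks
print(sum(success for _, success in stuffing))    # yet 20 logins succeeded
```

The stuffing run produces twenty successful logins without a single account ever accumulating more than one failure, so a per-account counter has nothing to count.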

This is also why the detection signals differ. A brute-force run shows up as a spike in failed logins against specific accounts. A stuffing run shows up as a spike in total login volume with an unusual success-to-failure ratio, an abnormal distribution of source IPs, device fingerprints that do not match the accounts being accessed, and impossible-travel patterns where one account is accessed from geographically scattered addresses in a short window. The detection problem becomes a volume-and-velocity problem rather than a per-account problem, which is the territory covered in how account-takeover detection works. It is also why rate limiting against stuffing has to be designed differently from rate limiting against a single noisy client, a distinction worked through in rate-limiting algorithms for defense.
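Of the signals listed above, impossible travel is the simplest to make concrete. The sketch below flags two logins to the same account when the implied speed between their geolocated source addresses exceeds a plausible maximum. The 1,000 km/h ceiling and the 100 km minimum distance are assumed thresholds, the `Login` type is hypothetical, and the coordinates would come from an IP geolocation lookup that is out of scope here.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 1000.0  # assumption: roughly airliner speed
MIN_DISTANCE_KM = 100.0     # assumption: ignore geolocation jitter within a region


@dataclass(frozen=True)
class Login:
    timestamp: float  # seconds since epoch
    lat: float        # degrees
    lon: float        # degrees


def haversine_km(a: Login, b: Login) -> float:
    """Great-circle distance between two logins' locations."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a.lat, a.lon, b.lat, b.lon))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def is_impossible_travel(a: Login, b: Login, max_kmh: float = MAX_PLAUSIBLE_KMH) -> bool:
    """True when the same account appears too far apart in too little time."""
    distance = haversine_km(a, b)
    if distance < MIN_DISTANCE_KM:
        return False
    hours = abs(b.timestamp - a.timestamp) / 3600
    if hours == 0:
        return True
    return distance / hours > max_kmh


london = Login(timestamp=0, lat=51.5074, lon=-0.1278)
new_york_1h = Login(timestamp=3600, lat=40.7128, lon=-74.0060)
new_york_10h = Login(timestamp=36000, lat=40.7128, lon=-74.0060)

print(is_impossible_travel(london, new_york_1h))   # one hour apart: flagged
print(is_impossible_travel(london, new_york_10h))  # ten hours apart: plausible
```

On its own this signal is noisy, since VPNs and mobile carriers move legitimate users around, which is one reason it is usually weighed alongside the other signals rather than used as a hard block.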

The economics of a two-percent success rate

The number that makes credential stuffing make sense is the success rate, and it is low. Shape Security’s measurement, repeated across the industry and consistent with the figure attributed to former Google fraud lead Shuman Ghosemajumder, is that most combo lists validate at roughly 1% to 2% against a given target. Take a million stolen pairs, point them at a popular site, and somewhere around ten thousand to twenty thousand will log in. Ghosemajumder’s framing was that a 2% rate means one million stolen credentials can take over twenty thousand accounts.

Ninety-eight percent failure would kill most attacks. It does not kill this one, for three reasons that compound.

The first is that the input is nearly free. Combo lists trade cheaply, and the largest ones, like Collection #1, have at times circulated at no cost. When the raw material costs close to nothing, a 2% yield is pure margin. The economics are not “is 2% high enough to justify the cost of the credentials,” because the credentials barely cost anything. They are “is 2% of a very large number still a large number,” and it is.

The second is that the attack parallelizes without limit. There is no sequential dependency between testing one pair and testing the next, so the only ceiling on throughput is how fast the target will accept requests and how many proxies the attacker has to spread them across. This is the same property that makes the attack hard to rate-limit: the requests are deliberately thin per source. An attacker with a botnet or a residential proxy pool can run millions of attempts in hours.

The third is the absolute scale of the credential supply, which turns a small percentage into an enormous absolute count. Akamai, watching login traffic across its CDN, reported on the order of 193 billion credential stuffing attempts globally in 2020, with about 3.4 billion of those aimed at financial services specifically. The trend did not reverse; Akamai later measured a 66% jump in credential-stuffing requests across its customer base from the fourth quarter of 2023 to the fourth quarter of 2024. At those volumes, even a fraction of a percent of validated logins is millions of compromised accounts, and each compromised account has resale or fraud value, whether through stored payment methods, loyalty points, gift-card balances, or simply as a verified account to relist.

[Figure: Low rate, large base: the funnel still pays. A funnel narrows from 1,000,000 stolen pairs tested, to the pairs whose owner reused the password, to about 20,000 valid logins (≈2%). Rate per Shape Security; Akamai measured ~193B attempts globally in 2020.]

*Twenty thousand working accounts from one cheap file. Multiply by the number of files and the number of targets to see why the attack scales.*
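The arithmetic behind the funnel is short enough to write out. The rates and volumes below are the ones quoted above. The 0.1% rate applied to the Akamai volume is an assumed stand-in for "a fraction of a percent", chosen only to show the order of magnitude, and `expected_hits` is a hypothetical helper.

```python
def expected_hits(pairs_tested: int, validation_rate: float) -> int:
    """Expected number of valid logins for a given list size and hit rate."""
    return round(pairs_tested * validation_rate)


# One combo list of a million pairs at the 1% to 2% range quoted above.
low = expected_hits(1_000_000, 0.01)
high = expected_hits(1_000_000, 0.02)
print(f"1M pairs: {low:,} to {high:,} valid logins")

# Akamai's 2020 global attempt volume at an assumed 0.1% validation rate.
global_hits = expected_hits(193_000_000_000, 0.001)
print(f"193B attempts at 0.1%: {global_hits:,} valid logins")
```

Even with a rate twenty times lower than the 2% headline figure, the global volume still implies validated logins in the hundreds of millions.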

There is a refinement step that improves the yield further, and it explains why a stolen list often gets tested twice. The first pass is a validation run against a low-stakes target, sometimes the original breached site itself or a site the attacker does not actually care about, purely to separate the pairs that still work from the dead ones. The pairs that validate get split out into a smaller, cleaner list, sometimes called a private or validated combo, which is worth far more than the raw dump because its hit rate against the next target is no longer 2%, it is much higher. The economics of that resale, and of the broader market that prices credentials by freshness and validation status, sit alongside the mechanics covered in the economics of a scraping operation; the same cost structure that makes large-scale automated request volume cheap is what makes the stuffing run cheap. The supply chain has tiers, and a defender is rarely facing the raw dump. They are facing whatever survived someone else’s validation pass.

The supply side keeps refilling because new breaches happen constantly, and a fresh breach is the most valuable input of all. A combo list assembled from a breach that is six months old has already been run against the major targets by everyone else, so its 2% has largely been harvested. A list from a breach disclosed last week has not, which is why freshly stolen credential databases command a premium and why the window between a breach and its exploitation has compressed. The attack is not bottlenecked on cleverness. It is bottlenecked on supply, and the supply is growing.

What the math produced: 23andMe, 2023

The 23andMe breach is a clean illustration because the company itself, in its SEC disclosure and the subsequent regulatory investigations, confirmed it was credential stuffing and not a compromise of 23andMe’s own systems. No 23andMe database was breached to obtain the credentials. The credentials came from other sites’ breaches, recombined into combo lists, and replayed against 23andMe’s login.

The timeline matters. The credential-stuffing activity ran for roughly five months, beginning around late April 2023 and continuing into September, before the company became aware of it when stolen data was advertised for sale online in October 2023. Roughly 14,000 accounts were directly accessed through the reused credentials. That number, on its own, is the expected output of the math: feed enough breached pairs into a login form over five months, and a low single-digit percentage validate.

What made 23andMe notable was the amplification past those 14,000. The service’s DNA Relatives feature let users see and share data with genetic relatives, so each compromised account exposed not just its own data but profile information for the relatives linked to it. The 14,000 directly-accessed accounts cascaded into personal data exposure for approximately 6.9 million people, around 5.5 million through the DNA Relatives feature and roughly 1.4 million through a family-tree feature. The attacker only had to validate 14,000 logins; the application’s own sharing graph did the rest.
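The amplification is a one-hop expansion over a sharing graph. The sketch below uses a toy graph with made-up account names to show the mechanism, then computes the ratio implied by the reported figures. `exposed_profiles` and `shares_with` are hypothetical names, not anything from 23andMe's systems.

```python
from typing import Dict, Set


def exposed_profiles(compromised: Set[str], shares_with: Dict[str, Set[str]]) -> Set[str]:
    """Accounts whose data is visible: the compromised ones plus their direct links."""
    exposed = set(compromised)
    for account in compromised:
        exposed |= shares_with.get(account, set())
    return exposed


# Toy sharing graph: each account lists the profiles it can see.
shares_with = {
    "acct1": {"rel_a", "rel_b"},
    "acct2": {"rel_b", "rel_c"},
    "acct3": {"rel_d"},
}
compromised = {"acct1", "acct2"}

exposed = exposed_profiles(compromised, shares_with)
print(sorted(exposed))  # acct3 and rel_d stay unexposed

# Ratio implied by the figures reported above.
reported_amplification = 6_900_000 / 14_000
print(f"about {reported_amplification:.0f} people exposed per compromised account")
```

At roughly 493 people exposed per directly accessed account, most of the damage came from what a logged-in account was allowed to see, not from the number of logins validated.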

The post-mortem findings read like a checklist of the defenses this attack defeats. At the time, 23andMe did not require multi-factor authentication, so a valid password was sufficient for full access. The detection systems did not flag five months of sustained, anomalous login activity as an attack. The UK Information Commissioner’s Office, in its 2025 enforcement action, fined the company over its security failures, and the joint UK-Canada regulatory investigation centred on the absence of controls that would have caught or blunted a stuffing campaign of that duration. None of that involved a sophisticated adversary. It involved reused passwords, a combo list, patience, and an application that trusted a correct password completely.

The defenses that actually bite

Because the attack rides on valid credentials, the defenses split into two families: stop the attacker from submitting attempts at scale, and stop a valid credential from being sufficient on its own.

The single most effective control is the second kind. Multi-factor authentication breaks the core assumption of the attack, which is that a username-password pair is enough. OWASP’s prevention guidance calls MFA “by far the best defense against the majority of password-related attacks,” and cites Microsoft research suggesting it would have stopped 99.9% of account compromises. A validated stuffing hit gets the attacker past the password and then stalls at a second factor they do not have. The catch is adoption: MFA adds friction, many users decline it where it is optional, and SMS-based second factors have their own weaknesses. 23andMe did not require it, which is precisely why the password was enough.

The second high-value control attacks the supply side directly: refuse to let users set passwords that are already in a breach corpus. NIST SP 800-63B requires verifiers to compare prospective passwords against a list of values known to be commonly used or compromised, and Revision 4 hardened the earlier encouragement into a requirement. The practical implementation almost everyone reaches for is Have I Been Pwned’s Pwned Passwords, which holds over 900 million compromised password hashes. The clever part is how it is queried without leaking the password being checked. The client SHA-1 hashes the candidate password, sends only the first five hex characters of the hash to the API, and gets back the suffix of every hash in the corpus that shares that five-character prefix. When the range API launched against a corpus of roughly half a billion passwords, that averaged around 478 entries per prefix; a larger corpus spread over the same 16^5 = 1,048,576 prefixes returns proportionally more. The client then checks locally whether the remainder of its own hash appears in the returned set. The service never learns which password was being checked because it only ever saw a five-character prefix shared by hundreds of distinct passwords. That k-anonymity model, designed with Cloudflare, is what makes breached-password checking deployable without the check itself becoming a privacy hazard.

[Figure: Checking a password without revealing it (HIBP k-anonymity). The client computes SHA-1("hunter2"), sends only the first five characters of the hash (GET /range/F3BBB), and the server returns the few hundred hash suffixes that share that prefix. The client matches the full suffix locally, so the server never sees which entry it was. Corpus: 900M+ compromised passwords. Prefix space: 16^5 = 1,048,576 buckets.]

*The defense that drains the combo list of value, queried in a way that does not leak the password being checked.*
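The range query described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a production client. It assumes the public endpoint at `https://api.pwnedpasswords.com/range/` returns one `SUFFIX:COUNT` pair per line, and it keeps the network call in a separate function so the hashing and matching can be exercised offline. The helper names and the User-Agent string are made up.

```python
import hashlib
import urllib.request
from typing import Callable, Tuple

RANGE_URL = "https://api.pwnedpasswords.com/range/{prefix}"


def split_hash(password: str) -> Tuple[str, str]:
    """SHA-1 the password and split it into the 5-char prefix and 35-char suffix."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(suffix: str, range_body: str) -> int:
    """Look for our suffix in a 'SUFFIX:COUNT' response; 0 means not found."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def fetch_range(prefix: str) -> str:
    """Fetch every suffix sharing the prefix. Only the prefix leaves the client."""
    request = urllib.request.Request(
        RANGE_URL.format(prefix=prefix),
        headers={"User-Agent": "pwned-check-sketch"},  # hypothetical client name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def pwned_count(password: str, fetch: Callable[[str], str] = fetch_range) -> int:
    """How many times the password appears in the corpus, per the range response."""
    prefix, suffix = split_hash(password)
    return count_in_range(suffix, fetch(prefix))


prefix, suffix = split_hash("password")
print(prefix)  # only these five characters would be sent
```

A signup or password-change handler would call `pwned_count(candidate)` and reject the candidate when the result is non-zero. Passing a stub as `fetch` keeps tests off the network.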

The first family of defenses, stopping attempts at scale, is harder because the attacker designed the tooling to evade it. Naive IP rate limiting fails against a proxy pool that keeps every source IP’s request count low, which is the entire reason the proxy file exists in the tool. Effective rate limiting has to operate on signals the attacker cannot cheaply rotate: the velocity of login attempts globally rather than per IP, the success-to-failure ratio, the consistency between a device fingerprint and the account it is accessing, and reputation data on the source addresses, which is where residential and datacenter proxy detection enters. OWASP’s guidance explicitly steers away from blunt per-IP blocking toward intelligence-driven mitigation that weighs multiple abuse signals before acting, partly because a single proxy IP may carry both attack traffic and legitimate users behind a NAT. The same document recommends device fingerprinting (naming the open-source fingerprintjs2 library) and CAPTCHA as additional friction, though it is candid that CAPTCHA has limits, and the OCR features built into the old tools are a reminder of how long that arms race has been running. CAPTCHA’s own long history of being defeated and rebuilt is its own subject, covered in the history of CAPTCHA.
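The gap between per-IP and global velocity can be shown with a sliding-window counter and a synthetic campaign. This is a sketch with assumed limits (ten attempts per IP per minute, a thousand failed logins per minute site-wide) and made-up traffic, not a tuned detector. `SlidingCounter` and `simulate` are hypothetical names.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Tuple

WINDOW_SECONDS = 60.0
PER_IP_LIMIT = 10            # assumption: attempts per IP per window
GLOBAL_FAILURE_LIMIT = 1000  # assumption: failed logins site-wide per window


class SlidingCounter:
    """Counts events whose timestamps fall inside the trailing window."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self.events: deque = deque()

    def add(self, timestamp: float) -> int:
        self.events.append(timestamp)
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()
        return len(self.events)


def simulate(events: Iterable[Tuple[float, str, bool]]) -> Tuple[int, Optional[float]]:
    """Return (per-IP limit trips, time the global failure limit first tripped)."""
    per_ip: Dict[str, SlidingCounter] = {}
    global_failures = SlidingCounter(WINDOW_SECONDS)
    ip_trips = 0
    global_tripped_at: Optional[float] = None
    for timestamp, ip, success in events:
        counter = per_ip.setdefault(ip, SlidingCounter(WINDOW_SECONDS))
        if counter.add(timestamp) > PER_IP_LIMIT:
            ip_trips += 1
        if not success:
            in_window = global_failures.add(timestamp)
            if in_window > GLOBAL_FAILURE_LIMIT and global_tripped_at is None:
                global_tripped_at = timestamp
    return ip_trips, global_tripped_at


# Synthetic campaign: 6,000 attempts in one minute across 3,000 proxies, 2% valid.
campaign = [(i * 0.01, f"proxy-{i % 3000}", i % 50 == 0) for i in range(6000)]

ip_trips, global_tripped_at = simulate(campaign)
print(ip_trips)           # no single IP ever exceeds its limit
print(global_tripped_at)  # the site-wide failure counter trips early in the run
```

Each proxy makes only two attempts, so the per-IP limit never fires, while the site-wide failure count crosses its limit about ten seconds in. A real deployment would treat that trip as one input to a broader decision, in line with the multi-signal approach described above, because legitimate traffic spikes can also raise global counts.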

The honest assessment is that no single control is sufficient, and the effective posture layers them. MFA blunts the value of a validated hit. Breached-password checks shrink the pool of accounts whose passwords are even in the combo lists. Bot-detection and rate-limiting raise the cost of running attempts at volume. Behavioral and velocity signals catch the campaigns that get through. Each layer leaks; together they move the economics enough that an attacker takes their combo list to a softer target. That is the realistic goal, not elimination but making yourself the more expensive option.

What does not change

Strip away the tooling and the credential stuffing problem is structural, and the structure is durable. As long as passwords are the primary credential, as long as the same human reuses one across many services, and as long as breaches keep feeding the supply, a breach of any site is a partial breach of every site that shares users with it. The attacker’s marginal cost per target approaches zero, while the defender has to hold the line everywhere at once. That asymmetry is the thing that does not go away.

What is shifting, slowly, is the password’s role itself. Passkeys and other phishing-resistant credentials remove the reusable shared secret entirely; there is no password to leak, so there is nothing to stuff. Where they are adopted, the attack simply has no surface. But adoption is partial and will stay partial for years, password fallback paths persist for account recovery, and billions of existing accounts are protected by exactly the reusable secrets the attack feeds on. The Verizon figure is the one to keep in view: in the median case, half of a person’s passwords are still repeats of passwords they use somewhere else. Until that number moves, the combo list keeps its value, and a file full of someone else’s stolen credentials remains one of the cheapest ways into your users’ accounts.

