
The history of credential stuffing: from password reuse to OpenBullet

Copyright: MIT

Every credential-stuffing attack rests on one boring fact about people: they reuse passwords. The attacker does not need to guess anything, break any crypto, or find a zero-day. Someone else already did the hard part by leaking a username-and-password list from one site, and the attacker simply replays those pairs against a hundred other sites to see where the same human used the same password. It is the laziest serious attack on the internet, and it works because the alternative, remembering a different password for every account, is something almost nobody does.

That gap between how authentication is supposed to work and how people actually behave has been understood for at least fifteen years. What changed over that time was not the idea. It was the tooling, the supply of stolen credentials, and the defenses. The attack went from a hand-rolled script that a single operator ran against a single target to a packaged product with a config marketplace, captcha-solver integrations, and proxy rotation built in. This post traces that arc. It starts with the 2011 breaches that made password reuse measurable and gave the attack its name, moves through the breach-dump era that supplied the ammunition, walks through Sentry MBA and the OpenBullet generation that turned attacks into point-and-click work, and ends with the defensive stack that answered it.

2011: a name for an old problem

The mechanics predate the name. People reused passwords in the 1990s, and anyone with two leaked password lists could cross-reference them. What 2011 added was proof of scale and a label.

In June 2011, Sony Pictures was breached, six months after the December 2010 Gawker breach had put that site’s user database into circulation. Troy Hunt, who would later build Have I Been Pwned, pulled the dumps and cross-referenced the accounts that appeared in both. The numbers were grim. Across a cleaned sample of 37,608 Sony accounts, 92 percent of users who held accounts on two separate Sony systems had used the same password on both. More tellingly for what came next, among the 88 email addresses that appeared in both the Sony and Gawker dumps, two-thirds reused the same password across the two unrelated companies. That last figure is the whole attack in one sentence. If a credential works on Gawker, there is a meaningful chance the same person reused it on Sony, and on their bank, and on their email.

The same year, the term got coined. Sumit Agarwal, then a Deputy Assistant Secretary of Defense at the Pentagon and soon a co-founder of Shape Security, named the pattern he was watching hit public-facing military login pages: waves of automated login attempts using credentials stolen elsewhere. He called it credential stuffing. Shape Security launched in 2011 around defending against exactly this class of automated attack, and the term stuck because nothing else described it precisely. It was not brute force. Brute force guesses passwords. Credential stuffing does not guess anything; it replays pairs that are already known to be valid somewhere.

That distinction matters and it is the reason credential stuffing earns its own line in the threat taxonomy. OWASP later catalogued it as OAT-008 in its Automated Threat Handbook, defined as mass login attempts used to verify the validity of stolen username-and-password pairs, explicitly noting that it “does not involve any brute-forcing or guessing of values.” The neighboring entry, OAT-007 Credential Cracking, is the brute-force cousin that does guess. The two get confused constantly. The handbook keeps them apart because the defenses differ: rate-limiting one account’s password attempts stops cracking, but stuffing spreads one attempt each across millions of accounts and sails under that limit.

Figure: credential cracking (OAT-007), many guesses against one account, beside credential stuffing (OAT-008), one known pair tried across many sites. *Cracking pours many guesses into one account and trips a per-account rate limit. Stuffing spreads one known-good pair across many sites, which is why simple lockout policies miss it.*
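The difference shows up directly in what a per-account lockout counter sees. The following is a minimal defender-side sketch, assuming a hypothetical threshold of five failures and a toy log of failed usernames; the names `LOCKOUT_THRESHOLD` and `locked_accounts` are illustrative and not taken from any particular product.

```python
from collections import Counter

# Hypothetical policy: lock an account after this many failed logins.
LOCKOUT_THRESHOLD = 5


def locked_accounts(failed_usernames, threshold=LOCKOUT_THRESHOLD):
    """Return the accounts whose failure count reaches the lockout threshold."""
    failures = Counter(failed_usernames)
    return {user for user, count in failures.items() if count >= threshold}


# Cracking pattern: 50 failed guesses against a single account.
cracking = ["alice"] * 50

# Stuffing pattern: one failed attempt against each of 50 different accounts.
stuffing = [f"user{i}" for i in range(50)]

print(locked_accounts(cracking))        # {'alice'}
print(locked_accounts(stuffing))        # set()
print(max(Counter(stuffing).values()))  # 1
```

Both logs contain the same fifty failures. The counter keyed on the account only notices the first pattern, because in the second no single account ever exceeds one failure.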

2012-2016: the breach-dump era

A credential-stuffing attack is only as good as its input list. Through the early 2010s, the supply of those lists grew from a trickle to a flood, and the single event that defined the era was the LinkedIn breach.

LinkedIn was hacked on June 5, 2012. The company initially confirmed that around 6.5 million password hashes had been stolen and forced a reset on those accounts. The hashes were unsalted SHA-1, which is to say they were barely protected; published analyses at the time had the bulk of them cracked within days. That was the version of the story the public knew for four years. Then, in May 2016, a seller using the handle “Peace” listed 117 million LinkedIn email-and-password pairs on a dark-web market. The 2012 breach had never been 6.5 million accounts. It was more than 100 million, and most of that dataset had been sitting in private hands, getting cracked and reused, the entire time.

The LinkedIn dump matters to this story for two reasons. First, the scale: a hundred million real email-and-password pairs is enough fuel to attack every consumer site on the internet. Second, the four-year delay between breach and disclosure. Credentials do not expire when they are stolen; they expire when the user changes the password, which most users never do unprompted. A 2012 credential was still landing hits in 2016 because the human behind it had not touched the password in four years.

The same period produced a parade of eight- and nine-figure dumps: Dropbox, Yahoo, Adobe, MySpace, Tumblr. Each was raw material. Attackers stopped thinking in terms of single breaches and started thinking in terms of aggregated lists, deduplicated across sources, with cracked hashes turned back into plaintext. The term of art for the product is a combo list: a file of email:password lines, one per row, ready to be fed into a stuffing tool. The breach is the harvest; the combo list is the refined fuel.

Figure: the pipeline from breach to combo list to stuffing tool. *A breach yields hashes; cracking and merging turns many breaches into one deduplicated combo list; the tool replays it through a proxy pool.*

By the time the dust settled on this era, the aggregation went meta. In January 2019, Troy Hunt documented Collection #1, a dump that briefly appeared on the MEGA file host and then on a hacking forum. It was not a single breach. It was a compilation: 87 gigabytes across more than 12,000 files, containing 2,692,818,238 rows that deduplicated down to 1,160,253,228 unique email-and-password combinations, drawn from over 2,000 separate source breaches. Unique email addresses came to 772,904,991; unique passwords to 21,222,975. Collection #1 was followed by Collections #2 through #5. It was credential stuffing’s ammunition depot, assembled and sold as a single product, and it made the cost of a combo list approach zero.

That zero-cost input is the economic engine of the whole problem. When fuel is free, even a success rate of one to two percent is profitable at volume, because the volume is effectively unlimited. That arithmetic is the reason the attack never went away. If a stuffing operation only needs a hit rate of a couple of percent to clear its costs, the defender’s job is to push that rate low enough, or the per-attempt cost high enough, that the proxy and solver bills exceed the take.
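The break-even point can be made explicit from the defender’s side. The sketch below assumes a simple model in which every attempt costs the same amount and every working account is worth the same amount. The dollar figures are invented for illustration and do not come from any measured campaign.

```python
def break_even_hit_rate(cost_per_attempt, value_per_hit):
    """Hit rate at which the take exactly covers the per-attempt spend."""
    return cost_per_attempt / value_per_hit


def expected_profit(attempts, hit_rate, cost_per_attempt, value_per_hit):
    """Take from working accounts minus the total spend on attempts."""
    hits = attempts * hit_rate
    return hits * value_per_hit - attempts * cost_per_attempt


# All three figures are assumptions chosen only to show the shape of the math.
ATTEMPTS = 1_000_000
COST_PER_ATTEMPT = 0.002  # proxy and solver spend per attempt, in dollars
VALUE_PER_HIT = 1.00      # value of one working account, in dollars

# Under these assumptions the run stops paying below a 0.2 percent hit rate.
print(break_even_hit_rate(COST_PER_ATTEMPT, VALUE_PER_HIT))

# At a one-percent hit rate the run is comfortably profitable.
print(round(expected_profit(ATTEMPTS, 0.01, COST_PER_ATTEMPT, VALUE_PER_HIT), 2))

# At a tenth of a percent the same run loses money.
print(round(expected_profit(ATTEMPTS, 0.001, COST_PER_ATTEMPT, VALUE_PER_HIT), 2))
```

The model shows the two levers a defender has: lowering the hit rate, and raising the cost of each attempt until the break-even rate sits above what the attacker can achieve.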

How the early tooling worked: Sentry MBA

For the attack to scale, someone had to package it. Through the mid-2010s the packaging of choice was Sentry MBA, a Windows account-checker that became, by Shape Security’s count, the most widely used credential-stuffing tool of its day. Its official Twitter account dates to July 2013; it was actively advertised on dark-web forums from late 2014. The tool itself was attributed to an author using the alias “Sentinel,” with later modifications credited to “Astaris.”

The thing that made Sentry MBA more than a script was the config file. A config is a target definition: it tells the tool how to talk to one specific site’s login form. Out of the box Sentry MBA knows nothing about any particular site. Load a config for, say, a retailer, and the tool now knows the login URL, the exact shape of the login POST request, and, the part that does the real work, which strings in the response mean success, which mean failure, and which mean the account is locked or the IP is banned. Get the success and failure keywords right and the tool can test thousands of credentials and sort the responses into hits and misses automatically.

Three other inputs round it out. A combo list supplies the credentials to test. A proxy list supplies the IP addresses to spread the attempts across, so that a target watching for many logins from one IP sees instead a thin trickle from each of thousands of IPs. And an OCR module handles simple image captchas, reading the distorted text well enough to clear the weaker challenges of the era. The config encodes how many requests to send per proxy before rotating, so the operator can tune the attack to stay under a target’s rate thresholds.

The reason this design was potent is that it split the labor. Writing a good config takes skill: you have to reverse-engineer the login flow, find the right keywords, handle the captcha. Running a config takes none. So a small number of skilled config authors could supply a large number of unskilled operators, and a market formed around exactly that. Sentry MBA configs were traded and sold in volume; one analysis counted over a thousand in circulation. The operator bought a config for a target, fed in a combo list and some proxies, and clicked go. That division between the people who build the attack and the people who run it is the template every later tool copied.

2019 onward: OpenBullet and the modular generation

Sentry MBA aged out. Its config format was rigid, its captcha handling could not keep up with reCAPTCHA, and the tool was closed. What replaced it was OpenBullet, and the shift in design tells you what changed about the defenses it was built to beat.

OpenBullet is open source, written for the .NET runtime, and it describes itself, accurately, as a general web-testing suite. You can use it to scrape, to run automated tests, to hammer an API. The same machinery that does legitimate testing does credential stuffing, which is precisely why it lives in a gray zone and why its repository carries a prominent warning that running it against sites you do not own is illegal. The capability is dual-use; the intent lives in the config.

The architecture moved from a fixed config to a scripting language. OpenBullet 2 builds attacks two ways: a visual block editor it calls the Stacker, and a scripting language called LoliCode. A config is now a small program. It issues HTTP requests, sets headers, posts credentials, and runs KEYCHECK blocks that inspect the response and classify the outcome. The vocabulary of that classification is the cracking community’s own: a successful login is a hit, a failure is a fail, and an ambiguous or partially valid result, an account that exists but needs more work, gets sorted as a custom or a to-check. The same hit-and-fail bucketing that Sentry MBA did with success-and-failure keywords, OpenBullet does with scriptable checks, only now the script can branch, retry, and call out to external services.

Figure: the OpenBullet check loop. Each combo goes out as a POST through a rotating proxy, KEYCHECK matches the response, and the result lands in HIT (valid login), FAIL (bad pair), or TO-CHECK / custom (needs work). A captcha in the response is handed to a solver service before KEYCHECK runs again. *The OpenBullet loop: post a credential through a rotating proxy, run KEYCHECK against the response, and sort the result into hit, fail, or to-check. The exact LoliCode here is illustrative, not runnable.*

Three capabilities defined this generation. The first is captcha outsourcing. OpenBullet 2 integrates natively with captcha-solving farms; published descriptions name 2Captcha and Anti-Captcha among roughly a dozen supported services. When a config hits a captcha, it ships the challenge to a captcha-solving pipeline, gets a token back, and continues. The captcha stopped being a wall and became a line item, billed per solve. The second is browser automation: OpenBullet can run plain HTTP requests for speed, or drive a real browser through Puppeteer or Selenium when a target’s defenses demand a JavaScript runtime, sometimes with stealth patches bolted on to dodge headless-Chrome detection. The third is the proxy layer, where a config can fan its attempts across thousands of IPs, increasingly residential and mobile proxies sourced to look like ordinary home users rather than datacenter ranges.

The config marketplace came along too, and with it a darker twist. Because OpenBullet configs are shared programs and the community trades them freely, the configs themselves became an attack surface. Security researchers documented a campaign in which malicious OpenBullet configs circulated through cracking communities carrying a hidden payload: each one included an identical function, dressed up as a routine to bypass Google’s reCAPTCHA, that actually installed a remote-access trojan on the operator’s own machine. The supply chain of stuffing tools got poisoned by the same kind of person who runs stuffing tools. There is a certain symmetry to credential thieves getting their own machines backdoored by the configs they downloaded to steal credentials.

The scale this reached

Numbers from the defensive side make the volume concrete. Akamai, sitting in front of a large share of consumer login traffic, counted 193 billion credential-stuffing attempts globally in 2020, of which 3.4 billion hit financial-services targets specifically. The single largest spike it recorded against one financial institution was 55,141,782 malicious login attempts in a concentrated burst. These are not the numbers of a niche technique. This is one of the highest-volume attack classes on the public internet, sustained year after year because the input is free and the tooling is a download away.

The success rate that makes those billions worthwhile is low and stable. Industry estimates have long put credential-stuffing conversion in the neighborhood of one to two percent: a million credentials yields ten to twenty thousand working accounts. Low, but the attacker controls the volume by buying more combo lists, and the lists are cheap. A two-percent hit rate on a billion attempts is twenty million compromised accounts, and the compromised account is rarely the end of the line. It feeds account-takeover fraud, gift-card and loyalty-point theft, and resale, where a verified working login is worth more than the raw credential because someone already paid the testing cost.

The defenses that grew up around it

Every layer of defense against credential stuffing is an answer to a specific property of the attack, and they stacked up over the same fifteen years.

The cleanest defense attacks the input. If the password the attacker is replaying is no good, nothing downstream matters. This is why breached-password screening became standard. NIST’s SP 800-63B, in its 2017 rewrite and again in the 2024-2025 revision, told services to stop forcing periodic resets and arbitrary complexity rules, and instead to screen new passwords against lists of known-breached values at the moment they are set. The practical tooling for this is Have I Been Pwned’s Pwned Passwords, which exposes a k-anonymity API: a client sends the first five hex characters of the SHA-1 of a candidate password, and the server returns the remaining characters of every breached hash that shares that prefix, so the service can check membership locally without the password or its full hash ever leaving the client. Screen at creation, and a user cannot pick a password that is already in the screened corpus, which is the same material the combo lists are built from.
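The range lookup is short enough to sketch. The version below assumes the public range endpoint at `api.pwnedpasswords.com/range/` and its response format of one `SUFFIX:COUNT` line per breached hash. The function names and the `User-Agent` string are made up for this example, and the network call is kept behind a parameter so the matching logic can be exercised offline.

```python
import hashlib
import urllib.request

# Assumed endpoint for the Pwned Passwords range search.
RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_sha1(password):
    """Return (first 5 hex chars, remaining 35) of the uppercase SHA-1."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def breach_count(suffix, range_body):
    """Find our suffix among the 'SUFFIX:COUNT' lines of a range response."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def fetch_range(prefix):
    """Network call: only the five-character prefix leaves this machine."""
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "breached-password-sketch"},  # illustrative value
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def is_breached(password, fetch=fetch_range):
    """True if the password's hash suffix appears in the returned range."""
    prefix, suffix = split_sha1(password)
    return breach_count(suffix, fetch(prefix)) > 0
```

A signup or password-change handler would call `is_breached(candidate)` and reject the candidate on `True`. The comparison against the suffix happens locally, which is what keeps the full hash off the wire.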

The defense that actually broke the economics, though, was multi-factor authentication. A replayed password is worthless if the login also demands a second factor the attacker does not have. Microsoft’s widely-cited internal analysis put MFA’s effectiveness at stopping more than 99.9 percent of account-compromise attempts. That number is the reason MFA is the first recommendation in every credential-stuffing prevention guide, including OWASP’s. The catch is friction and coverage: MFA prompts on every login annoy users, so many deployments only challenge on risk signals, which leaves a gap for stuffing to walk through on the unchallenged path.

That gap is where bot management lives, and it is the layer where credential stuffing connects to the rest of the anti-automation arms race. A stuffing run is a bot run, so the same signals that catch scrapers catch stuffers. OWASP’s own prevention cheat sheet names the techniques directly: device fingerprinting to recognize the same automated client across IP rotations, connection-level fingerprinting via JA3 and JA4 TLS signatures and HTTP/2 fingerprinting to flag clients whose network stack does not match the browser they claim to be, IP reputation and ASN-based residential-proxy detection, and captchas reserved for suspicious attempts with their solve rates monitored. The whole bot-mitigation industry, from the vendor that coined the term to its competitors, sells login-flow protection as a core product. When OpenBullet bolts on a captcha solver and residential proxies, it is buying its way past exactly these checks, one signal at a time.
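The reason fingerprints matter more than IPs can be shown with a toy log. This is a minimal sketch, assuming each login attempt has already been labeled with a source IP and an opaque connection fingerprint; the field names, the labels `fp-automation` and `fp-browser`, and the threshold are all hypothetical.

```python
from collections import defaultdict


def accounts_per_key(attempts, key):
    """Count the distinct usernames seen for each value of `key`."""
    groups = defaultdict(set)
    for attempt in attempts:
        groups[attempt[key]].add(attempt["username"])
    return {value: len(usernames) for value, usernames in groups.items()}


def flagged(attempts, key, max_accounts):
    """Return the key values that touched more than `max_accounts` usernames."""
    counts = accounts_per_key(attempts, key)
    return {value for value, count in counts.items() if count > max_accounts}


# Forty attempts, each for a different account from a different IP,
# all sharing one connection fingerprint.
attempts = [
    {"username": f"user{i}", "ip": f"203.0.113.{i}", "tls_fp": "fp-automation"}
    for i in range(40)
]
# One ordinary user retrying a login from a single address.
attempts += [
    {"username": "carol", "ip": "198.51.100.7", "tls_fp": "fp-browser"},
    {"username": "carol", "ip": "198.51.100.7", "tls_fp": "fp-browser"},
]

print(flagged(attempts, "ip", max_accounts=3))      # set()
print(flagged(attempts, "tls_fp", max_accounts=3))  # {'fp-automation'}
```

Grouping by IP sees forty unrelated visitors; grouping by fingerprint sees one client touching forty accounts. A real deployment cannot act on this signal alone, since every user of the same browser build shares a TLS fingerprint, which is why it is combined with IP reputation, device signals, and a mismatch check against the claimed browser.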

Figure: where each defense intercepts the attack. (1) Breached-password screening kills the input; (2) bot management (fingerprint, IP reputation, JA4) raises the cost; (3) a risk-based captcha challenge slows the loop; (4) a second factor (MFA or passkey) breaks replay. A replayed password that clears layers 1-3 still dies at layer 4. *Defenses stack inward. Screening removes bad passwords from circulation; bot management and captchas raise the per-attempt cost; the second factor makes a correct password insufficient on its own.*

The newest layer aims to retire the password entirely. Passkeys, the consumer-facing name for FIDO2 and WebAuthn credentials, replace the shared secret with a private key that never leaves the user’s device. There is no password to breach, so there is nothing to put in a combo list, so there is nothing to stuff. The FIDO Alliance’s October 2025 figures had passkey sign-ins succeeding 93 percent of the time against 63 percent for password-based flows, which matters because the historical objection to phishing-resistant auth was that it was too clunky for normal people to use. The honest caveat is the downgrade attack: as long as a relying party still accepts a password as a fallback, an adversary-in-the-middle can try to force the login down to the password path and stuff it there. Passkeys remove the stuffing target only on the accounts and flows where the password is genuinely gone, not merely optional.

What the arc actually shows

The striking thing about fifteen years of credential stuffing is how little the core idea changed and how much the surrounding machinery did. The 2011 insight, that two-thirds of people reuse a password across two unrelated sites, is still true; password managers and passkeys have dented it but not erased it. What got industrialized was everything around that insight. The combo list went from a hand-built cross-reference of two breaches to a 1.16-billion-pair compilation sold as a product. The tooling went from a closed account-checker with rigid configs to an open, scriptable suite with captcha-farm integrations and proxy rotation as first-class features. The captcha went from a wall to a per-solve fee. None of that is innovation in the attack; it is supply-chain maturation around a constant.

The defenses tell the same story in reverse. They did not get cleverer about guessing which login is malicious so much as they made a correct password matter less. Breached-password screening shrinks the usable input, bot management taxes each attempt, and the second factor makes the password necessary-but-not-sufficient. A login flow that screens passwords at creation, fingerprints the connection, challenges on risk, and demands a passkey has, at each layer, removed one of the things that made stuffing cheap. The attack still runs. The Akamai counters still tick past a hundred billion attempts a year. But the gap between a stolen credential and a usable account is wider than it was, and it widens on the same slow cycle that password reuse refuses to die on. The day the last fallback password is gone, the combo list is just a list of email addresses.

