The credential-stuffing toolchain: from breach dump to validated account

A breach dump is inert. A text file with eighty million email:password rows does nothing on its own, and the rows are mostly stale by the time they leak. The value is not in the file. It is in the machinery that turns the file into a list of accounts that actually open right now, on a specific site, behind whatever login defenses that site happens to run. That machinery is a supply chain, and it is unusually well organized for something that exists to commit fraud.

This post walks that supply chain end to end, from the moment a dump appears to the moment an attacker holds a known-good credential for your service. The interesting part is not any single tool. It is how the work splits across specialists who never have to meet, how each layer has its own market with its own prices, and how the whole thing is held together by configuration files that encode a target’s login flow the way a recipe encodes a dish. We will look at where credentials come from now that infostealers have largely replaced breach dumps, at the tooling that does the testing (OpenBullet and the family of forks around it, and the older Sentry MBA that set the template), at the two markets that make the tooling work at all (residential proxies and CAPTCHA-solving farms), and at the economics that decide whether any of it is worth doing. The defensive reading is at the end, but the through-line is simpler than the tooling makes it look: credential stuffing is cheap because the password-reuse rate is high and every expensive part of the job has been turned into a commodity someone else operates.

What credential stuffing actually is, and what it is not

OWASP files credential stuffing as OAT-008, defined as “mass log in attempts used to verify the validity of stolen username/password pairs.” The definition draws a sharp line against OAT-007, credential cracking. Cracking guesses. It runs a dictionary or a brute-force search against one account and hopes a password falls out. Stuffing does no guessing at all. Every pair it submits is a real password that worked somewhere else, and the only question the attack answers is whether the same person reused it here. That distinction matters for defense, because the countermeasures diverge. Rate limits and lockouts blunt cracking nicely. Against stuffing they help less, because the attacker is not hammering one account a thousand times; they are trying a million accounts once each, each with its own leaked password, spread across thousands of different IP addresses.
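A toy comparison makes the divergence concrete. This sketch assumes a hypothetical lockout policy of five failures per account and two synthetic failure logs. It computes the two numbers that lockouts and per-IP limits act on: failures per account and attempts per source.

```python
from collections import Counter

LOCKOUT_THRESHOLD = 5  # hypothetical policy: lock an account after 5 failures

def locked_accounts(failures):
    """failures: list of (source, account) pairs, one per failed login."""
    per_account = Counter(account for _, account in failures)
    return sorted(a for a, n in per_account.items() if n >= LOCKOUT_THRESHOLD)

def busiest_source(failures):
    """Largest number of failed attempts coming from any single source."""
    return max(Counter(source for source, _ in failures).values(), default=0)

# Cracking: one account, many guesses, one source (synthetic data).
cracking = [("203.0.113.5", "alice@example.com")] * 1000
# Stuffing: many accounts, one attempt each, each from a different source (synthetic data).
stuffing = [(f"src-{i}", f"user{i}@example.com") for i in range(1000)]

print(locked_accounts(cracking), busiest_source(cracking))  # ['alice@example.com'] 1000
print(locked_accounts(stuffing), busiest_source(stuffing))  # [] 1
```

The cracking log locks the account and concentrates on one source. The stuffing log produces no lockout and never more than one attempt from any source.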

The economics follow directly from the reuse rate. Industry write-ups put the success rate of a stuffing run somewhere between one and three percent of the combos tested. That sounds low until you multiply. A run of a million combos at a two percent hit rate yields roughly twenty thousand working logins, and the input cost, the file plus the infrastructure, can be under a thousand dollars. The arithmetic has not changed much in a decade. What has changed is where the million combos come from and how cleanly the rest of the pipeline has been packaged.
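The arithmetic in the paragraph above, written out. The thousand-dollar input cost is the post's own upper figure. Treating it as the entire cost of the run is a simplifying assumption.

```python
def expected_hits(combos: int, hit_rate: float) -> int:
    """Working logins expected from a run, given a per-combo success rate."""
    return round(combos * hit_rate)

def cost_per_hit(total_cost: float, hits: int) -> float:
    """Input cost spread evenly over the validated logins it produced."""
    return total_cost / hits

for rate in (0.01, 0.02, 0.03):
    print(f"{rate:.0%}: {expected_hits(1_000_000, rate):,} hits")
# 1%: 10,000 hits
# 2%: 20,000 hits
# 3%: 30,000 hits

print(cost_per_hit(1_000, expected_hits(1_000_000, 0.02)))  # 0.05
```

At the two percent midpoint, each working login costs about five cents of input.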

*The five stages of a stuffing operation. The orange box, validation, is where configs, proxies, and CAPTCHA solvers converge, and it is the only stage that touches the target's servers.*

Where the credentials come from now

For most of the 2010s the input was a breach dump: someone exfiltrated a site’s user table, the hashes got cracked or the passwords were stored in plaintext, and the result circulated as a named dump. Compilations followed. “Collection #1,” surfaced in 2019, aggregated billions of rows from hundreds of older breaches into one searchable mass. The trouble with breach dumps, from an attacker’s point of view, is that they age. Passwords get rotated, the dump gets indexed by services like Have I Been Pwned, and defenders start screening against the same lists. A two-year-old dump has a low and falling hit rate.
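A defender can screen against the same lists without much effort. The sketch below assumes the public Have I Been Pwned Pwned Passwords range endpoint. That endpoint is `https://api.pwnedpasswords.com/range/` followed by the first five hex characters of the password's SHA-1, and it returns lines of `SUFFIX:COUNT`, so the full hash never leaves your server. The function names are illustrative. Production use would add caching, error handling, and compliance with the service's usage terms.

```python
import hashlib
import urllib.request

RANGE_URL = "https://api.pwnedpasswords.com/range/"

def split_sha1(password: str):
    """Uppercase SHA-1 hex of the password, split into a 5-char prefix and 35-char suffix."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]

def count_in_response(body: str, suffix: str) -> int:
    """Look up our suffix in a range response, where each line is SUFFIX:COUNT."""
    for line in body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count.strip())
    return 0

def times_seen_in_breaches(password: str) -> int:
    """Only the 5-character prefix is sent over the network (k-anonymity)."""
    prefix, suffix = split_sha1(password)
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "password-screening-sketch"},  # placeholder UA string
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8")
    return count_in_response(body, suffix)
```

Calling `times_seen_in_breaches` requires network access. A nonzero count means the password appears in the breach corpus the service indexes, which makes it a candidate for rejection at signup or a forced reset.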

The supply has shifted to infostealer logs, and the shift changes the economics. An infostealer is malware that runs on a victim’s own machine and harvests whatever the browser has saved: cookies, autofill, and the passwords sitting in the browser’s credential store. There is no hash to crack. The browser stores those passwords in a form it can decrypt for the user, and malware running as that user can decrypt them too. The output is fresh, it is tied to a live session, and it often arrives with the session cookies that let an attacker skip the login entirely. Through the first three quarters of 2025, one vendor’s telemetry counted 13.6 billion email-and-password pairs and 29.7 billion passwords tied to those emails, which works out to more than two passwords per address, each one a candidate for reuse on some other site.

The infostealer market itself churns. Lumma (LummaC2) dominated credential-log volume on Russian Market through late 2024, accounting for the large majority of fresh logs on that platform, until a coordinated law-enforcement action in May 2025 seized over 2,300 of its domains. The vacuum filled fast. Acreed moved into the top slot within weeks, ahead of older strains like RedLine, Raccoon, StealC, and Vidar. Genesis Market, which sold not just credentials but the full browser fingerprint and cookie set needed to impersonate a victim’s session, was disrupted in 2023, and that takedown is part of why Russian Market grew into the dominant shop it is now. The pattern repeats: a marketplace or strain falls, demand does not, and a replacement takes its place inside a single quarter. The takedowns are real and they hurt, but the demand curve under them barely moves.

The combo list and its normalization

Whatever the source, the raw material has to be turned into a combo list, the canonical input format every stuffing tool reads: one username and password per line, usually joined by a colon (user:pass). Stealer logs do not arrive in that shape. They arrive as per-machine folders full of files. Turning them into a usable combo list is its own step, sometimes its own product. A parser walks the logs, pulls credentials out, and reshapes them into the flat user:pass form.

The valuable work is not extraction, it is sorting. A generic combo list of random email:pass rows is worth little because most rows are for sites nobody is targeting. The premium product is filtered: combos that contain a banking domain, a streaming domain, a specific retailer, or a particular country’s email providers. These are called URL-login-password lists, ULPs, when the source URL is preserved alongside each credential, which lets a buyer pull exactly the rows whose stored login matches the site they intend to attack. A list pre-filtered to a target raises the hit rate far above the one-to-three-percent baseline, because every row is a person who at least had an account on a similar service. That filtering is why region-specific and service-specific lists command higher prices than bulk dumps.

The validation tool: configs, blocks, and the stack

The tool that does the actual testing has had two dominant generations. The first was Sentry MBA. Shape Security’s March 2016 write-up made it the canonical example, and its three-part structure became the template everything since has copied. You bring a config file, a combo list, and a proxy list. The config encodes the target: the URL of the login page, the field markers that tell the tool where the username and password go, and the rules for recognizing a successful versus a failed login from the server’s response. The combo list supplies the credentials. The proxy list supplies the IP diversity that keeps the run from being blocked on volume. Sentry MBA shipped with built-in OCR to chew through the simple text CAPTCHAs of the era and could spoof the User-Agent and Referer headers to look more like a browser. The config was the clever part, because it let a non-expert attack a specific site without understanding the site, as long as someone else had already written the config.

OpenBullet is the second generation and the current standard, now in its OpenBullet 2 incarnation. It is open source, and its authors present it as a “webtesting suite” for legitimate automation and security testing. That description is real in the sense that the same machinery genuinely does QA work; it is also exactly what lets the tool stay on GitHub while the configs that target real banks circulate on Telegram. OpenBullet 2 builds an attack as a stack of blocks, assembled either in a drag-and-drop visual editor it calls the Stacker or written in its scripting language, LoliCode, which transpiles down to C#. The block types map cleanly onto an HTTP login flow.

An HTTP-request block fetches the login page or posts the form. A parse block extracts whatever dynamic values the page hands out, the anti-CSRF token, a nonce, a hidden form field, so the next request can echo them back the way a real browser would. A keycheck block is the heart of the thing: it inspects the response and decides, against rules the config author wrote, whether this combo is a hit, a failure, a ban, or a retry. That verdict is the entire product of the attack. Everything else exists to get a clean keycheck on as many combos as possible. The config is distributed as an .opk file, which is a ZIP archive with the script and settings inside; the broader ecosystem of forks (SilverBullet, Anomaly, and others) uses parallel formats like .svb, .anom, and .loli, often encrypted so the config author can sell access without leaking the logic.

*The login flow as a config sees it. This is a description of mechanism, not a runnable config: real targets vary the field names, token plumbing, and success signals, which is exactly the per-target knowledge a config author sells.*

OpenBullet can run blocks as a plain HTTP client, which is fast and cheap because it executes no JavaScript, or it can drive a real browser through Puppeteer or Selenium when the target’s defenses require a rendered page and an executed challenge script. The Puppeteer path uses an anti-detection fork to blunt the most obvious automation tells. This is the fork in the road that decides the cost of an attack. An HTTP-only run is nearly free per request and trivially parallel; a browser-driven run pays what is sometimes called the headless-browser tax, because every session now needs a full browser process with all the CPU and memory that implies. Defenders push attackers toward the expensive path on purpose, which is the entire logic of server-side versus client-side bot detection: force the work into a place where it costs the attacker real money.

A word on what these tools do not get you. A config is target-specific and perishable. When a site changes its login form, renames a field, adds a challenge, or rotates a header that the keycheck depends on, the config breaks and somebody has to rewrite it. That maintenance burden is why config authorship is a paid specialty rather than a one-time act, and it is the single most effective place for a defender to inject friction, because a change you make once forces a rewrite on every attacker running against you.
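Here is one way to impose that rewrite on every run at once, sketched under stated assumptions. The server derives the login form's field names from a per-session secret, so a config that hard-codes `username` and `password` (or last week's names) submits fields the server no longer reads. The HMAC scheme, the `f_` prefix, and the function names are illustrative choices, not a description of any vendor's product. A config author can add a parse step that scrapes the current names, and that is exactly the maintenance cost the paragraph above describes.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional

LOGICAL_FIELDS = ("username", "password")

def field_aliases(session_key: bytes) -> Dict[str, str]:
    """Per-session opaque name for each logical field, derived with HMAC-SHA256."""
    return {
        field: "f_" + hmac.new(session_key, field.encode(), hashlib.sha256).hexdigest()[:12]
        for field in LOGICAL_FIELDS
    }

def read_login_form(form: Dict[str, str], session_key: bytes) -> Optional[Dict[str, str]]:
    """Map submitted fields back to logical names; None if the expected aliases are missing."""
    aliases = field_aliases(session_key)
    if not all(alias in form for alias in aliases.values()):
        return None  # stale or hard-coded field names: treat as a failed or suspicious submit
    return {field: form[alias] for field, alias in aliases.items()}

# Usage: issue a fresh key with each rendered login page and keep it in the session.
session_key = secrets.token_bytes(32)
names = field_aliases(session_key)  # render these as the <input name="..."> values
```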

The first market: residential proxies

A stuffing run that comes from one IP address dies immediately. Any competent login endpoint rate-limits per IP, and twenty thousand attempts from a single datacenter address gets that address blocked inside seconds. The attack only works if the requests appear to come from thousands of unrelated, ordinary users. That is what a proxy network sells, and the value of the network is precisely how ordinary its addresses look.
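A per-IP sliding-window limiter shows both halves of that claim. This minimal sketch assumes an in-memory store and a hypothetical limit of ten attempts per minute. Twenty thousand attempts from one address are cut off after ten. The same twenty thousand spread across twenty thousand addresses never trip the limit.

```python
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` events per key within any `window`-second span."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.events = defaultdict(deque)  # key -> timestamps of allowed events

    def allow(self, key: str, now: float) -> bool:
        q = self.events[key]
        while q and q[0] <= now - self.window:  # drop timestamps outside the window
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

# 20,000 attempts over one minute, 3 ms apart.
one_ip = SlidingWindowLimiter(limit=10, window=60.0)
print(sum(one_ip.allow("198.51.100.7", i * 0.003) for i in range(20_000)))  # 10

spread = SlidingWindowLimiter(limit=10, window=60.0)
print(sum(spread.allow(f"exit-{i}", i * 0.003) for i in range(20_000)))  # 20000
```

Every individual key in the second run looks like one ordinary login, which is exactly the cover a proxy network sells.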

Datacenter proxies are cheap and nearly useless against a defended target, because the IP ranges owned by hosting providers are well known and carry bad reputation scores. The product that matters is the residential proxy: an exit node that is a real consumer device on a real ISP, so the request inherits the trust that defenders extend to home broadband. Anti-bot vendors have built whole detection layers around catching these anyway, which is its own subject covered in how anti-bot vendors detect residential proxies and ASN reputation, but the baseline reality is that a residential IP clears reputation filters that a datacenter IP never will. Imperva’s 2024 report attributed a quarter of bad-bot traffic to residential ISPs, and other measurements of malicious sessions run higher.
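The datacenter screen is the easy half. This minimal sketch uses Python's standard `ipaddress` module. The ranges below are RFC 5737 documentation blocks standing in for a real hosting-provider or ASN dataset, which a production check would load from a maintained feed. A residential exit falls outside every such range, and that is precisely what it is sold for.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks), standing in for a
# hosting-provider / ASN dataset. These are not real datacenter allocations.
DATACENTER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_datacenter(ip: str) -> bool:
    """True if the address falls inside any listed hosting range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATACENTER_RANGES)

print(is_datacenter("198.51.100.42"))  # True: inside a listed range
print(is_datacenter("203.0.113.9"))    # False: stands in for a residential address
```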

The uncomfortable question is where the residential IPs come from, and the honest answer is that a large share come from people who never agreed to sell them. The cleanest documented case is 911 S5. Krebs on Security and the US Treasury laid out how its operator, Yunhe Wang, built a network of millions of residential exit nodes by bundling proxy backdoors into free VPN apps (MaskVPN, DewVPN, PaladinVPN, ProxyGate, ShieldVPN, ShineVPN) that users installed without understanding their machine was now a relay. In 2024 the FBI described it as one of the largest such services ever, and the fraud routed through it included a confirmed loss exceeding 5.9 billion dollars against US pandemic-relief programs. The operator was sanctioned and arrested in May 2024. The service had already shut down once in July 2022 and resurfaced under a new name before being taken down again, which is the same falls-and-reappears pattern the credential markets show.

*The routing that makes residential proxies valuable. The target's reputation check passes because the address genuinely belongs to a home broadband subscriber, who is usually unaware their device is the exit.*

Not every residential network is built on compromised machines. Several proxy vendors source IPs through paid SDKs embedded in free apps, where the user nominally agrees to share bandwidth in exchange for a free product, and the consent is buried in a license agreement. The line between that arrangement and outright malware is thin and frequently crossed, and from the target’s side the distinction is invisible: a residential exit is a residential exit, whether the homeowner consented or not. For the operational tradeoffs between residential, datacenter, and mobile exits, see residential vs datacenter vs mobile proxies; the relevant point for stuffing is just that the proxy market is what converts a single attacker into the appearance of a crowd.

The second market: CAPTCHA-solving farms

The other defense a stuffing run hits is the challenge: a CAPTCHA, a behavioral check, or a managed challenge from an anti-bot vendor. A config that hits one of these and cannot answer it stalls on every single combo, and the run is dead. So the toolchain integrates a third-party solver, and the solver is a market of its own with published price lists.

OpenBullet ships native integration for roughly a dozen CAPTCHA farms, 2Captcha and Anti-Captcha among them, configured by dropping in an API key. The mechanism is the same across vendors. The config’s CAPTCHA block packages the challenge (the site key and page URL for a token challenge, or the image for an image challenge) and ships it to the solver’s API. The solver returns a token or an answer, the config splices it into the login POST, and the flow continues. The solver does not need to be on the same machine, in the same country, or even aware of what site is being attacked. It is a clean API boundary, which is exactly what makes it a commodity.

The pricing tells you how cheap this layer has become. On 2Captcha’s published rates, a simple text or image CAPTCHA runs roughly fifty cents to a dollar per thousand solves; reCAPTCHA v2 is listed at one to about three dollars per thousand; reCAPTCHA v3 sits around $1.45 to $2.99 depending on the score requested; Cloudflare Turnstile and DataDome challenges are listed at $1.45 per thousand; and the hardest interactive puzzles, FunCaptcha from Arkose Labs, scale up to as much as fifty dollars per thousand for the hard variants. Two delivery models sit behind those numbers. Some services route challenges to rooms of human workers paid per solve, a cents-per-thousand wage; others run machine-learning solvers and reserve humans only for what the models miss. The economics of CAPTCHA at this point are an arms race the defender is partly losing on cost, a tension covered in the economics of anti-bot vendors, because a challenge that costs the defender real engineering to deploy can be answered by the attacker for a fraction of a cent.

*Solver pricing as of the cited 2Captcha rate card. Most challenges cost the attacker single-digit dollars per thousand; the Arkose interactive puzzle is the deliberate exception, priced an order of magnitude higher because it resists automation.*

The division of labor, and why it makes the system durable

The thing that makes this supply chain hard to break is that no participant has to be good at all of it. The infostealer operator harvests logs and never runs a login attempt. The log parser turns folders into combo lists. The config author reverse-engineers one target’s login flow and sells the .opk to people who never look inside it. The proxy vendor runs the residential network. The CAPTCHA farm runs the solver. The “cracker” who actually launches the run buys the combo list, the config, the proxies, and the solving as services, points OpenBullet at the combo list, and collects hits. And the hits themselves are a product: validated, known-good credentials get resold to a separate class of buyer who does the cashout, the account drain or the fraud, and who never touched the stuffing tool at all.

This specialization is what the markets quietly assume. Recorded Future’s analysis put the entry cost of a stuffing operation around 550 dollars, with a plausible return of twenty times that, against a backdrop where the price of a single compromised account fell from over ten dollars to roughly one or two as supply ballooned. Tooling prices were never the barrier: across the older generation, STORM was free, SNIPR sold for around twenty dollars, Black Bullet went for thirty to fifty, Sentry MBA configs traded for five to twenty dollars each, and automated credential shops took a ten-to-fifteen-percent cut of every sale. The tool is cheap. The config is cheap. The marginal cost of a validation attempt, proxy plus solver, is fractions of a cent. The only thing that scales the profit is volume, and every layer of the chain has been optimized to supply volume.

Law enforcement understands the structure and attacks the chokepoints: the marketplaces (Genesis, Russian Market’s suppliers), the infostealer infrastructure (Lumma’s 2,300 domains), the proxy networks (911 S5). Each takedown is real and each one hurts. None of them have changed the shape of the demand curve, because the chain reconstitutes around the gap. A strain falls and another takes its volume the same quarter. A market goes down and its sellers migrate. The durability is not in any one node; it is in the fact that the work has been decomposed into pieces small enough that any one piece is replaceable.

What this means for defense

The defensive reading falls out of the structure. Because the attack depends on password reuse, the highest-value control is the one that makes a stolen password useless on your site: a second factor, or a passkey that has no shared secret to reuse in the first place. Everything else is friction that raises the attacker’s cost without removing the underlying economics. That friction still matters, because the entire attack is a cost calculation, but it should be understood as cost-raising rather than prevention.
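The post does not name a specific second factor. As one common example, here is a minimal time-based one-time password (TOTP, RFC 6238) check built on HMAC-SHA1 from the standard library. It shows why a replayed password is not enough: the code depends on a secret that is not part of the username:password pair. This is a sketch, not a hardened implementation. It has no replay tracking and no rate limit on code guesses. Passkeys work differently and are not shown.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226: HMAC-SHA1 over an 8-byte big-endian counter, then dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238: HOTP with the counter set to whole `step`-second intervals since the epoch."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)

def verify_totp(secret: bytes, submitted: str, at: Optional[float] = None,
                step: int = 30, digits: int = 6, skew: int = 1) -> bool:
    """Accept codes from the current step and `skew` steps either side, for clock drift."""
    now = time.time() if at is None else at
    counter = int(now // step)
    return any(
        hmac.compare_digest(hotp(secret, c, digits), submitted)
        for c in range(counter - skew, counter + skew + 1)
        if c >= 0  # the counter is packed as unsigned, so skip negative steps
    )
```

In a login flow, `verify_totp(user_secret, submitted_code)` would run only after the password check passes. Here `user_secret` is a hypothetical per-user secret provisioned at enrollment.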

The validation stage is where your controls land, and the structure tells you where they bite. Per-IP rate limiting is necessary and insufficient, because residential proxies defeat it by spreading the run across thousands of clean addresses; the useful versions of rate-limiting algorithms for defense key on something harder to rotate than an IP, and combine with account-takeover detection signals like velocity across accounts, device consistency, and impossible-travel. A config breaks when the login flow changes, so a defender who rotates field names, token plumbing, or success signals forces a config rewrite and imposes maintenance cost on every attacker at once. A challenge that an attacker can outsource to a solver for a fraction of a cent is weak; a challenge that resists automation, the Arkose-style interactive puzzle, costs the attacker enough that the run’s margin starts to matter. Pushing the attacker from the cheap HTTP path onto the expensive browser path through challenges that demand a real rendered, fingerprinted client, the subject of credential-stuffing mechanics and the detection work around it, attacks the one input the attacker cannot commoditize away: their own compute.
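Of the signals named above, velocity across accounts is the simplest to sketch. The version below counts distinct accounts attempted per key within a sliding time window. The key is assumed to be something like a device-fingerprint hash rather than an IP. The window, the threshold, and the class name are hypothetical values for illustration, not recommended settings. A determined attacker can rotate fingerprints too, which is why such signals are combined rather than used alone.

```python
from collections import defaultdict, deque

class AccountVelocity:
    """Flag a key that attempts more than `max_accounts` distinct accounts within `window` seconds."""

    def __init__(self, window: float, max_accounts: int):
        self.window = window
        self.max_accounts = max_accounts
        self.attempts = defaultdict(deque)  # key -> deque of (timestamp, account)

    def record(self, key: str, account: str, now: float) -> bool:
        q = self.attempts[key]
        while q and q[0][0] <= now - self.window:  # expire attempts older than the window
            q.popleft()
        q.append((now, account))
        distinct = len({acct for _, acct in q})
        return distinct > self.max_accounts  # True means "suspicious"

velocity = AccountVelocity(window=600.0, max_accounts=3)
# One device cycling through accounts, as a stuffing run does behind rotating IPs.
print([velocity.record("fp-A", f"user{i}@example.com", float(i)) for i in range(5)])
# [False, False, False, True, True]
# One device retrying its own account, as a user with a forgotten password does.
print([velocity.record("fp-B", "alice@example.com", float(i)) for i in range(5)])
# [False, False, False, False, False]
```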

The honest conclusion is that you are not going to break the supply chain. It is too well distributed, too cheap to reconstitute, and fed by a password-reuse rate that no individual site controls. What you can do is move yourself up the cost curve until the one-to-three-percent hit rate against your specific login stops clearing the bill. The attacker is running a margin business on a commodity input. The defense is to be the target where the margin goes negative, and then let them spend their fractions of a cent somewhere else.
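The margin claim reduces to a one-line model. This is a deliberately simplified sketch. Per attempt, the attacker earns the hit rate times the resale price of a validated account and pays a single per-attempt cost for proxies and solving. The $1.50 resale price is an assumed midpoint of the one-to-two-dollar figure cited earlier. The two challenge prices come from the 2Captcha rate card quoted above. Proxy cost and any cashout value beyond resale are ignored.

```python
def margin_per_attempt(hit_rate: float, resale_price: float, cost_per_attempt: float) -> float:
    """Expected profit per combo tested under a resale-only model."""
    return hit_rate * resale_price - cost_per_attempt

def breakeven_cost_per_thousand(hit_rate: float, resale_price: float) -> float:
    """Per-thousand-attempt cost at which the expected margin reaches zero."""
    return hit_rate * resale_price * 1000

HIT_RATE = 0.02      # midpoint of the 1-3% range
RESALE_PRICE = 1.50  # assumed midpoint of $1-2 per validated account

print(f"break-even: ${breakeven_cost_per_thousand(HIT_RATE, RESALE_PRICE):.2f} per thousand attempts")
for label, per_thousand in (("reCAPTCHA v2, ~$3/1k", 3.0), ("Arkose hard, ~$50/1k", 50.0)):
    m = margin_per_attempt(HIT_RATE, RESALE_PRICE, per_thousand / 1000)
    print(f"{label}: {'positive' if m > 0 else 'negative'} margin")
# break-even: $30.00 per thousand attempts
# reCAPTCHA v2, ~$3/1k: positive margin
# Arkose hard, ~$50/1k: negative margin
```

A higher hit rate or a more valuable account moves the break-even line. The model is a way to reason about which controls raise cost, not a threshold to design against.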

Sources & further reading

OWASP (2015), OAT-008 Credential Stuffing — the formal definition and its distinction from OAT-007 credential cracking.
Shape Security (2016), A look at Sentry MBA — the config / combo list / proxy list template that later tools copied, including built-in OCR and header spoofing.
F5 Labs / John Miller (2021), Credential Stuffing Tools and Techniques, Part 1 — OpenBullet’s request, parse, and keycheck blocks and the four-phase attack flow.
Castle (2024), Open Bullet 2: the preferred credential stuffing tool — LoliCode, the Stacker UI, the Puppeteer anti-detection fork, and native CAPTCHA-farm integration.
OpenBullet (2024), OpenBullet 2 documentation / FAQ — primary source on the .opk/.loli config formats, LoliCode-to-C# transpilation, and the webtesting positioning.
Recorded Future (2019), The Economy of Credential Stuffing Attacks — the $550 entry cost, ~20x return, per-account price collapse, and per-tool pricing.
SpyCloud (2025), The new age of combolists — the shift from breach dumps to infostealer logs and the 2025 credential-volume figures.
ReliaQuest (2025), The infostealer pipeline: how Russian Market fuels credential attacks — Russian Market’s rise after Genesis, and the Lumma-to-Acreed succession.
Imperva (2024), Five key takeaways from the 2024 Bad Bot Report — bad-bot traffic share, residential-ISP origin share, and ATO trends.
Brian Krebs (2024), Treasury Sanctions Creators of 911 S5 Proxy Botnet — Yunhe Wang, the VPN-bundled residential network, and the pandemic-relief fraud routed through it.
SecurityWeek (2022), FBI Warns of Proxies and Configurations Used in Credential Stuffing — the FBI’s description of config contents and the residential-proxy preference.
2Captcha (2026), Captcha prices — the per-thousand solver rate card cited for each challenge type.