Dependency confusion: how internal package names became an attack vector
Pick any large engineering organization and it has private packages. A logging helper, an auth client, a build of some internal SDK, all published to a registry that lives behind the corporate firewall and resolved by name in a thousand package.json and requirements.txt files. The name is not a secret. It leaks into public GitHub repos, into stack traces, into the minified JavaScript a CDN serves to the whole world. So here is the uncomfortable question that surfaced in early 2021: what happens if someone you have never met registers that exact name on the public registry, at a version higher than yours, with a script that runs on install?
For a while, at dozens of companies you have heard of, the answer was that their build machines fetched the stranger’s code and ran it. No typo, no phishing, no compromised maintainer account. The package manager did exactly what it was designed to do, which was to find the highest version of a requested name and install it. That behavior, applied across a registry that mixes public and private sources, is the whole attack. It has a name now: dependency confusion.
This post traces the technique from the inside. It starts with the 2021 research that proved it at scale, then takes apart the resolution logic that makes it work across npm, pip, RubyGems, and friends. From there it covers the bug-bounty fallout, the real-world incidents that followed (PyTorch’s torchtriton is the textbook case), and the namespace defenses that actually close the hole. The through-line is a single design assumption, baked into a generation of package managers, that a name maps to one trustworthy source.
The 2021 research that named it
In February 2021, security researcher Alex Birsan published a write-up describing how he had achieved code execution inside more than 35 companies, among them Apple, Microsoft, PayPal, Netflix, Shopify, Tesla, Uber, and a long tail of others. The mechanism was almost embarrassingly simple. He collected the names of packages that those companies used internally but had never published publicly, then published packages with those exact names to npm, PyPI, and RubyGems. When a build at one of those companies next resolved its dependencies, it pulled his package instead of the private one and ran the install hook.
The names came from places nobody thinks of as sensitive. A package.json embedded in a public GitHub repo. Internal package names that appear in the dependencies block of a JavaScript bundle shipped to browsers. Posts on forums where an engineer pasted a stack trace. Birsan describes scanning large numbers of hosts and pulling manifests out of build artifacts. None of this required breaching anything. The names were already out, scattered across the public internet by ordinary developer behavior.
The payload was deliberately tame, because this was authorized research paid out through bug-bounty programs. Instead of dropping a real backdoor, each package phoned home with enough information to prove it had executed: username, hostname, the current working directory, and the external IP. Birsan exfiltrated that over DNS, encoding the data as hex into hostnames so the lookups would reach an authoritative nameserver he controlled. DNS is a clean exfiltration channel for this because it leaves the sandbox even when outbound HTTP is firewalled, and the queries look like ordinary name resolution.
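To make the encoding concrete, here is a minimal sketch of how a few fields could be packed into hex DNS labels, and how an analyst reading resolver logs could unpack them again. It performs no network I/O. The zone name collector.example, the JSON field layout, and both function names are assumptions made for illustration; this is not Birsan's actual payload.

```python
import json

MAX_LABEL = 63  # DNS limits each dot-separated label to 63 bytes


def encode_as_hostname(fields: dict, zone: str = "collector.example") -> str:
    """Hex-encode a small dict and split it into DNS-safe labels."""
    hex_blob = json.dumps(fields, sort_keys=True).encode("utf-8").hex()
    labels = [hex_blob[i:i + MAX_LABEL] for i in range(0, len(hex_blob), MAX_LABEL)]
    return ".".join(labels + [zone])


def decode_hostname(hostname: str, zone: str = "collector.example") -> dict:
    """Reverse the encoding, as someone reviewing DNS query logs might."""
    suffix = "." + zone
    if not hostname.endswith(suffix):
        raise ValueError("hostname is not under the expected zone")
    # Labels may split a byte in half, so join them before decoding.
    hex_blob = hostname[: -len(suffix)].replace(".", "")
    return json.loads(bytes.fromhex(hex_blob).decode("utf-8"))


# Hypothetical values standing in for what a callback might carry.
sample = {"user": "ci", "host": "build-07", "cwd": "/srv/app"}
name = encode_as_hostname(sample)
```

A full hostname is also capped at 253 characters, so a real payload would have to spread larger data over several lookups. For defenders, the practical takeaway is that long, hex-only labels under an unfamiliar zone are a pattern worth alerting on in resolver logs.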
The numbers are worth stating plainly. The work earned over 130,000 dollars in bounties: 30,000 each from Apple, PayPal, and Shopify, and 40,000 from Microsoft, with most of those being the maximum the respective program would pay. About three quarters of the successful callbacks came from npm packages, which tracks with how aggressively Node projects pull in unscoped dependencies. To trigger the version-selection behavior in pip, he used an absurd version like 9000.0.0, far above anything the legitimate internal package would carry, so that the public copy always won the comparison.
What made the disclosure land was that it was not a vulnerability in any single product. There was no buffer to overflow, no injection to sanitize. The behavior Birsan exploited was documented, intended, and shared by most of the package managers in wide use. That is why the fix was never a single patch. It was a slow migration of defaults and conventions across multiple ecosystems, much of which is still in progress.
Why the installer picks the wrong package
To see why this works you have to look at how a package manager turns a name into a downloaded artifact. The reader has run npm install ten thousand times; the interesting part is what the resolver does when more than one source can answer for a name.
Start with the case that is genuinely npm’s own. The npm CLI has no built-in concept of split indexes. When it resolves a dependency it consults exactly one registry, the one configured for that package’s scope (or the default registry if the package is unscoped). So a plain npm install against the public registry is not, by itself, confusable. The danger appears once an organization stands up a private registry that proxies to the public one. The most common self-hosted option, Verdaccio, ships a default config whose catch-all rule sends unmatched package names upstream to npmjs:
    packages:
      '**':
        proxy: npmjs
That proxy rule is the hinge. When a build asks the private registry for internal-logger, the registry checks its own store, then, finding the name also exists upstream, considers the public copy too and serves whichever version is highest. Publish a higher-numbered internal-logger on npmjs and the proxy hands it back. The CLI did nothing wrong; it asked one registry and got an answer. The registry’s proxy policy is what merged two namespaces into one.
pip and RubyGems get there by a different road, and arguably a more dangerous one, because the merge happens in the client and is simply how the tool behaves once a second source is configured. pip’s --extra-index-url is the classic trap. Teams reach for it to add a private index, assuming it means “also look here.” What it actually means is “search this in addition to PyPI, and install the highest compatible version found across all of them.” There is no way in that model to say this package must come from that index. The public PyPI copy and the private copy are peers, and the higher version number decides. RubyGems behaves similarly when it is configured with multiple sources: it queries each, then installs the highest version of the requested gem regardless of which source it came from.
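The client-side merge described above can be modeled in a few lines. This is a deliberately simplified sketch, not pip's or RubyGems' actual resolver: it handles only plain dotted numeric versions and ignores pre-releases, platform tags, and version constraints. The index names and package data are hypothetical.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Parse a plain dotted numeric version such as '1.4.2'."""
    return tuple(int(part) for part in text.split("."))


def resolve_across_indexes(
    name: str, indexes: Dict[str, Dict[str, List[str]]]
) -> Tuple[str, str]:
    """Return (index, version) for the highest version of `name` found anywhere."""
    candidates = [
        (parse_version(version), version, index_name)
        for index_name, packages in indexes.items()
        for version in packages.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"{name} not found on any index")
    # Every index is a peer: only the version number decides the winner.
    _, version, index_name = max(candidates)
    return index_name, version


# Hypothetical contents of a private index and a public one.
indexes = {
    "private": {"internal-logger": ["1.2.0", "1.3.0"]},
    "public": {"internal-logger": ["9000.0.0"]},
}
winner = resolve_across_indexes("internal-logger", indexes)
```

Nothing in `resolve_across_indexes` records which index a name is supposed to come from, which is exactly the missing concept: remove the public entry and the private 1.3.0 wins, add it back and the stranger's copy does.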
There is one constraint that limits the blast radius, and it is worth being precise about. Version ranges still apply. If a manifest pins ^1.0.0, the caret operator accepts anything from 1.0.0 up to but not including 2.0.0. A public 99.0.0 falls outside that range and is rejected. So the attacker cannot always reach for an arbitrarily huge number; against a caret or tilde constraint they have to publish the highest version that still satisfies the range, for example 1.99.0 against ^1.0.0. That narrows the window but does not close it, and plenty of internal dependencies are referenced with loose or wildcard ranges that impose no ceiling at all.
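The caret arithmetic is easy to check by hand, but a short sketch makes the ceiling explicit. It assumes plain three-part numeric versions and ignores pre-release tags; the function names are illustrative and not part of any package manager's API.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]


def parse(text: str) -> Version:
    """Parse 'major.minor.patch' into a comparable tuple."""
    major, minor, patch = (int(p) for p in text.split("."))
    return (major, minor, patch)


def caret_upper_bound(base: Version) -> Version:
    """Exclusive upper bound for ^base: bump the leftmost non-zero component."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def satisfies_caret(version: str, base: str) -> bool:
    """True if `version` falls inside the range written as ^base."""
    low = parse(base)
    return low <= parse(version) < caret_upper_bound(low)


def highest_satisfying(versions: List[str], base: str) -> Optional[str]:
    """The version a resolver would pick among candidates for ^base."""
    matching = [v for v in versions if satisfies_caret(v, base)]
    return max(matching, key=parse) if matching else None
```

Given candidates 1.2.0, 1.99.0, and 99.0.0 against ^1.0.0, `highest_satisfying` picks 1.99.0: the huge number is filtered out, but the attacker's in-range version still beats the legitimate 1.2.0.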
The deeper point is that none of this is a parsing bug or a logic error. The resolver is doing version arithmetic correctly. The flaw is a missing concept: the tools had no first-class way to say “this name belongs to this source and nowhere else.” Everything downstream follows from that absence. If you have read our walkthrough of the event-stream and node-ipc incidents, you will recognize the pattern: npm’s install-time scripts turn “fetched a package” into “ran arbitrary code,” and dependency confusion is simply a new way to control which package gets fetched.
The install hook is the firing pin
It is worth separating two things that often get blurred. One is getting your package onto the build machine. The other is getting it to execute. Dependency confusion is entirely about the first. The second is supplied, for free, by the lifecycle hooks that package managers run during installation.
In npm, a package can declare preinstall and postinstall scripts that run automatically when the package is installed, before any application code has so much as required it. That is the firing pin. The victim does not have to import the malicious module, call a function from it, or deploy anything. The mere act of resolving and installing the dependency, which happens on every fresh npm install and in every CI job that does not have a warm cache, is enough to run the attacker’s code with the privileges of the build. Python’s packaging has historically offered equivalent execution surfaces through setup.py running at install time for source distributions.
This is why the build machine and the CI runner are the real targets, not the production app. A CI agent typically has network egress, cloud credentials in environment variables, access to other private registries, and a fresh checkout of source. It runs installs constantly and often with elevated trust. An install-time script that reads process.env, enumerates the filesystem, and exfiltrates the result is harvesting exactly the secrets that let an attacker move laterally. The same property shows up in cloud breaches that pivot off an over-trusted internal service; the SSRF attack that breached Capital One turned one request into instance credentials in much the same spirit, a small foothold cashed in for the keys around it.
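Because the hooks are declared in plain sight, they are also easy to audit. The sketch below takes already-parsed package.json documents and reports which ones declare install-time scripts. The function names and the sample manifests are hypothetical; a real audit would load the manifests from node_modules or from registry metadata.

```python
from typing import Dict, List

# npm lifecycle hooks that run automatically while a dependency is installed
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_time_scripts(manifest: dict) -> Dict[str, str]:
    """Return the install-time hooks declared in a parsed package.json."""
    scripts = manifest.get("scripts") or {}
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}


def audit(manifests: List[dict]) -> Dict[str, Dict[str, str]]:
    """Map package name -> install-time hooks, skipping packages with none."""
    report = {}
    for manifest in manifests:
        hooks = install_time_scripts(manifest)
        if hooks:
            report[manifest.get("name", "<unnamed>")] = hooks
    return report


# Hypothetical manifests: one harmless, one that runs code on install.
manifests = [
    {"name": "left-pad", "scripts": {"test": "node test.js"}},
    {"name": "internal-logger", "scripts": {"postinstall": "node collect.js"}},
]
report = audit(manifests)
```

Where a build does not depend on lifecycle scripts, npm's ignore-scripts setting (or the --ignore-scripts flag on install) removes the firing pin entirely, at the cost of breaking packages that legitimately compile native code during install.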
The bug-bounty gold rush, and then the real attacks
The disclosure did two things at once. It paid Birsan handsomely, and it told every bug-bounty hunter on earth exactly how to find easy money. Within days of the February 2021 write-up, security firms reported floods of newly published packages on npm and PyPI bearing names that looked like internal corporate dependencies. Most were benign proof-of-concept callbacks from researchers racing to claim the same names Birsan had, hoping a target had not yet locked down. Sonatype and others tracked the surge. For a stretch, “publish a package named after a Fortune 500’s internal tooling and wait for a DNS callback” was a viable, if ethically gray, way to farm bounties.
The genuinely malicious use took a little longer to mature, and the clearest case is PyTorch’s. Over the holidays in December 2022, between roughly the 25th and the 30th, anyone who installed the PyTorch nightly build pulled a compromised dependency named torchtriton. PyTorch’s nightly channel published its own torchtriton on its dedicated package index. An attacker registered the same name on PyPI. Because pip treats PyPI as a peer source and the public copy carried a higher version, installs of the nightly resolved torchtriton from PyPI instead of from PyTorch’s index. The malicious package ran a binary at install time that read system information and specific files, then exfiltrated them over encrypted DNS queries to a domain under *.h4ck.cfd.
The remediation PyTorch chose is the canonical defensive move and worth remembering. They renamed the dependency from torchtriton to pytorch-triton, then registered a placeholder pytorch-triton on public PyPI so that the name could never again be claimed by someone else. Rename into a name you control, then squat your own name publicly. That second step is the part teams forget.
The technique has not aged out. In late May 2026, Microsoft’s security team documented a campaign of 45 malicious npm packages from three coordinated accounts that abused dependency confusion in a more deliberate way. Rather than wait for an unscoped name to confuse, the operators registered packages under organizational scopes that mirrored real internal corporate namespaces, nine of them, names shaped like @payments-widget and @data-science and several that referenced specific companies. Execution again rode in on npm postinstall, with no require() from victim code needed. An obfuscated stager then pulled reconnaissance payloads from a command-and-control host over HTTPS, collecting system information, hostnames, and environment variables. The tell that tied the three accounts to one operator was a hardcoded auth token reused as an X-Secret HTTP header across all of them. The shape of the attack had shifted from “guess the name” to “impersonate the namespace,” which is a direct response to scopes becoming the standard defense.
Scoping, namespaces, and the defenses that hold
The durable fix is to give names an owner that the resolver respects. On npm that is scopes. A scoped package looks like @acme/internal-logger, where @acme is the scope, and the registry enforces that only the account or organization owning the acme scope may publish under it. The protection has two halves. First, once your organization has claimed the acme scope on public npm, an outsider cannot publish @acme/anything there at all, because they do not control the scope. Second, and this is the part you have to configure, you bind the scope to your private registry so the CLI never even asks the public registry about it. In an .npmrc that is one line:
    @acme:registry=https://registry.internal.acme.example/
With that in place, every @acme/* install request goes to the internal registry and nowhere else. There is no version race because the public registry is never consulted for that scope. The unscoped, proxying setup that made the original attack work is gone. This is why the npm guidance that came out of 2021 reduces, in practice, to: scope all your internal packages, pin the scope to your registry in a committed .npmrc, and for belt-and-suspenders, register your organization name as a scope on public npm so nobody else can take it.
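The routing rule behind that line can be sketched as a lookup from package name to registry. This is a simplified model, assuming a single .npmrc file; the real CLI merges several configuration levels and environment variables. The helper names are hypothetical.

```python
from typing import Dict

DEFAULT_REGISTRY = "https://registry.npmjs.org/"


def parse_npmrc(text: str) -> Dict[str, str]:
    """Collect `registry=` and `@scope:registry=` entries from .npmrc text."""
    registries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "registry" or (key.startswith("@") and key.endswith(":registry")):
            registries[key] = value
    return registries


def registry_for(package: str, registries: Dict[str, str]) -> str:
    """Pick the single registry a package name would be requested from."""
    if package.startswith("@") and "/" in package:
        scope = package.split("/", 1)[0]
        scoped = registries.get(f"{scope}:registry")
        if scoped:
            return scoped
    return registries.get("registry", DEFAULT_REGISTRY)


config = parse_npmrc("@acme:registry=https://registry.internal.acme.example/\n")
```

With this config, `registry_for("@acme/internal-logger", config)` returns the internal URL, while the unscoped `internal-logger` still falls through to the default registry. That fall-through is the reason the guidance is to scope every internal package rather than only some of them.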
The other ecosystems each grew their own version of “names have owners.” NuGet shipped the cleanest answer. The dependency-confusion vector there was assigned CVE-2021-24105, and the lasting fix is package source mapping, added in NuGet 6.0 and exposed in the Visual Studio options dialog from 17.5 onward. It lets you declare, per package or per ID pattern, exactly which feed each package must restore from, in nuget.config. NuGet also has ID prefix reservation, which lets an organization reserve a prefix like Acme.* on nuget.org so attackers cannot publish lookalikes under it. The two together give you a private-feed-only mapping plus a reserved public prefix, mirroring the npm “scope plus public squat” pattern.
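Package source mapping is, at heart, a pattern match from package ID to feed. The sketch below models the precedence as NuGet's documentation describes it (an exact ID beats a prefix pattern, a longer prefix beats a shorter one, and a bare * matches last). It is a simplified illustration in Python, not NuGet's implementation, and the feed names and patterns are hypothetical.

```python
from typing import Dict, List, Tuple


def pattern_matches(pattern: str, package_id: str) -> bool:
    """Match an exact ID or a prefix pattern ending in '*', ignoring case."""
    pattern, package_id = pattern.lower(), package_id.lower()
    if pattern.endswith("*"):
        return package_id.startswith(pattern[:-1])
    return pattern == package_id


def specificity(pattern: str) -> Tuple[bool, int]:
    """Exact IDs outrank prefixes; longer prefixes outrank shorter ones."""
    is_exact = not pattern.endswith("*")
    return (is_exact, len(pattern))


def sources_for(package_id: str, mappings: Dict[str, List[str]]) -> List[str]:
    """Return the feeds allowed to serve `package_id` under the best pattern."""
    matches = [
        (specificity(pattern), source)
        for source, patterns in mappings.items()
        for pattern in patterns
        if pattern_matches(pattern, package_id)
    ]
    if not matches:
        return []  # no mapping: the package may not be restored from any feed
    best = max(score for score, _ in matches)
    return sorted({source for score, source in matches if score == best})


# Hypothetical mapping: everything under Acme.* must come from the internal feed.
mappings = {"nuget.org": ["*"], "internal": ["Acme.*"]}
```

Here `sources_for("Acme.Logging", mappings)` yields only the internal feed, so a higher-versioned Acme.Logging on the public feed is never considered, while `sources_for("Newtonsoft.Json", mappings)` still resolves from nuget.org.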
Python’s story is the most cautionary, because the obvious-looking fix kept failing to land. The community attempt to extend the Simple Repository API was PEP 708, authored by Donald Stufft and created in February 2023. Its idea was to let repositories declare relationships between projects, a “Tracks” relationship signaled by a meta.tracks field in JSON (and pypi:tracks in HTML) so an installer could understand that a platform-specific wheel index legitimately extends the canonical PyPI project, plus an “Alternate Locations” mechanism for projects that intentionally live in several places. PEP 708 sat provisionally accepted for three years, gated on PyPI implementing it, at least one other repository doing so, and pip adding support. Those conditions were never all met, and it was rejected in April 2026. The practical advice for pip users did not wait on the PEP and has not changed: prefer --index-url to point pip at a single index you control, avoid --extra-index-url for anything sensitive, and where you must mix indexes, use a tool that supports index-restricted requirements (Pipenv deprecated unrestricted multi-index resolution for exactly this reason).
Microsoft’s white paper from 2021, 3 Ways to Mitigate Risk When Using Private Package Feeds, distilled the cross-ecosystem guidance into three moves that still hold up. Use a single private feed rather than wiring up several public and private sources side by side. Control the scope or namespace where your package manager supports it, so internal names cannot be served from public sources. And add client-side verification, a lockfile or a content hash check, so the build aborts if a dependency’s resolved source or content changes unexpectedly. That last one is the safety net under everything else. A lockfile such as package-lock.json or poetry.lock records not just the version but the resolved URL a package came from, so a committed lockfile that is actually enforced (npm ci, not npm install) refuses to silently swap a private package for a public impostor.
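The lockfile check can also be run explicitly. The sketch below walks the packages section of a parsed package-lock.json (the layout used by lockfile versions 2 and 3) and flags scoped packages whose resolved URL does not start with the registry that scope is supposed to use. The function names and sample data are hypothetical, and entries with no resolved field, such as workspace links, are reported as missing and would need an allowlist in practice.

```python
from typing import Dict, List


def package_name(lock_path: str) -> str:
    """Turn 'node_modules/a/node_modules/@acme/b' into '@acme/b'."""
    return lock_path.rsplit("node_modules/", 1)[-1]


def misrouted_packages(lock: dict, scope_registries: Dict[str, str]) -> List[str]:
    """List scoped packages resolved from somewhere other than their registry."""
    problems = []
    for path, entry in lock.get("packages", {}).items():
        if not path:
            continue  # the empty key is the root project itself
        name = package_name(path)
        if not name.startswith("@"):
            continue  # unscoped packages have no owner to check against
        expected = scope_registries.get(name.split("/", 1)[0])
        resolved = entry.get("resolved", "")
        if expected and not resolved.startswith(expected):
            problems.append(f"{name}: resolved from {resolved or '<missing>'}")
    return problems


# Hypothetical lockfile in which an internal package was pulled from public npm.
lock = {
    "packages": {
        "": {"name": "app"},
        "node_modules/@acme/internal-logger": {
            "resolved": "https://registry.npmjs.org/@acme/internal-logger/-/internal-logger-99.0.0.tgz"
        },
        "node_modules/left-pad": {
            "resolved": "https://registry.npmjs.org/left-pad/-/left-pad-1.3.0.tgz"
        },
    }
}
problems = misrouted_packages(lock, {"@acme": "https://registry.internal.acme.example/"})
```

Run as a CI step before the install, a non-empty result is a reason to fail the build rather than a warning to log.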
What the technique actually teaches
Five years after the disclosure, dependency confusion is not patched, because there is no single thing to patch. It is the visible symptom of an assumption that ran underneath a generation of package managers: that a name maps to one source you can trust. The tools were built to answer “what is the newest version of X,” not “whose X is this.” Every fix that stuck (scopes, package source mapping, ID prefix reservation, index-restricted requirements) is really the same correction applied in different syntax, which is to attach an owner to a name and make the resolver honor it.
The reason the attack keeps finding fresh victims is that the leak it depends on is woven into how teams work. Internal package names are not credentials; nobody rotates them, nobody treats a package.json in a public repo as a disclosure, and the name has to be shared widely inside the company to be useful at all. So the name gets out, and as long as some build somewhere resolves it without a scope or a source mapping, the public registry can win the version race. The May 2026 npm campaign is the tell that the defense is working and the offense has adapted: attackers stopped guessing unscoped names and started impersonating the scopes themselves, which only makes sense as a move once scoping has become common enough to be worth defeating.
There is a tidy way to check where you stand. Take the most boring internal package your build pulls, the logging shim nobody thinks about, and ask one question: if a stranger published that exact name to the public registry tonight at version 9000.0.0, does your next clean CI run fetch theirs or yours? If you cannot answer instantly and with a config file to back it up, the version race is still open, and the only thing standing between you and Birsan’s result is that nobody has bothered to run it against you yet.
Sources & further reading
- Alex Birsan (2021), Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies. The original research write-up, with the bounty figures, exfiltration method, and per-ecosystem behavior.
- Microsoft (2021), 3 Ways to Mitigate Risk When Using Private Package Feeds. The cross-ecosystem mitigation white paper covering .NET, Python, JavaScript, and Java.
- Include Security (2021), Dependency Confusion: When Are Your npm Packages Vulnerable? Precise analysis of the npm/Verdaccio proxy default and version-range constraints.
- npm Docs, About scopes. How @scope/name packages map to namespaces and which accounts may publish under a scope.
- PyTorch (2022), Compromised nightly dependency chain between December 25th and December 30th, 2022. The torchtriton incident and the rename-plus-squat remediation, straight from the project.
- Wiz (2023), Malicious PyTorch dependency ‘torchtriton’ on PyPI. Independent technical breakdown of the payload and DNS exfiltration domain.
- Donald Stufft (2023), PEP 708 – Extending the Repository API to Mitigate Dependency Confusion Attacks. The Python proposal, its Tracks and Alternate Locations metadata, and its eventual rejection.
- Microsoft Learn (2026), Best practices for a secure software supply chain. NuGet’s guidance on package source mapping, ID prefix reservation, and lock files.
- Microsoft Security (2026), Malicious npm packages abuse dependency confusion to profile developer environments. The 2026 campaign impersonating internal organizational scopes.
- Sonatype (2021), Dependency hijacking software supply chain attack hits more than 35 organizations. Early coverage of the copycat package surge that followed disclosure.
- MSRC (2021), CVE-2021-24105: Package Managers Configurations Remote Code Execution Vulnerability. The Microsoft advisory assigned to the dependency-confusion vector.
Further reading
The event-stream and node-ipc incidents: npm supply-chain attacks dissected
Two npm supply-chain cases dissected: the 2018 event-stream maintainer handoff that smuggled a Copay wallet stealer through flatmap-stream, and the 2022 node-ipc protestware that wiped files in Russia and Belarus.
Subresource Integrity and the supply-chain risk of third-party scripts
Traces how the integrity attribute verifies a third-party script against a cryptographic hash, what it stops when a CDN is compromised, the dynamic-resource gap it cannot close, and why adoption stayed in single digits.
The Polyfill.io supply-chain attack: how a CDN dependency went rogue
A single-incident deep dive into the June 2024 Polyfill.io attack: the February domain sale, the conditional payload injected into hundreds of thousands of sites, the evasion logic that hid it, and the takedown that followed.