Domain generation algorithms: how malware finds its C2 without hardcoded domains

Copyright: MIT

A hardcoded C2 domain is a single point of failure. Pull the registration, sinkhole the IP, push the indicator to every firewall on the planet, and the botnet is deaf. The malware keeps running on every infected host, but it has nothing left to phone. So the obvious question, the one a botnet operator asks early, is how you keep a callback channel alive when the defender can take any domain you name. The answer that stuck is to name no domain at all. Generate them. Generate thousands a day from a seed both ends already share, register one, and let the bots find you.

That is a domain generation algorithm. The mechanism is old, the math is trivial, and the defensive response it provoked turned into one of the few genuinely coordinated takedowns the industry has run. This post traces the idea from its first appearances in 2008, through Conficker (the family that made it a household word in the security community), into the detection problem it created. We’ll cover the generation mechanics and the seed types, the takeover-by-pre-registration move that defenders use against it, why dictionary DGAs broke the easy entropy heuristics, and where the machine-learning classifiers sit now. Mechanism first, then defense.

What a DGA actually is

Strip away the branding and a DGA is a seeded pseudo-random number generator with a domain-name codec on the output. Both the malware and the operator run the same code with the same seed, so both compute the same list of domains for a given time window. The bot walks the list, resolving each candidate until one answers. The operator only has to register a handful of those candidates, often just one, and stand up a server behind it. Everything else in the list returns NXDOMAIN, which is exactly the part defenders learned to watch.

The seed is the whole game. If the seed is public and predictable, anyone who reverse-engineers the algorithm can compute the same domains the malware will, which is what makes both takedown and detection possible. The most common seed is the current date. A bot reads its local clock, hashes the date with a fixed constant, and expands the result into N domains valid for that day. Tomorrow the list rolls over. MITRE catalogs the technique as T1568.002 under the dynamic-resolution family, and its description is blunt about the purpose: adversaries use DGAs “to dynamically identify a destination domain for command and control traffic rather than relying on a list of static IP addresses or domains.” That sentence is the entire threat model. Static lists die; generated lists regenerate.
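A minimal sketch of both ends of that arrangement, assuming a made-up constant, a SHA-256 expansion, and a four-entry TLD list (none of which come from a real family). `toy_dga` is the shared generator; `find_c2` is the bot side, with the resolver passed in as a plain function so the example runs offline.

```python
import hashlib
from datetime import date
from typing import Callable, Optional

SECRET = b"example-constant"            # hypothetical; real families embed their own
TLDS = ["com", "net", "org", "info"]    # hypothetical TLD list

def toy_dga(day: date, count: int = 10, length: int = 10) -> list[str]:
    """Expand a date seed into `count` candidate domains (illustration only)."""
    domains = []
    for i in range(count):
        material = SECRET + day.isoformat().encode() + i.to_bytes(4, "big")
        digest = hashlib.sha256(material).digest()
        # The "codec": map digest bytes onto lowercase letters.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        domains.append(f"{label}.{TLDS[digest[-1] % len(TLDS)]}")
    return domains

def find_c2(day: date, resolve: Callable[[str], Optional[str]]) -> Optional[str]:
    """Walk the day's list in order; return the first name that resolves."""
    for name in toy_dga(day):
        if resolve(name) is not None:   # None stands in for NXDOMAIN
            return name
    return None

# Operator side: register exactly one of today's candidates.
today = date(2025, 11, 6)
registered = {toy_dga(today)[3]: "198.51.100.7"}
print(find_c2(today, registered.get))
```

Because the generator is a pure function of the date, anyone holding the code can run `toy_dga` for any future day, which is the property the rest of this post keeps coming back to.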

The asymmetry is what makes it work. To keep the channel open the operator needs one live domain on any given day. To kill the channel the defender needs to neutralize every domain the algorithm can produce that day, which can be tens of thousands across a hundred registries. Registering one name costs a few dollars and five minutes. Blocking 50,000 across 110 TLDs requires the cooperation of 110 organizations, every day, indefinitely. Conficker is the family that forced the industry to actually try.

[Figure: seed = date(2025-11-06) feeds PRNG(seed), which emits the day's candidates: qwertasdfg.com (NXDOMAIN), zxcvbnmkjh.net (NXDOMAIN), plkmnbvcxz.org (A 198.51.100.7), mnbvcxzlkj.info (NXDOMAIN), and 49,996 more. The bot resolves the list in order. One name answers; the rest are noise. The operator registered exactly one. The defender must block all of them.]

*A date seed expands into the day's candidate domains. The operator registers one; every other lookup is an NXDOMAIN, and that flood of failed lookups is the first thing detection watches for.*

2008: the first generation

The technique did not arrive fully formed. Through 2008 several families converged on the same idea independently, which is usually a sign the design space was pushing everyone toward it. Kraken, a spam bot analyzed that year, generated callback domains rather than hardcoding them. Bobax did something similar. The family that gave researchers the cleanest look at the mechanism, though, was Torpig, the banking trojan that rode in on the Mebroot rootkit.

Torpig is worth dwelling on because in early 2009 a team at UC Santa Barbara used its DGA against it, and the writeup is one of the first detailed public accounts of taking over a botnet by out-registering its operator. Torpig used what the researchers called domain flux: the malware computed a weekly domain from the current date, and if that failed, fell back to a daily domain, and only if both failed used a hardcoded list. Because the weekly and daily names were a deterministic function of the date, the researchers could compute them ahead of time. They registered the domains the algorithm would generate for an upcoming window, pointed them at their own server, and the bots came to them. For ten days the team controlled the botnet. Roughly 180,000 infected machines reported in. The data that flowed to their sinkhole included credentials for thousands of accounts and over a thousand unique card numbers, which is the kind of number that gets a takeover paper read.

That takeover only worked because Torpig’s seed was the date and nothing else. Pure date seeding is the soft underbelly of the whole approach, and the operators who came after Torpig knew it.
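The fallback order and the defender's head start can be sketched in a few lines. This is not Torpig's actual algorithm: the `_label` hash, the `.com` suffix, the hardcoded names, and the start date are all placeholders chosen for illustration. Only the weekly, then daily, then static ordering comes from the account above.

```python
import hashlib
from datetime import date, timedelta

HARDCODED = ["fallback-one.example", "fallback-two.example"]  # hypothetical

def _label(tag: str) -> str:
    """Hypothetical stand-in for the real name function: hash a tag to 8 letters."""
    digest = hashlib.sha256(tag.encode()).digest()
    return "".join(chr(ord("a") + b % 26) for b in digest[:8])

def weekly_domain(day: date) -> str:
    year, week, _ = day.isocalendar()
    return _label(f"w{year}-{week}") + ".com"

def daily_domain(day: date) -> str:
    return _label(f"d{day.isoformat()}") + ".com"

def candidates(day: date) -> list[str]:
    """Order the bot tries: weekly name, then daily name, then the static list."""
    return [weekly_domain(day), daily_domain(day), *HARDCODED]

def precompute(start: date, days: int) -> set[str]:
    """Defender side: every generated name the bots will try in the window."""
    names = set()
    for offset in range(days):
        day = start + timedelta(days=offset)
        names.update(candidates(day)[:2])   # skip the hardcoded entries
    return names

upcoming = precompute(date(2009, 1, 25), 10)   # arbitrary start date
print(len(upcoming), "names to register for a ten-day window")
```

The point of the sketch is how small the registration list is when the only input is the calendar: a handful of weekly names plus one daily name per day.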

2008-2009: Conficker makes it famous

Conficker (also Downadup) is the family that turned DGAs from a research curiosity into an industry-wide incident. It spread through the MS08-067 RPC vulnerability and removable media, and at peak it was on millions of machines. Variants A and B generated 250 domains per day across a handful of TLDs and checked them on a few-hour cycle. That was already enough to be annoying, but it was tractable. A coalition of researchers, registries, and registrars formed around the response. The press called it the Conficker Cabal; the formal name was the Conficker Working Group. They reverse-engineered the algorithm, computed the 250 daily names ahead of time, and pre-registered or blocked them. The callback channel went quiet.

Then variant C answered. SRI International’s reverse-engineering of Conficker C, published in March 2009, documented the change in detail. The DGA was rewritten to generate 50,000 domains per day, drawn from 116 suffixes covering 110 top-level domains, with names of four to ten characters. The activation date was hardcoded: April 1, 2009. Before that date the malware would, in SRI’s words, “enter a loop that sleeps 24 hours and then rechecks the date.” SRI read the redesign plainly as “a direct retort to the action of the Conficker Cabal, which recently blocked all domain registrations associated with the A and B strains.” The jump from 250 to 50,000 was not about contacting more servers. The bots still only queried a small subset each day. It was about arithmetic. To keep blocking the channel, defenders would now have to neutralize 50,000 names a day across 110 registries, forever.

[Figure: daily domains generated per host, log scale. Conficker A / B: 250. Conficker C: 50,000 across 110 TLDs. Both variants query only a fraction of what they generate.]

*Conficker C's 200x jump in daily domains was a response to the working group's blocking of A and B, not a need for more servers. It was a deliberate cost shift onto the defenders.*
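To put rough numbers on that cost shift, here is the arithmetic using the figures quoted above. The one-year horizon and the even spread across TLDs are simplifying assumptions for illustration, not something the SRI analysis states.

```python
# Daily counts are the figures quoted in the text; the rest is illustrative.
DOMAINS_AB = 250        # Conficker A/B, names per day
DOMAINS_C = 50_000      # Conficker C, names per day
TLDS_C = 110            # top-level domains in variant C's pool

per_year_ab = DOMAINS_AB * 365
per_year_c = DOMAINS_C * 365
avg_per_tld_per_day = DOMAINS_C / TLDS_C   # assumes an even spread across TLDs

print(f"A/B: {per_year_ab:,} names/year")                              # 91,250
print(f"C:   {per_year_c:,} names/year ({DOMAINS_C // DOMAINS_AB}x)")  # 18,250,000 (200x)
print(f"C:   ~{avg_per_tld_per_day:.0f} names per TLD per day on average")
```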

The working group pulled it off anyway, which is the part that still surprises people. ICANN coordinated with the registries behind all 110 TLDs and, by the April 1 activation, had cooperation in place across all of them. Some registries went further than daily blocking. CIRA, the .ca registry, locked every unregistered .ca name the algorithm was expected to generate over the following twelve months in one move. NIC Chile and NIC-Panama blocked the names supplied by the working group outright. The group kept supplying the 110 affected operators with an updated list every month. April 1 came and the doomsday headlines did not pay off, partly because the worm’s controllers, watched as closely as they were, mostly chose not to push their luck. The infections, on the other hand, lingered for years. Conficker was still being found on industrial and embedded systems long after its C2 was permanently boxed in.

The seed types and a taxonomy

Once the technique was common, researchers needed a way to talk about the variation. The reference work is the 2016 USENIX Security study from Fraunhofer FKIE that introduced DGArchive, a corpus built by reimplementing the algorithms of dozens of malware families and pre-computing every domain they could generate. That study analyzed 43 families and variants and gave the community a taxonomy that has held up.

Two axes matter. The first is how the algorithm turns a seed into characters. Most families are arithmetic: they run simple integer math on the seed (multiply, add, modulo, shift) and map the resulting numbers onto letters. This is the Conficker style, and it dominates the corpus. A second group is hash-based, where the seed is fed through a cryptographic hash like MD5 or SHA-1 and the hex output becomes the domain. A third group, the one that broke the easy detectors, is wordlist or dictionary based: instead of emitting random characters, the algorithm concatenates words from a built-in dictionary, so the domains look pronounceable. A fourth, smaller group uses permutation, shuffling a fixed seed string.

The second axis is time. The DGArchive authors (Daniel Plohmann and colleagues) split families by whether the generation depends on time at all and whether it is fully deterministic. A time-independent, deterministic DGA emits the same set regardless of the clock. A time-dependent, deterministic DGA (the common case) keys off the date, so the set is predictable if you know the date and the algorithm. The awkward category is time-dependent and non-deterministic, where the algorithm folds in an input that even a perfect reimplementation cannot predict ahead of time. That input is the operator’s escape hatch, and it has a name now: the dynamic seed.

[Figure: DGA generation schemes. Arithmetic: integer math on the seed, mapped to letters (Conficker). Hash-based: seed hashed (MD5/SHA-1), hex becomes the domain. Wordlist: dictionary words concatenated (Suppobox, Matsnu). Permutation: a fixed string shuffled.]

[Figure, second panel: time dependence. Time-independent, deterministic: same set always. Time-dependent, deterministic: keyed to the date (predictable). Time-dependent, non-deterministic: dynamic seed (not pre-computable).]

*The taxonomy that has held up since 2016. The bottom row is where the arms race lives: if the seed is not predictable in advance, defenders cannot pre-register the domains.*
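Three of the four generation schemes fit in a few lines each. The versions below are toy stand-ins: the LCG constants are textbook values rather than anything lifted from a malware sample, the base string for the permutation is invented, and Python's `random.Random` substitutes for whatever PRNG a real family would carry. The wordlist style is the same idea with a dictionary lookup in place of the letter mapping.

```python
import hashlib
import random

def arithmetic_label(seed: int, length: int = 10) -> str:
    """Arithmetic style: integer math on the seed, mapped onto letters."""
    state, out = seed, []
    for _ in range(length):
        state = (state * 1103515245 + 12345) % 2**31   # textbook LCG step
        out.append(chr(ord("a") + (state >> 16) % 26))
    return "".join(out)

def hash_label(seed: int, length: int = 16) -> str:
    """Hash style: the hex digest itself becomes the label."""
    return hashlib.sha1(str(seed).encode()).hexdigest()[:length]

def permutation_label(seed: int, base: str = "examplebotnet") -> str:
    """Permutation style: shuffle a fixed string with a seeded PRNG."""
    chars = list(base)
    random.Random(seed).shuffle(chars)
    return "".join(chars)

seed = 20251106   # e.g. a date packed into an integer
for make in (arithmetic_label, hash_label, permutation_label):
    print(make.__name__, make(seed) + ".com")
```

Note how the hash style leaks its origin: the label only ever contains the characters 0-9 and a-f, which is a lexical tell in its own right.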

The dynamic seed problem

Pre-registration only works because the defender can compute the future. Torpig used the date. Conficker used the date. Reimplement the algorithm, read the calendar, and you know every domain the malware will try next month. The countermeasure, from the operator’s side, is to mix an input into the seed that the defender cannot know ahead of time.

A clean example is a seed derived from data the operator controls at run time, like a value pulled from a popular website that changes daily, or a number the operator can change at will and push to bots through some other channel. Once the seed includes that, a researcher with a perfect reimplementation still cannot generate next week’s domains, because the run-time input does not exist yet. The pre-registration takeover that worked on Torpig stops working. You can only sinkhole domains after the seed for that window is known, which usually means after they have already been queried in the wild, which means you are reacting rather than pre-empting.
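A small sketch of why the extra input matters, reusing the toy hashing approach from earlier. The `runtime_value` strings are invented placeholders for whatever the operator actually publishes; the generator itself is hypothetical.

```python
import hashlib
from datetime import date

def seeded_domains(day: date, runtime_value: str, count: int = 5) -> list[str]:
    """Toy DGA whose seed mixes the date with a value only known at run time."""
    names = []
    for i in range(count):
        material = f"{day.isoformat()}|{runtime_value}|{i}".encode()
        digest = hashlib.sha256(material).digest()
        names.append("".join(chr(ord("a") + b % 26) for b in digest[:12]) + ".net")
    return names

day = date(2025, 11, 13)

# What the bots compute once the operator-controlled value is published.
actual = seeded_domains(day, runtime_value="value-published-that-morning")

# What a defender can compute a week ahead: the algorithm is known, the value is not.
guess = seeded_domains(day, runtime_value="defenders-best-guess")

print(set(actual) & set(guess))   # empty: the pre-registration list misses every name
```

The reimplementation is perfect and the date is known, yet the two lists share nothing. The defender's list only becomes correct after the run-time value exists.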

Akamai’s research on dynamic-seed families surfaced a related wrinkle. When researchers tracked when DGA domains actually went live versus when the algorithm said they should, the live windows were wider than the math predicted. For Pushdo, domain activity spread across roughly fifty days on either side of the expected generation date. Necurs showed a tighter spread, around seven days in each direction with a spike about twelve days forward. The reading in that work is that operators deliberately register domains off-schedule to “frustrate or confuse security researchers” and stretch the lifespan of the channel. The defender who only watches the algorithmically exact day misses the names that went live a week early. So even where the seed is predictable, the timing is fuzzed.
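One practical consequence is that matching an observed domain against the reimplemented algorithm should search a window of dates, not just the day of observation. A minimal sketch, assuming a hypothetical date-seeded generator `domains_for` in place of a real reimplementation; the window sizes are arbitrary.

```python
import hashlib
from datetime import date, timedelta
from typing import Optional

def domains_for(day: date, count: int = 20) -> set[str]:
    """Hypothetical date-seeded generator standing in for a reimplemented DGA."""
    out = set()
    for i in range(count):
        digest = hashlib.sha256(f"{day.isoformat()}:{i}".encode()).digest()
        out.add("".join(chr(ord("a") + b % 26) for b in digest[:10]) + ".com")
    return out

def match_offset(domain: str, seen_on: date, window_days: int) -> Optional[int]:
    """Offset in days from `seen_on` to the date the domain was generated for
    (positive = scheduled for a later date), or None if nothing in the window matches."""
    # Search outward from the observation date: 0, -1, +1, -2, +2, ...
    for offset in sorted(range(-window_days, window_days + 1), key=abs):
        if domain in domains_for(seen_on + timedelta(days=offset)):
            return offset
    return None

seen_on = date(2025, 11, 6)
early = sorted(domains_for(seen_on + timedelta(days=7)))[0]   # live a week ahead of schedule

print(match_offset(early, seen_on, window_days=0))    # None: exact-day check misses it
print(match_offset(early, seen_on, window_days=10))   # 7: widened window finds it
```

The cost is linear in the window size, so a spread like the one reported for Pushdo means regenerating on the order of a hundred days of candidates per lookup, or precomputing them once into an index.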

Detection: NXDOMAIN floods and lexical features

The defender who cannot pre-register everything falls back to spotting the behavior. Two signals carry most of the weight, and the first is almost embarrassingly direct.

A DGA host resolves a long list of generated names and almost all of them do not exist. Each non-existent name returns an NXDOMAIN. A normal endpoint produces a steady trickle of NXDOMAINs from typos and stale links; a DGA-infected one produces a burst, dozens or hundreds of failed lookups in a short window, often to names that share no relationship to anything the user is browsing. The detection logic is exactly what it sounds like: parse DNS responses, count NXDOMAINs per host per window, and alert past a threshold. It is cheap, it runs inline, and it catches the loud families. Protective-DNS products lean on it heavily. The catch is that the burst only appears when the day’s live domain has not been found yet; a bot that resolves its C2 on the first or second try is quiet. And a host behind a resolver that rewrites NXDOMAIN to a landing page hides the signal entirely. For the resolver mechanics this leans on, the DNS resolution walkthrough covers how a query actually fails.
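The counting logic is short enough to show whole. This is a sketch of a sliding-window counter, not any product's implementation: the class name, the 60-second window, and the threshold of 30 are all illustrative choices, and it assumes DNS responses have already been parsed into (client, timestamp, rcode) tuples.

```python
from collections import defaultdict, deque

class NxdomainBurstDetector:
    """Sliding-window NXDOMAIN counter per client. Thresholds are illustrative."""

    def __init__(self, window_seconds: float = 60.0, threshold: int = 30):
        self.window = window_seconds
        self.threshold = threshold
        self._events = defaultdict(deque)   # client IP -> timestamps of NXDOMAINs

    def observe(self, client: str, timestamp: float, rcode: str) -> bool:
        """Record one DNS response; return True if the client is over threshold."""
        if rcode != "NXDOMAIN":
            return False
        events = self._events[client]
        events.append(timestamp)
        # Drop failures that have aged out of the window.
        while events and events[0] <= timestamp - self.window:
            events.popleft()
        return len(events) >= self.threshold

detector = NxdomainBurstDetector(window_seconds=60, threshold=30)
alerts = [detector.observe("10.0.0.5", t, "NXDOMAIN") for t in range(40)]
print(alerts.index(True))   # 29: the alert first fires on the 30th failure
```

A host that fails one lookup every ten seconds never holds more than six events in the window and stays silent, which is the intended behavior for the typo-and-stale-link trickle.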

The second signal is the domain string itself. An arithmetic DGA emits something like qwzkxfbptr.com, which does not look like a name a human would register. You can quantify that. Shannon entropy of the character distribution runs high because the letters are near-uniform. The ratio of consonants to vowels is off; English words alternate, random strings clump consonants. The frequency of two- and three-character substrings (the n-grams) is wrong, because qz and xk are rare in real domains and common in generated ones. Score a name on length, entropy, vowel ratio, digit ratio, and n-gram frequency against a baseline of real domains, and the arithmetic families separate cleanly. Feed those features to a random forest and you get a fast, explainable classifier that runs at line rate. For years this was the standard.

[Figure: lexical features separate the easy families. github.com: entropy low, vowels present, common n-grams. qwzkxfbptr.com: entropy high, consonant clumps, rare n-grams (qz, xf). cityjulydish.net (dictionary DGA): entropy and n-grams look almost legitimate.]

*Entropy and n-gram scoring cleanly flag the random-character families. The dictionary DGA at the bottom is the problem case: it reads like English, so the cheap statistics barely move.*
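A minimal feature extractor for the statistics named above, leaving out n-gram frequency because that needs a baseline corpus. The function names and the choice to score only the leftmost label are assumptions for the sketch; the output would feed a classifier rather than be thresholded directly.

```python
import math
from collections import Counter

VOWELS = set("aeiou")

def shannon_entropy(label: str) -> float:
    """Bits per character of the label's empirical character distribution."""
    counts = Counter(label)
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def lexical_features(domain: str) -> dict[str, float]:
    label = domain.lower().split(".")[0]       # score the leftmost label only
    letters = [c for c in label if c.isalpha()]
    return {
        "length": len(label),
        "entropy": shannon_entropy(label),
        "vowel_ratio": sum(c in VOWELS for c in letters) / max(len(letters), 1),
        "digit_ratio": sum(c.isdigit() for c in label) / max(len(label), 1),
    }

for name in ("github.com", "qwzkxfbptr.com", "cityjulydish.net"):
    print(name, lexical_features(name))
```

One thing running it makes visible: per-label entropy is sensitive to string length, so on short names no single feature does the work alone. The features are meant to be combined and compared against a baseline of real domains, as described above.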

Why dictionary DGAs broke the easy heuristics

Entropy detection has a structural weakness. It is detecting that the string does not look like a word, so the obvious counter is to make the string look like a word. Dictionary DGAs do exactly that. Suppobox and Matsnu carry built-in word lists and concatenate entries to build domains like cityjulydish.net, which MITRE actually uses as its example of the concatenated-word style. Matsnu, in one analysis, kept two dictionaries, one of verbs and one of nouns, and combined a word from each until the name passed a length threshold. The output is pronounceable, alternates vowels and consonants like real English, and produces n-gram statistics that sit right on top of legitimate domains. The entropy score barely moves. The consonant-vowel ratio looks fine. The cheap classifier shrugs.
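The verb-plus-noun construction is easy to imitate. The sketch below follows the description above (alternate words from two lists until a length threshold is reached) but the word lists, the threshold of 12, and the use of `random.Random` in place of the family's own PRNG are all assumptions for illustration.

```python
import random

# Tiny hypothetical word lists; real samples would carry far larger ones.
VERBS = ["carry", "build", "follow", "gather", "paint", "review"]
NOUNS = ["city", "dish", "garden", "letter", "market", "window"]

def dictionary_domain(seed: int, min_length: int = 12, tld: str = "net") -> str:
    """Alternate a verb and a noun until the label reaches `min_length`."""
    rng = random.Random(seed)          # seeded, so both ends get the same name
    label = ""
    while len(label) < min_length:
        label += rng.choice(VERBS)
        if len(label) < min_length:
            label += rng.choice(NOUNS)
    return f"{label}.{tld}"

for seed in range(20251106, 20251109):
    print(dictionary_domain(seed))
```

Every output is built from ordinary English letter sequences, so a character-level score has nothing unusual to measure. Only the joins between words are left as evidence.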

This is the split that still defines DGA detection. Character-level statistics handle arithmetic and hash families with ease and fall over on dictionary families. Catching the dictionary ones means modeling word structure rather than character randomness: looking at whether the concatenation of real words is itself unnatural, whether the specific word pairs co-occur in real text, whether the domain’s semantics hang together. That is a harder problem, and it is why a chunk of the research literature is specifically about word-based DGA detection as its own track.
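The first step in any word-aware approach is recovering the words. A minimal sketch of dictionary segmentation, assuming a tiny hand-picked word set; a real detector would load a large word list and then score the resulting word sequence, which this example does not attempt.

```python
from functools import lru_cache
from typing import Optional

# Hypothetical mini-dictionary for illustration only.
WORDS = frozenset({"city", "july", "dish", "git", "hub", "face", "book", "cloud", "flare"})

def segment(label: str) -> Optional[list[str]]:
    """Split `label` into dictionary words, or return None if no full split exists."""
    @lru_cache(maxsize=None)
    def solve(start: int):
        if start == len(label):
            return ()
        for end in range(start + 1, len(label) + 1):
            if label[start:end] in WORDS:
                rest = solve(end)
                if rest is not None:
                    return (label[start:end],) + rest
        return None

    result = solve(0)
    return None if result is None else list(result)

print(segment("cityjulydish"))   # ['city', 'july', 'dish']
print(segment("github"))         # ['git', 'hub']
print(segment("qwzkxfbptr"))     # None
```

Segmentation alone does not separate the cases, since legitimate brand names split into words too. It only produces the word sequence that the co-occurrence and semantic checks then have to judge.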

The neural turn

The character-feature approach has an obvious cost. Someone has to design the features, and every new family is a chance for the hand-built features to miss. In 2016 a team at Endgame published the result that pushed the field toward learned features: a character-level LSTM that read the raw domain string and classified it with no hand-engineered features at all. The network learns its own notion of which letter combinations matter, generalizing past the bigram features a human would have written by hand. On their evaluation the authors reported an ROC AUC of 0.9993 for the binary task and a 90 percent detection rate at a one-in-ten-thousand false-positive rate, which they put at roughly twenty times better on false positives than the prior best method. The model scores a single domain in isolation, with no DNS context required, which makes it deployable inline.

The honest caveat in that work is the same split as before. The character LSTM was very effective everywhere except the wordlist DGAs, where a character model has little to grab onto, because the characters are fine; it is the word choice that is wrong. Subsequent work piled on. Convolutional networks, bidirectional LSTMs, attention layers, and transformer encoders have all been tried, often in hybrids that pair a character model for the random families with a word-aware model for the dictionary ones. The reported accuracy numbers in these papers are high, frequently above 99 percent on the standard corpora, with false-positive rates in the hundredths of a percent. That sounds finished. It is not, and the reason is worth stating plainly: a 0.01 percent false-positive rate sounds tiny until you multiply it by the billions of DNS queries a large network emits in a day, at which point the alert queue fills with legitimate domains and the analysts stop reading it. Detection accuracy on a balanced benchmark and detection usefulness on a live resolver are different quantities, and the gap between them is where most deployed systems actually live.
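The base-rate problem is easier to feel with numbers. Only the 0.01 percent false-positive rate comes from the text; the daily query volume, the share of malicious traffic, and the detection rate below are assumptions picked to be round, and the sketch treats every query as scored independently.

```python
# Illustrative volumes; only the 0.01% false-positive rate comes from the text.
queries_per_day = 1_000_000_000   # assumption: a large network's daily DNS volume
false_positive_rate = 0.0001      # 0.01 percent
malicious_fraction = 1e-6         # assumption: one query in a million is DGA traffic
detection_rate = 0.99             # assumption

benign = queries_per_day * (1 - malicious_fraction)
malicious = queries_per_day * malicious_fraction

false_alerts = benign * false_positive_rate
true_alerts = malicious * detection_rate
precision = true_alerts / (true_alerts + false_alerts)

print(f"false alerts/day: {false_alerts:,.0f}")   # about 100,000
print(f"true alerts/day:  {true_alerts:,.0f}")    # 990
print(f"precision:        {precision:.2%}")       # under 1%
```

Under these assumptions roughly a hundred alerts are wrong for every one that is right. Deduplicating by domain or allow-listing popular names shrinks the volume, but the ratio is set by how rare the malicious traffic is, not by the classifier's headline accuracy.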

Two structural problems sit under all of this. The benchmarks lean heavily on DGArchive and a handful of public family lists, so a model can look excellent by overfitting to families it has already seen and still miss a genuinely new one. And the operators read the same papers. A family that knows it will be scored on entropy moves to a dictionary scheme. A family that knows the dictionary scheme is now modeled folds in a dynamic seed so the domains cannot be enumerated for training in the first place. The classifier is always trained on yesterday’s families.

Where pre-registration still wins

For all the detection research, the move that actually kills a DGA botnet is still the one the Torpig and Conficker teams used: out-register the operator and sinkhole the names. When the seed is predictable, this scales into something decisive. The clearest case is GameOver Zeus. Its DGA backstop (inherited from the Murofet/Licat lineage) generated up to roughly a thousand candidate domains per day as a fallback to its peer-to-peer C2. When the FBI-led Operation Tovar took the botnet down in June 2014, the operation reverse-engineered that algorithm and pre-registered the domains it would generate well in advance, six months out by some accounts, so that any bot falling back to the DGA channel found a sinkhole instead of its master. Pair that with simultaneous action against the P2P layer and the botnet had nowhere to land. The losses attributed to GameOver Zeus ran past a hundred million dollars, which is the scale of problem that gets a multinational takeover organized.

The pattern repeats across the families with predictable seeds. Reverse the algorithm, compute the namespace, register or block ahead of activation, sinkhole what the bots resolve, and use the sinkhole traffic to count and geolocate the infected population. The sinkhole becomes a census. That is how the industry knows Conficker is still on hundreds of thousands of machines; the C2 has been boxed in for over a decade, but the bots still dutifully resolve their daily domains and report straight into the working group’s sinkholes. Where this breaks is exactly the dynamic-seed case: if the defender cannot compute next week’s domains, they cannot register them first, and pre-registration degrades into reactive sinkholing of domains that are already live. That is the line operators keep trying to push their families across.
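The census step is mostly bookkeeping. A minimal sketch, assuming a made-up three-field log format of date, source address, and queried domain; real sinkhole logs carry more fields and need more careful parsing.

```python
from collections import defaultdict

# Hypothetical sinkhole log format: "date source_ip queried_domain"
LOG_LINES = [
    "2025-11-06 203.0.113.10 qwertasdfg.com",
    "2025-11-06 203.0.113.10 zxcvbnmkjh.net",
    "2025-11-06 198.51.100.23 qwertasdfg.com",
    "2025-11-07 203.0.113.10 plkmnbvcxz.org",
]

def census(lines):
    """Count distinct source addresses per day seen by the sinkhole."""
    hosts_by_day = defaultdict(set)
    for line in lines:
        day, source_ip, _domain = line.split()
        hosts_by_day[day].add(source_ip)
    return {day: len(hosts) for day, hosts in sorted(hosts_by_day.items())}

print(census(LOG_LINES))   # {'2025-11-06': 2, '2025-11-07': 1}
```

Distinct source addresses are only a rough proxy for infected machines: NAT hides many hosts behind one address, and address churn counts one host several times. Treat the output as an estimate with error bars rather than a headcount.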

Closing

The DGA endures because the asymmetry never went away. One live domain keeps a botnet breathing; killing the channel means neutralizing an entire generated namespace across dozens of registries, and the registration math always favors the side that only needs one name. Everything since Conficker has been an argument over a single variable: whether the defender can compute the domains before the malware does. When the seed is the date, the defender wins, and Operation Tovar is what winning looks like. When the seed includes something only the operator knows at run time, pre-registration stops working and the fight moves to detection, where the accuracy numbers are excellent on paper and the false-positive math is unforgiving at DNS scale.

What is striking is how little the core trick has changed. The arithmetic Conficker used in 2009 and the LSTM trained to catch it in 2016 are both, underneath, statements about the same string of characters: one side generating it from a seed, the other trying to recognize it without the seed. Conficker’s daily domains still resolve into sinkholes that have been quietly counting infected machines for fifteen years, long after anyone stopped worrying about the worm. The algorithm outlived its own threat. That is the unsettling thing about a good DGA. The C2 dies, the operator is indicted, the botnet is dismantled, and the generated names keep getting computed on schedule, every day, by machines that have no one left to call.

For the DNS rotation tricks that complement domain generation, see fast-flux networks; for the encrypted-resolution shift that is changing what NXDOMAIN-based detection can even see, DNS-over-HTTPS and DNS-over-TLS; and for fingerprinting the C2 traffic once a domain does resolve, JA3 and JA4 in threat hunting.

