Ad fraud and click fraud: how invalid traffic is detected and monetized

A digital ad is bought and sold in the time it takes a page to load. Before the first image paints, a request describing the ad slot has gone out to an exchange, dozens of demand-side platforms have each computed a bid, an auction has cleared, and a winning ad has been stuffed into the slot. Billions of those auctions clear every day. The whole machine runs on one assumption: that a real person is on the other end of the slot, about to see the ad. Strip that assumption away and the machine still runs perfectly. It just pays out to nobody for nothing.

That gap is where ad fraud lives. A bot loads a page, the page requests an ad, the ad serves, the impression is counted, and money moves from an advertiser’s budget to whoever owns the page. No human ever looked at anything. The interesting question is not whether this happens. It is how the industry tells a bot impression from a human one when both arrive as identical-looking HTTP traffic, and how the most successful fraud operations made that distinction nearly impossible to draw at scale.

This post walks through the mechanics. It starts with the programmatic supply chain and why it pays out blindly, then digs into the two operations that defined the field, Methbot and 3ve, using the post-mortems their hunters published. From there it covers the taxonomy the measurement industry settled on, GIVT and SIVT, the detection heritage that runs through White Ops and what became HUMAN, and the supply-chain plumbing, ads.txt and sellers.json, that tries to make spoofing structurally harder. The through-line is that ad fraud is a detection problem dressed up as an accounting problem.

How the supply chain pays out blindly

Programmatic advertising moved the buying and selling of ad space off contracts and onto auctions. A publisher lists inventory through a supply-side platform (SSP). An advertiser bids on that inventory through a demand-side platform (DSP). Between them sits an exchange running a real-time auction, usually settled in well under 100 milliseconds while the page is still assembling in the browser. The bid request carries the context a buyer needs to decide what the impression is worth: the site domain, the ad slot dimensions, a device and user-agent description, geo, and some identifier for the user.

Every field in that request is asserted by the seller. The exchange does not independently verify that the domain is real, that the user exists, or that the impression will ever render in front of a human eye. It cannot, in the time available. The auction trusts the supply side to describe what it is selling. A fraudster who controls the supply side controls the description.
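
To make that concrete, here is a minimal sketch of a bid request as a buyer might see it. The field names loosely follow OpenRTB 2.x conventions (site.domain, device.ua, device.ip, user.id); every value, and the helper seller_asserted_fields, is hypothetical and exists only to show that each leaf in the structure is a claim supplied by the sell side.

```python
# Hypothetical bid request using OpenRTB 2.x-style field names.
# All values are invented for illustration.
bid_request = {
    "id": "req-001",
    "imp": [{"id": "1", "banner": {"w": 300, "h": 250}}],
    "site": {
        "domain": "prestige-news.example",
        "page": "https://prestige-news.example/story",
    },
    "device": {
        "ua": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "ip": "203.0.113.7",
        "geo": {"country": "USA"},
    },
    "user": {"id": "u-8f3a"},
}


def seller_asserted_fields(node, prefix=""):
    """Flatten a nested request into dotted paths; every leaf is a seller claim."""
    paths = []
    if isinstance(node, dict):
        for key, value in node.items():
            paths += seller_asserted_fields(value, f"{prefix}{key}.")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            paths += seller_asserted_fields(value, f"{prefix}{index}.")
    else:
        paths.append(prefix.rstrip("."))
    return paths


claims = seller_asserted_fields(bid_request)
print(len(claims), "fields, none independently verified by the auction itself")
```

Nothing in the structure distinguishes a true site.domain from a forged one, which is the gap the rest of this post is about.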

That is the whole opening. Fraud monetizes by manufacturing the two things buyers pay the most for: human audiences and premium publisher inventory. The Google and White Ops post-mortem on 3ve put it plainly, that the operation generated revenue by selling forgeries of exactly those two assets. Manufacture traffic that looks human, attach it to inventory that looks premium, and the auction pays real money for both forgeries.

*The bid request is a set of assertions made by the seller. Verification, when it happens, happens elsewhere.*

Because the auction itself cannot pause to check, detection is always out-of-band. A verification vendor sits pre-bid (scoring the request before a buyer commits) or post-bid (analyzing the impression after it served), or both. Either way the detector is racing the same clock as everyone else and working from the same thin set of signals: the IP, the headers, the declared domain, whatever telemetry a measurement tag can gather if one fires. The fraud problem and the bot-detection problem are the same problem. A request that looks like a browser, carries a plausible header set, and originates from a residential IP is exactly what every anti-bot vendor has spent a decade learning to dissect. The difference is only in what the attacker wants: not to scrape a page or grab inventory, but to be counted.

Click fraud, impression fraud, and the shapes in between

The vocabulary is loose in casual use, so it helps to pin it down. Impression fraud manufactures ad views: a slot loads, the impression is counted, no human saw it. Click fraud manufactures the click that follows, which matters because pay-per-click pricing (search ads, much of social) charges on the click rather than the view. The two blur together because a sophisticated bot does both, loading the page, viewing the ad, and clicking through to mimic the funnel an advertiser pays a premium for.

Around those two sit a family of variations. Ad stacking layers many ads on top of each other in a single placement so one pageview fires a dozen billable impressions, only the topmost of which could ever be seen. Pixel stuffing crams an ad into a 1x1 iframe nobody will notice. Domain spoofing, the most lucrative of the lot, lies about which site the impression came from, so a bid request for a worthless made-for-advertising page claims to be inventory on a prestige news domain. Each of these is a different way to inflate the count or the price. The detection surface is shared, which is why the industry tends to talk about all of it under one umbrella: invalid traffic.
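
Stacking and pixel stuffing are both geometry problems, so a rough check is easy to sketch. The version below assumes a measurement tag has already collected each slot's rectangle and stacking order; the Slot type, the thresholds, and the flag names are all hypothetical, and real viewability measurement is considerably more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    slot_id: str
    x: int
    y: int
    w: int
    h: int
    z: int  # stacking order; higher is closer to the viewer


def overlap_fraction(a: Slot, b: Slot) -> float:
    """Fraction of slot a's area that slot b's rectangle covers."""
    overlap_w = max(0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    overlap_h = max(0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    area = a.w * a.h
    return (overlap_w * overlap_h) / area if area else 0.0


def flag_slots(slots, min_side=2, covered_threshold=0.9):
    """Flag slots too small to see, or almost fully hidden under a higher slot."""
    flags = {}
    for slot in slots:
        if slot.w < min_side or slot.h < min_side:
            flags[slot.slot_id] = "pixel_stuffing"
            continue
        for other in slots:
            if (other is not slot and other.z > slot.z
                    and overlap_fraction(slot, other) >= covered_threshold):
                flags[slot.slot_id] = "stacked_under_" + other.slot_id
                break
    return flags


page = [
    Slot("top", 0, 0, 300, 250, 3),
    Slot("mid", 0, 0, 300, 250, 2),
    Slot("bottom", 0, 0, 300, 250, 1),
    Slot("tiny", 500, 500, 1, 1, 1),
]
print(flag_slots(page))
```

On this made-up page only the "top" slot survives: the two beneath it are flagged as stacked and the 1x1 slot as stuffed.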

The economics explain the priorities. Video carries the highest CPMs in display advertising, so the biggest operations went after video inventory specifically. Connected TV now carries even higher CPMs, and the fraud followed. When the payout per thousand impressions is high enough, it is worth building real engineering to forge them.

2016: Methbot and the data-center bot farm

Methbot was the operation that made the industry take server-side ad fraud seriously. White Ops published its analysis in December 2016, and the numbers were the kind that get a press cycle. The operation generated on the order of 200 to 300 million fake video impressions a day. At an average CPM around 13 dollars for the premium video inventory it forged, that came to an estimated 3 to 5 million dollars a day flowing to the operators.
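
The revenue figure is easy to sanity-check, since a CPM is simply the price per thousand impressions. The sketch below applies a single flat 13-dollar CPM to the reported impression range; that flat average is our simplification, not the method behind the published estimate.

```python
def daily_revenue(impressions_per_day: int, cpm_usd: float) -> float:
    """CPM is the price per thousand impressions."""
    return impressions_per_day / 1000 * cpm_usd


low = daily_revenue(200_000_000, 13.0)
high = daily_revenue(300_000_000, 13.0)
print(f"${low:,.0f} to ${high:,.0f} per day")
```

A flat average gives roughly 2.6 to 3.9 million dollars a day, the same order of magnitude as the reported 3 to 5 million. The gap at the top end is what one would expect if the published estimate was built from more than one average price, though that is an assumption on our part.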

What made Methbot different from the click farms before it was that it did not bother with infected consumer machines. It ran from data centers. The operators leased roughly 800 to 1,200 dedicated servers in data centers in the United States and the Netherlands, and acquired a pool of roughly 570,000 IP addresses, many of them fraudulently registered to look like they belonged to United States residential ISPs. That registration detail is the clever part. A data-center IP is a red flag to any verification vendor. An IP that resolves, in the regional registry, to what looks like a Verizon or AT&T consumer block does not. The operation spent real money to make its server traffic wear a residential coat.

On top of that infrastructure ran a browser the operators built themselves, rather than driving a real one. A purpose-built client gives you a forgery with no leaks: you decide exactly what it reports for screen size, what it does on a getBoundingClientRect call, which events it fires, and when. The Methbot client faked the signals verification systems look for. It moved a cursor along plausible paths. It generated clicks. It started and stopped video players to produce the engagement events a video ad measures. It forged social-network login state so the session looked like a logged-in human rather than an anonymous hit. Every one of those is a checkbox on a fraud-detection rubric, and the client ticked them deliberately.

*Both operations spent their ingenuity on the IP layer, because a server-rack IP is the easiest fraud signal to catch.*

The detection story for Methbot is instructive precisely because the operation tried so hard to be invisible. The thing that gave it away was scale and uniformity. Hundreds of thousands of IPs all behaving with machine regularity, all funneling toward the same set of counterfeit inventory, produced statistical signatures no individual request carried. Browser automation that is individually convincing still tends to be collectively too clean. The cursor paths were plausible but not idiosyncratic; the timing was human-shaped but not human-noisy. White Ops correlated across the population, found the operation, and handed it to law enforcement. The man at the center, Aleksandr Zhukov, who reportedly styled himself the king of fraud, was arrested in Bulgaria in November 2018, convicted in May 2021, and sentenced that November to ten years.
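
The "collectively too clean" idea can be sketched in a few lines. The example below is not White Ops's method, which the write-ups do not spell out at this level; it is a minimal illustration that summarizes each session by its average gap between events and then asks how much those summaries vary across the population. The function names, the 0.10 floor, and the sample timings are all hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation relative to the mean; low means tightly clustered."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def mean_gap(event_times):
    """Summarize one session as the average gap between consecutive events."""
    gaps = [later - earlier for earlier, later in zip(event_times, event_times[1:])]
    return mean(gaps)


def population_too_uniform(sessions, cv_floor=0.10):
    """True when session summaries cluster more tightly than cv_floor allows."""
    profiles = [mean_gap(s) for s in sessions if len(s) >= 2]
    if len(profiles) < 2:
        return False  # one session says nothing about a population
    return coefficient_of_variation(profiles) < cv_floor


# Event timestamps in seconds; invented for illustration.
bot_like = [[0.0, 1.0, 2.4, 3.2], [0.0, 1.1, 2.4, 3.3], [0.0, 0.9, 2.4, 3.1]]
human_like = [[0.0, 0.4, 3.0, 3.5], [0.0, 2.5, 9.0, 20.0], [0.0, 0.2, 0.5, 1.1]]
print(population_too_uniform(bot_like), population_too_uniform(human_like))
```

Each bot-like session on its own has varied, plausible gaps. It is only the comparison across sessions that shows them to be the same session replayed with small jitter.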

2017-2018: 3ve and the supply that could not be blacklisted

If Methbot was a single large machine, 3ve (pronounced “Eve”) was three machines that learned from it. Google and White Ops, with contributions from Proofpoint and others, published the post-mortem in November 2018 as part of the takedown they called Operation Eversion. The headline metrics: over a million IPs under control at peak, up to 700,000 active infections at any given time, more than 10,000 counterfeit domains, over 3 billion daily bid requests, and on the order of 60,000 accounts set up to sell the fraudulent inventory.

The structural insight was that 3ve ran as three distinct sub-operations, labeled 3ve.1, 3ve.2, and 3ve.3 in the report, each with its own architecture. This was deliberate. Like a software company running A/B tests, the operators kept the parts isolated so that if defenders cut off one limb, the others kept earning. It is the anti-fragile design a mature fraud operation reaches for once it has been burned by a takedown before.

3ve.1 was the most elaborate. It ran bots from data centers but proxied every ad request through someone else’s IP, so the traffic appeared to originate from homes and offices in desirable markets. Some of those proxy IPs came from machines infected with the Boaxxe (also called Miuref) malware. Others came from a more aggressive technique: BGP hijacking. The operators set up autonomous systems, found IP blocks that were not being actively used, often belonging to defunct or long-dormant networks, and began announcing those blocks from their own infrastructure into the internet’s global routing table. The report traces a core AS, which it anonymizes as ALPHA, coming online in 2013 to build a benign history, then beginning to announce hijacked space in early March 2017. A later AS, BRAVO, layered defunct autonomous systems behind defunct autonomous systems to look more like a legitimate network and to churn IPs faster when blocks got blacklisted.

That churn is the point. If your fraudulent IPs are detected, you do not lose your operation. You burn the block and announce a fresh one. The same bots keep running in the same data center behind a continuously refreshed front of stolen, residential-looking addresses. A blacklist, the most basic defense, becomes a treadmill. (The 3ve case sits at the intersection of two of our other write-ups: the routing mechanics are covered in BGP hijacks and route leaks, and the broader question of how vendors judge an IP’s trustworthiness is the subject of residential proxy and ASN detection.)

*Separating the disposable IP front from the stable earning infrastructure is what made 3ve resistant to the obvious defense.*

The other two sub-operations took different routes to the same end. 3ve.2 leaned on the Kovter malware, spread through malvertising, which ran a hidden browser instance on each infected consumer PC. Redirection servers told those hidden browsers which counterfeit pages to load, and the ad requests rode out on the victim’s genuine residential IP, which is about as clean a signal as a fraudster can hope for. 3ve.3 was data-center based. Of the C2 mechanics the report does disclose, those of 3ve.1 are a study in blending in: the binary instantiated an Internet Explorer COM object and used its Navigate2 method to make its first call to a hard-coded, encrypted C2 address, so the network activity looked like a normal browser rather than a bespoke malware beacon. The handshake checked that the host’s Windows locale was English and that the IP geolocated to the United States before the server would reply with the go-ahead to start generating fraud. The forgery extended down to who the bot was allowed to pretend to be.

Eversion brought the bid-request traffic close to zero within roughly 18 hours of the coordinated takedown. The United States seized dozens of domains and servers, the BGP-announced space was reclaimed, and the sinkholing severed the botnets from their command infrastructure. Eight defendants were charged across the Methbot and 3ve indictments unsealed in November 2018, and the cases tied the two operations to overlapping operators. The technical lesson the defenders drew was the one worth keeping: you do not kill an operation like this by blacklisting it, because its entire architecture assumes blacklists. You kill it by mapping it quietly, then cutting every limb in the same hour.

The taxonomy: GIVT and SIVT

The measurement industry needed a shared vocabulary for all this, and the one it settled on comes from the Media Rating Council (MRC), the body that accredits measurement in United States advertising. The MRC’s Invalid Traffic Detection and Filtration standards split invalid traffic into two tiers, and the split is about detection difficulty rather than intent.

General invalid traffic (GIVT) is the kind you catch with routine filtration: lists and standardized parameter checks. Known data-center IP ranges, declared search-engine crawlers, the user agents on the IAB/ABC International Spiders and Bots List (a monthly-updated roster of known non-human user agents, maintained jointly by the IAB and ABC and in use since 2006), pre-fetch and pre-render traffic, activity from obvious monitoring tools. None of it necessarily set out to defraud anyone. A legitimate crawler that loads a page with an ad slot generates an impression that should not be billed, and GIVT filtering exists to strip those out before the count is taken. It is the easy 80 percent, removable with a lookup.
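
A GIVT filter really is close to a pair of lookups. The sketch below uses Python's standard ipaddress module; the IP range and the user-agent tokens are hypothetical stand-ins, since the real data-center lists and the IAB/ABC list are maintained and distributed separately.

```python
import ipaddress

# Hypothetical stand-ins for real, regularly updated lists.
DATACENTER_RANGES = [ipaddress.ip_network("198.51.100.0/24")]
KNOWN_BOT_UA_TOKENS = ["googlebot", "bingbot", "uptime-monitor"]


def givt_reason(ip: str, user_agent: str):
    """Return why a request counts as general invalid traffic, or None."""
    address = ipaddress.ip_address(ip)
    if any(address in network for network in DATACENTER_RANGES):
        return "datacenter_ip"
    lowered = user_agent.lower()
    for token in KNOWN_BOT_UA_TOKENS:
        if token in lowered:
            return "declared_bot:" + token
    return None  # not GIVT; says nothing about whether it is SIVT


print(givt_reason("198.51.100.23", "Mozilla/5.0"))
print(givt_reason("203.0.113.9", "Mozilla/5.0 (compatible; Googlebot/2.1)"))
print(givt_reason("203.0.113.9", "Mozilla/5.0 (Windows NT 10.0)"))
```

The last case is the important one: a None here means only that the request passed the lists, which is exactly where the harder tier begins.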

Sophisticated invalid traffic (SIVT) is everything that gets past the lists. The MRC defines it as the situations that require advanced analytics, multi-point corroboration, and significant human intervention to identify. Hijacked devices, bots that mimic human behavior, click farms, session hijacking, and the domain laundering that 3ve ran are all SIVT. The defining quality is that no single request betrays it. A SIVT impression can come from a real residential IP, carry a real browser’s headers, and report human-looking interaction, because as often as not it is riding on a hijacked real device. Catching it means correlating across many requests, building behavioral and population-level models, and accepting that the line between valid and invalid is statistical rather than a binary lookup.
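
One simple form of multi-request correlation is to group impressions by a device fingerprint and count how many distinct IPs present it. The sketch below is illustrative only: the three-field fingerprint, the field names, and the threshold are hypothetical, and a fingerprint this coarse would collide constantly among real users, so a production system would need far richer signals and a baseline for what counts as normal.

```python
from collections import defaultdict


def shared_fingerprint_clusters(impressions, min_distinct_ips=3):
    """Find fingerprints that appear behind at least min_distinct_ips addresses."""
    ips_by_fingerprint = defaultdict(set)
    for imp in impressions:
        fingerprint = (imp["ua"], imp["screen"], imp["tz"])
        ips_by_fingerprint[fingerprint].add(imp["ip"])
    return {
        fingerprint: len(ips)
        for fingerprint, ips in ips_by_fingerprint.items()
        if len(ips) >= min_distinct_ips
    }


# Invented sample: three IPs presenting one identical device, plus one outlier.
sample = [
    {"ip": "203.0.113.1", "ua": "UA-A", "screen": "1366x768", "tz": "UTC-5"},
    {"ip": "203.0.113.2", "ua": "UA-A", "screen": "1366x768", "tz": "UTC-5"},
    {"ip": "203.0.113.3", "ua": "UA-A", "screen": "1366x768", "tz": "UTC-5"},
    {"ip": "203.0.113.4", "ua": "UA-B", "screen": "1920x1080", "tz": "UTC-8"},
]
print(shared_fingerprint_clusters(sample))
```

No single row in the sample is suspicious. The signal exists only in the grouping, which is the sense in which the line between valid and invalid is statistical.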

*The categories matter because accreditation, contracts, and refunds are written against them.*

The taxonomy is not academic. MRC accreditation is what lets a verification vendor’s numbers be trusted in the contracts that move advertising money, and accreditation is granted separately for GIVT and SIVT detection. White Ops became the first company accredited specifically for SIVT detection, in June 2016, which is the formal marker of the moment the industry agreed that catching the sophisticated tier was a distinct and harder discipline than running the lists.

The detection heritage: White Ops to HUMAN

The company that did the most to define SIVT detection is the same one that hunted Methbot and 3ve. White Ops, founded in 2012, built its reputation on exactly the population-level analysis those operations required. It earned the first MRC accreditation for SIVT detection in 2016, then later became the first accredited for end-to-end pre-bid and post-bid SIVT coverage across desktop, mobile web, mobile in-app, and connected TV. White Ops rebranded as HUMAN Security in 2021, consolidating ad-fraud verification and general bot mitigation under one name.

That consolidation tells you something about the field. The signals that catch a bot stuffing a checkout cart are the same signals that catch a bot stuffing an ad impression. HUMAN’s pitch is a network effect it calls collective protection: by verifying a very large volume of transactions across advertising, account security, and application traffic, it sees the same fraudulent infrastructure show up in multiple contexts and can flag it faster than any single-context vendor. The IP that hammered a login form last week is the IP loading counterfeit ad inventory this week, and a verifier sitting across both sees the link. We cover the mechanics of that cross-context model in HUMAN’s collective signal network and the company’s lineage in the PerimeterX to HUMAN rebrand.

The detection methods themselves are the anti-bot toolkit applied to a billing problem. On the device side, a measurement tag that fires in the page can fingerprint the JavaScript runtime, looking for the tells of a headless or instrumented browser, the same surface dissected in headless Chrome detection and JavaScript runtime fingerprinting. On the network side, the IP’s reputation and ASN, whether it sits in known data-center space, whether it has the markings of a residential proxy, feed a risk score. On the behavioral side, interaction telemetry separates human-shaped input from synthesized events; a forged cursor path is hard to make convincingly noisy, a problem we get into in why a real mouse path is hard to fake. And across all of it sits the population view, where an individually clean request becomes suspicious because ten thousand others look exactly like it.

The four MRC-accredited IVT vendors that dominate the space, HUMAN among them, all run some version of this stack. For connected TV and server-side environments, much of the detection has moved server-side, because a JavaScript tag cannot fire where there is no browser. That shift matters more every year, for reasons the next section gets into.

ads.txt, sellers.json, and making spoofing structurally hard

Detection scores traffic as it is bid on or after it has served. The supply-chain transparency standards take a different angle: make it structurally harder to sell a forgery in the first place. The most important of these is ads.txt.

Domain spoofing works because the bid request asserts its own domain and nobody checks. ads.txt, short for Authorized Digital Sellers and published by the IAB Tech Lab in 2017, inverts the trust. A publisher places a plain text file at a fixed path on its domain, at /ads.txt, listing exactly which advertising systems are authorized to sell its inventory and under what account IDs. Each record is a few comma-separated fields: the seller’s domain, the publisher’s account ID at that seller, a relationship type of either DIRECT or RESELLER, and an optional certification-authority ID. A record looks like greenadexchange.com, 12345, DIRECT, d75815a79. The DIRECT versus RESELLER flag tells buyers whether the publisher controls that selling account itself or has authorized a reseller, and the final token, when present, is a Trustworthy Accountability Group (TAG) identifier tying the entry to a known entity.
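
The record format is simple enough to parse in a few lines. The sketch below is a simplified reader, not a complete implementation of the specification: it handles comments and the four fields described above, skips variable lines such as contact=, and ignores extension fields. The AdsTxtRecord type and the sample file contents other than the greenadexchange.com record are hypothetical.

```python
from typing import NamedTuple, Optional


class AdsTxtRecord(NamedTuple):
    ad_system_domain: str
    account_id: str
    relationship: str  # "DIRECT" or "RESELLER"
    cert_authority_id: Optional[str]


def parse_ads_txt(text: str):
    """Parse data records from ads.txt content, skipping anything malformed."""
    records = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or "=" in line.split(",")[0]:
            continue  # blank line or a variable such as contact=
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            continue
        relationship = fields[2].upper()
        if relationship not in ("DIRECT", "RESELLER"):
            continue
        cert = fields[3] if len(fields) > 3 and fields[3] else None
        records.append(AdsTxtRecord(fields[0].lower(), fields[1], relationship, cert))
    return records


sample = """# ads.txt for a hypothetical publisher
greenadexchange.com, 12345, DIRECT, d75815a79
resellerexchange.example, 987, reseller
contact=adops@publisher.example
"""
for record in parse_ads_txt(sample):
    print(record)
```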

The mechanism is a buy-side check, not a publisher-side block. A DSP receiving a bid that claims to be inventory on prestige-news.example consults that domain’s /ads.txt (in practice usually a copy crawled and cached ahead of time, since the auction leaves no room for a live fetch), confirms the selling exchange and account ID appear on the authorized list, and rejects the bid if they do not. A spoofer who controls a worthless domain cannot make the real publisher’s ads.txt vouch for them. The file lives on the publisher’s own server, so only the publisher can edit it. That single constraint takes the most lucrative form of spoofing, claiming to be a premium domain you do not control, and turns it from invisible into checkable.
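
The buy-side check itself is a set-membership test. In the sketch below, ads_txt_cache is a hypothetical dictionary standing in for a buyer's store of crawled ads.txt files, and rejecting bids for domains with no cached file is a policy assumption made for the example rather than anything the specification requires.

```python
def authorized_pairs(ads_txt_text: str):
    """Return the (ad system domain, account ID) pairs a publisher has listed."""
    pairs = set()
    for raw in ads_txt_text.splitlines():
        fields = [field.strip() for field in raw.split("#", 1)[0].split(",")]
        if len(fields) >= 3 and fields[2].upper() in ("DIRECT", "RESELLER"):
            pairs.add((fields[0].lower(), fields[1]))
    return pairs


def bid_is_authorized(claimed_domain, exchange_domain, account_id, ads_txt_cache):
    """Check a bid's claimed domain against that domain's own ads.txt."""
    text = ads_txt_cache.get(claimed_domain)
    if text is None:
        return False  # assumed policy: no known ads.txt, no bid
    return (exchange_domain.lower(), account_id) in authorized_pairs(text)


# Hypothetical cache keyed by publisher domain.
cache = {"prestige-news.example": "greenadexchange.com, 12345, DIRECT, d75815a79\n"}

print(bid_is_authorized("prestige-news.example", "greenadexchange.com", "12345", cache))
print(bid_is_authorized("prestige-news.example", "greenadexchange.com", "99999", cache))
```

The second call is the spoofer's position: the domain claim is free to forge, but the account ID has to match a line only the real publisher could have written.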

*The file lives on the publisher's domain, so authorization can only come from the publisher.*

ads.txt covers the web. app-ads.txt extended the same idea to mobile apps and connected TV, where the app store listing declares the developer’s website and the file lives on that site. sellers.json and the OpenRTB SupplyChain object close the loop from the other direction. ads.txt lets a buyer confirm that a seller is authorized; sellers.json, published by each exchange, lets a buyer discover the actual identity of the entity selling a given bid, and the SupplyChain object records every intermediary the bid passed through. Together they make a bid request’s path auditable end to end, so an impression laundered through a chain of resellers leaves a trail rather than vanishing into an anonymous hop.
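
Auditing a path means resolving each hop in the SupplyChain object against the sellers.json of the system that hop names. The sketch below uses the field names from the two specifications (asi and sid on each node, seller_id and seller_type in sellers.json); the domains, IDs, and the rule that an unresolved hop fails the whole chain are hypothetical choices for the example.

```python
def resolve_nodes(schain, sellers_json_by_system):
    """Pair each schain node with the seller_type its ad system publishes."""
    resolved = []
    for node in schain.get("nodes", []):
        listing = sellers_json_by_system.get(node["asi"], {})
        entry = next(
            (s for s in listing.get("sellers", []) if s.get("seller_id") == node["sid"]),
            None,
        )
        seller_type = entry.get("seller_type") if entry else None
        resolved.append((node["asi"], node["sid"], seller_type))
    return resolved


def chain_is_auditable(schain, sellers_json_by_system):
    """Require a complete chain in which every hop resolves to a listed seller."""
    nodes = resolve_nodes(schain, sellers_json_by_system)
    if schain.get("complete") != 1 or not nodes:
        return False
    return all(seller_type is not None for _, _, seller_type in nodes)


# Hypothetical data: a publisher sells via exchange A, which resells via exchange B.
schain = {
    "ver": "1.0",
    "complete": 1,
    "nodes": [
        {"asi": "exchange-a.example", "sid": "pub-1", "hp": 1},
        {"asi": "exchange-b.example", "sid": "acct-77", "hp": 1},
    ],
}
sellers = {
    "exchange-a.example": {"sellers": [{"seller_id": "pub-1", "seller_type": "PUBLISHER"}]},
    "exchange-b.example": {"sellers": [{"seller_id": "acct-77", "seller_type": "INTERMEDIARY"}]},
}
print(resolve_nodes(schain, sellers))
print(chain_is_auditable(schain, sellers))
```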

The spec has accreted fields over the years. It reached version 1.0 in 2017, added subdomain and contact handling soon after, distinguished app-ads.txt in 2019, and by version 1.1 in 2022 added owner-domain and manager-domain records to clarify who actually owns inventory in pooled or managed-selling arrangements. The direction of travel is consistent: pin down ownership and authorization with enough precision that a forged claim has to contradict a published fact.
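
The newer ownership fields are variable lines of the form NAME=value rather than comma-separated data records, so they need their own small reader. The sketch below collects the variables named in this section; the domains in the sample are hypothetical, values are kept as raw strings, and no validation beyond the name is attempted.

```python
KNOWN_VARIABLES = ("OWNERDOMAIN", "MANAGERDOMAIN", "CONTACT", "SUBDOMAIN")


def parse_ads_txt_variables(text: str):
    """Collect NAME=value variable lines from ads.txt content."""
    variables = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "=" not in line.split(",")[0]:
            continue  # blank line or an ordinary data record
        name, value = line.split("=", 1)
        name = name.strip().upper()
        if name in KNOWN_VARIABLES:
            variables.setdefault(name, []).append(value.strip())
    return variables


sample = """greenadexchange.com, 12345, DIRECT, d75815a79
ownerdomain=parent-media.example
managerdomain=sales-house.example
"""
print(parse_ads_txt_variables(sample))
```

A buyer can then compare the declared owner domain with the identity a seller claims elsewhere, which is the "published fact" a forged claim would have to contradict.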

None of this is a cure. ads.txt stops a spoofer from impersonating a domain it does not control, but it does nothing about a fraudster who buys a cheap real domain, makes a real ads.txt, and pumps bot traffic through legitimately authorized channels. That is laundering through the front door, and it is squarely a SIVT-detection problem rather than a transparency-standard problem. The standards and the detectors are complementary. Transparency narrows the attack surface to traffic that is structurally valid but behaviorally fake; detection works on what is left.

Where the fraud went next

The pattern across a decade is that fraud chases the highest CPMs and the thinnest detection coverage, and right now both point at connected TV. CTV inventory is premium, the supply paths are long and complicated, and a great deal of it runs through server-side ad insertion (SSAI), which stitches ads into a video stream on a server rather than in a client app. SSAI is good for viewers and convenient for publishers, and it is a gift to fraudsters, because the ad request originates server-side where there is no browser to fingerprint and the device and app identifiers are asserted by whatever sits upstream. Pixalate’s measurement of the channel found that programmatic CTV traffic running through SSAI carried substantially higher invalid-traffic rates than non-SSAI traffic in 2024. Spoofing a device model, recycling impressions, and faking app identifiers all get easier when the detector cannot run code on the endpoint.

The money keeps the problem alive. Juniper Research has put global advertiser losses to ad fraud in the range of 80 to over 100 billion dollars a year in the mid-2020s, depending on the methodology and what is counted, a figure large enough that building serious engineering to commit it remains rational for the people who do. The arms race has the same shape it had in 2016. Fraud finds an environment where detection is weak, the industry builds detection for that environment, fraud moves to the next weak environment. CTV is the current frontier because that is where the detection tooling is youngest.

What the post-mortems actually taught

The Methbot and 3ve write-ups are worth reading in full, because they are among the few cases where the defenders disclosed real mechanism rather than marketing. Two things stand out. The first is that the entire contest is fought over a single question, whether a request came from a human, and that question cannot be answered from one request. Every durable defense in the space, from MRC’s SIVT category to HUMAN’s collective network to the population analysis that caught both operations, is an admission that you have to look across many requests to see what any one of them hides. Fraud that is individually perfect is still collectively detectable, and that is the only reliable handhold defenders have.

The second is that the standards and the detectors are doing different jobs and need each other. ads.txt did not catch 3ve; it made the specific trick of impersonating a domain you do not own structurally harder, which pushes fraudsters toward laundering through legitimately authorized supply, which is exactly the behavioral problem the detectors exist to solve. Neither layer is sufficient alone. The supply chain transparency narrows the funnel; the detection models work the narrowed funnel; the takedowns, when they come, require quietly mapping the whole thing first and then cutting every limb at once, because anything less just teaches the operation which limb to regrow.

The number that lingers is the 18 hours it took Operation Eversion to drop 3ve’s bid traffic to near zero after more than a year of patient, silent mapping. That ratio, a year of watching for a day of cutting, is the real shape of fighting sophisticated ad fraud. The detection is continuous and statistical and never finished. The kill, when it is possible at all, is fast, total, and rare.

Sources & further reading

Google and White Ops (2018), The Hunt for 3ve — the primary post-mortem on 3ve, with the sub-operation breakdown, BGP-hijacking mechanics, and C2 handshake details.
Krebs on Security (2016), Report: $3-5M in Ad Fraud Daily from ‘Methbot’ — early coverage of the White Ops Methbot report with infrastructure and revenue figures.
IAB Tech Lab (2017-2022), ads.txt — Authorized Digital Sellers — the ads.txt and app-ads.txt specification, record format, and version history.
IAB (2019), sellers.json — the seller-identity disclosure standard that complements ads.txt and the SupplyChain object.
Media Rating Council (2020), Invalid Traffic Detection and Filtration Standards Addendum — the formal GIVT and SIVT definitions and the detection-and-filtration requirements behind accreditation.
IAB / ABC (2019), International IAB/ABC Spiders & Bots List, Best Practices — the known-bot user-agent list that underpins GIVT filtration.
HUMAN Security (2016), White Ops Is First to Receive MRC Accreditation for Sophisticated Invalid Traffic Detection — the accreditation milestone marking SIVT detection as a distinct discipline.
CyberScoop (2021), Aleksandr Zhukov, self-described ‘king of fraud,’ is sentenced to 10 years — the conviction and sentencing outcome of the Methbot/3ve prosecution.
The Shadowserver Foundation (2018), 3ve Takedown / Operation Eversion — a takedown participant’s account of the coordinated sinkholing operation.
Pixalate (2024), Global Q1 2024 SSAI Benchmark Report for Connected TV — measurement of elevated invalid-traffic rates in SSAI-delivered CTV inventory.
Juniper Research (2023), Quantifying the Cost of Ad Fraud, 2023-2028 — the multi-year forecast for global advertiser losses to ad fraud.