How phishing kits fingerprint and cloak from security scanners

A phishing page has two audiences and they must never meet. One is the victim, who clicked a link in an email and needs to see a convincing login form. The other is the scanner: Google Safe Browsing fetching the URL out of a flagged message, Microsoft’s detonation service navigating it in a sandbox VM, a PhishTank volunteer, an antivirus crawler, a brand-protection vendor sweeping for impersonations. If the scanner sees the credential form, the domain gets flagged within hours and the campaign is dead. So the kit’s job is to tell the two apart on the first request and hand each one a different page. The victim gets the form. The scanner gets a 404, a redirect to the real bank, or a blank page. Same URL, two realities, decided before a single byte of the phishing HTML is sent.

That decision is cloaking, and it is now standard equipment. A study of phishing kits found that the overwhelming majority shipped with cloaking logic baked in, with user-agent and IP checks the two most common gates. The interesting part is not that kits cloak. It is how cheaply they do it, how the checks have climbed the stack from a one-line user-agent string match up to full browser fingerprinting, and how an entire service economy has grown up to sell the IP blocklists and bot-detection rules that individual kit authors used to maintain by hand. This post is about that machinery.

The sections walk it from the outside in. First the cheap server-side gates: IP and ASN blocklists, the user-agent match, the referrer and one-time-link checks. Then the client-side layer, where JavaScript fingerprints the browser because the request headers stopped being enough. Then the kit ecosystem and the anti-bot-as-a-service market that commoditized all of it. Then the defenders, and the move to client-side detection that cloaking cannot answer. The through-line is an arms race over one question, asked on every request: is the thing on the other end a person who can be robbed, or a machine that will report you?

The cheapest gate: IP and ASN blocklists

The first thing a phishing kit checks is where the request came from, because it is free and it catches a lot. Security scanners run from known infrastructure. Google, Microsoft, and the major antivirus vendors crawl from datacenter IP ranges that have been catalogued for years. So the kit keeps a blocklist, and any request from an address on it gets the benign treatment.

The blocklist comes in two flavors that are usually combined. The first is a hand-maintained list of specific ranges and hostnames belonging to security companies. A widely circulated “anti-anti-phishing” .htaccess file, the kind bundled into kits for a decade, denies access from dozens of vendor domains and roughly forty IP ranges by name: antivirus firms, sandbox and URL-analysis platforms, and even the domains of law-enforcement organizations. It does this with plain Apache “deny from” directives and reverse-DNS matches inside a <Limit GET POST> block. The list reads like a directory of everyone who might look.

The second flavor is broader and lazier: block everything that is not a residential ISP. Cloud and hosting ASNs (the homes of AWS, Google Cloud, DigitalOcean, and the rest) almost never contain real victims clicking email links, so the kit rejects datacenter and VPN exit ranges wholesale. The BlackForce kit, analyzed by Zscaler’s ThreatLabz, cross-references incoming traffic against blocklists keyed on hostname keywords, ISP signatures, and country, and later versions added a mobile-only policy that rejects every desktop user-agent outright on the theory that the targeted victims are on phones and the scanners often are not.
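For an analyst holding a captured kit, that .htaccess file is useful intelligence: it says exactly which egress ranges the kit will lie to. The sketch below is one way to pull the "deny from" targets out of such a file and test whether a crawler's own address would be gated. The function names and the sample file are hypothetical, the addresses are documentation ranges, and the parser only covers the simple Apache 2.2-style syntax described above.

```python
import ipaddress
import re

# Matches Apache 2.2-style "deny from <targets...>" lines (comments are skipped).
DENY_RE = re.compile(r"^\s*deny\s+from\s+(.+?)\s*$", re.IGNORECASE)
# Apache also accepts partial addresses such as "198.51.100." or "10.1".
PARTIAL_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){0,2}\.?$")


def parse_deny_rules(htaccess_text):
    """Return (networks, hostnames) named in deny-from directives."""
    networks, hostnames = [], []
    for line in htaccess_text.splitlines():
        match = DENY_RE.match(line)
        if not match:
            continue
        for token in match.group(1).split():
            if token.lower() == "all":
                continue
            candidate = token
            if PARTIAL_RE.match(token):
                # Expand "198.51.100." into "198.51.100.0/24", and so on.
                octets = token.rstrip(".").split(".")
                padded = octets + ["0"] * (4 - len(octets))
                candidate = ".".join(padded) + "/" + str(8 * len(octets))
            try:
                networks.append(ipaddress.ip_network(candidate, strict=False))
            except ValueError:
                # Anything that is not an address is treated as a hostname rule.
                hostnames.append(token.lower())
    return networks, hostnames


def blocked_by_kit(ip, networks):
    """True if the kit's IP rules would divert a request from this address."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


# Hypothetical sample in the style of the file described above.
SAMPLE = """
<Limit GET POST>
order allow,deny
deny from 192.0.2.0/24
deny from 198.51.100.
deny from .scanner-vendor.example
allow from all
</Limit>
"""
```

Running parse_deny_rules(SAMPLE) yields two networks and one hostname rule, and blocked_by_kit("198.51.100.7", networks) reports that a crawler on that address would be shown the decoy. Hostname rules depend on reverse DNS, so this sketch only lists them and does not evaluate them.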

*The cloaking gate sits in front of the payload. The scanner and the victim request the same URL and receive different documents, so the scanner's verdict is formed against content the victim never sees.*

Maintaining IP lists by hand does not scale, so kits increasingly outsource the lookup. Rather than ship a static file, the kit calls a third-party reputation API at request time, passing the visitor’s address and asking whether it belongs to a hosting provider, a known proxy, or a flagged ASN. Public abuse feeds make this easy. AbuseIPDB aggregates reported addresses with a confidence score and refreshes through the day; Spamhaus and similar sources cover the rest. The same data that a sysadmin uses to block attackers, a phishing kit uses to block defenders, because the defender’s scanner and the sysadmin’s attacker live on the same datacenter ranges. The economics here mirror what legitimate anti-bot vendors do with residential proxy and ASN reputation, just pointed the other way.

The IP gate has one structural weakness, and the defenders know it. A scanner that fetches from a residential IP, through a real consumer ISP, looks exactly like a victim at the network layer. That is expensive for a crawler to arrange at scale, which is the whole reason the gate works, but it is not impossible, and the better anti-phishing crawlers now do exactly that.

The user-agent match, and why it stopped being enough

After the IP, the kit reads the User-Agent header. This is the oldest cloaking check there is, and in its crudest form it is a string match: if the user-agent contains Googlebot, bingbot, Slurp, or any of a long list of crawler tokens, serve the decoy. The same logic that a webmaster writes into .htaccess to keep bad bots out, a phisher writes to keep good bots out. The BlackForce kit parses incoming user-agents with the open-source ua-parser-js library and matches against regex patterns that explicitly name the Nmap Scripting Engine (“Security Checker”), Netcraft’s survey agents, and Microsoft’s bingbot family including msnbot and adidxbot. The match is granular: it can branch on device, operating system, and browser, not just on a single bot token.

The reference implementation of this idea is a library called CrawlerDetect, and its appearance inside phishing kits is telling. CrawlerDetect is a legitimate PHP project that identifies bots from the User-Agent and HTTP_FROM headers using regular expressions, recognizing thousands of crawler user-agents from a maintained fixture list and exposing a simple isCrawler() check. It was built so a normal website could distinguish search engines from people. The 16Shop kit, one of the most successful phishing-as-a-service platforms before its 2023 takedown, shipped CrawlerDetect as one of three anti-indexing layers. The kit did not invent the detection. It imported a community library, the same way a real web app would, and inverted the intent.

*Cloaking is layered because each cheap check has a cheap bypass. The header match falls to a header swap, the IP gate to a residential proxy, the one-time link to a replay. Only the client-side fingerprint forces the scanner to bring a full browser, which is the most expensive thing it can do.*

The trouble with a user-agent match is that the user-agent is a string the client controls, and any competent crawler sets it to mimic Chrome. So the user-agent gate alone catches only the laziest scanners, the ones that announce themselves. To catch the rest, kits pair the string match with the IP check (a request claiming to be Chrome but arriving from a Google datacenter range is obviously a Google crawler wearing a costume) and then, when even that is not enough, push the decision into JavaScript. The user-agent is still useful, just no longer load-bearing on its own.
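The defender's mirror image of the user-agent gate is a differential fetch: request the same URL once as a self-announcing crawler and once as an ordinary browser, then compare what comes back. The following is a minimal sketch of that comparison, not any vendor's pipeline. The fetch callable, the similarity threshold, and the simulated server are all assumptions made for illustration; in practice fetch would wrap a real HTTP client.

```python
import difflib

CRAWLER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def compare_profiles(fetch, url, threshold=0.6):
    """Fetch url under two user-agents and report whether the responses diverge.

    fetch is any callable (url, headers) -> (status_code, body_text).
    threshold is an arbitrary illustrative cut-off, not a tuned value.
    """
    status_a, body_a = fetch(url, {"User-Agent": CRAWLER_UA})
    status_b, body_b = fetch(url, {"User-Agent": BROWSER_UA})
    similarity = difflib.SequenceMatcher(None, body_a, body_b).ratio()
    return {
        "status": (status_a, status_b),
        "similarity": round(similarity, 3),
        "suspect": status_a != status_b or similarity < threshold,
    }


def simulated_cloaking_fetch(url, headers):
    """Toy stand-in for a server that diverts self-announcing crawlers."""
    if "bot" in headers.get("User-Agent", "").lower():
        return 404, "<h1>Not Found</h1>"
    return 200, "<form><input type='password' name='p'></form>"
```

Against the simulated server, compare_profiles flags the URL because the two status codes differ. A real deployment needs more care: a kit that also checks the source IP will show the decoy to both requests, pages with legitimately dynamic content will differ for innocent reasons, and a one-time link may be spent by the first of the two fetches.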

Referrer checks and the one-time link

There is a third header in the cheap-gate tier: the Referer. Phishing traffic has a characteristic shape. A victim arrives by clicking a link in an email, an SMS, or a chat message, which means the referrer is often empty or comes from a webmail domain. A scanner arrives because some pipeline extracted the URL and fetched it directly or from a known analysis platform. Referer cloaking checks that field and treats the wrong referrer, or a referrer matching phishing-report and security domains, as a tell. The same anti-AV .htaccess mentioned earlier includes referer rules that deny requests carrying phish or spam or the domains of report aggregators.

Closely related is the one-time link, which is less a header check than a session check. Many kits embed a unique token in the URL sent to each victim, and they invalidate it after the first successful load. The logic is simple and it specifically defeats the way detection pipelines work. When a victim reports a phishing email or a mail filter flags it, the URL gets passed around: to the security team, to an automated sandbox, to a threat-intel feed, to other vendors. By the time the second or third party fetches it, the token is spent and the kit serves nothing. The victim, who clicks once, sees the form; everyone downstream sees a dead link. 16Shop tracked this with HTTP_REFERER checks and $_SESSION state to suppress repeat visitors, which is the same mechanism viewed from the server side. Some academic taxonomies call this repeat cloaking, and it is effective precisely because the human-to-scanner handoff is always a second visit.
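One pipeline-side response to repeat cloaking is to treat the first fetch as the only fetch: capture the response once, then hand the stored artifact downstream in place of the URL. The sketch below assumes a hypothetical FirstFetchArchive class and a toy fetcher that models a token spent on first load. It only helps when the pipeline's request really is the first one, for example when a mail filter scans before delivery.

```python
import hashlib
import time


class FirstFetchArchive:
    """Keep the first response for each URL so later stages never re-fetch."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: url -> (status_code, body_text)
        self._records = {}

    def get(self, url):
        """Return the archived record, fetching only on the first request."""
        if url not in self._records:
            status, body = self._fetch(url)
            self._records[url] = {
                "status": status,
                "body": body,
                "sha256": hashlib.sha256(body.encode("utf-8")).hexdigest(),
                "fetched_at": time.time(),
            }
        return self._records[url]


def make_one_time_fetch():
    """Toy model of a one-time link: content on first load, 404 afterwards."""
    seen = set()

    def fetch(url):
        if url in seen:
            return 404, ""
        seen.add(url)
        return 200, "<form><input type='password'></form>"

    return fetch
```

With this in place, the sandbox, the threat-intel feed, and the analyst all read the same stored body and hash. Without it, each party that re-requests the URL sees the dead link the kit intended them to see.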

This is the full cheap tier: IP and ASN, user-agent, referrer, and the one-time token. None of it requires the client to run code. All of it ships in the kit as PHP and .htaccess, runs before any phishing HTML is generated, and costs the operator nothing per request. For years it was enough. Then the scanners started bringing real browsers, and the kits had to follow them up the stack.

The client-side layer: fingerprinting the browser

When a scanner fetches a URL with a plain HTTP client, it never runs the page’s JavaScript, so server-side gates can sort it out from headers alone. When a scanner brings a real browser (a headless Chrome, a sandbox VM driving a full rendering engine), the headers can be made to look perfect, and the only place left to catch it is inside the page, in code the browser actually executes. So kits started doing client-side fingerprinting, borrowing the exact techniques that commercial anti-bot vendors use, only inverted: instead of blocking the bot, the kit hides the crime from it.

The signals are familiar to anyone who has read about headless Chrome detection or how anti-bot systems fingerprint the JavaScript runtime. A script on the phishing page checks navigator.webdriver, which automation frameworks set to true unless they have been patched. It probes for the absence of plugins, a too-clean navigator object, missing or inconsistent properties that a real Chrome on real hardware would never show. It can render to a canvas and compare the output, fingerprint WebGL, or measure timing in ways a headless environment gets subtly wrong. Eric Lawrence, who works on Edge’s defenses, describes the attacker side plainly: the kit looks for “hints of a virtual machine, loading from a particular IP range,” and plays innocent when it concludes it is being watched. The benign decoy loads only when the fingerprint says “automated,” and the credential form loads otherwise.
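These probes leave recognizable strings in the page's script, which gives an analyst a cheap triage signal when reviewing a captured page. The sketch below is a deliberately naive static scan for the browser APIs named above. The indicator names and the function are hypothetical, and the patterns are a starting list, not a complete one.

```python
import re

# Each pattern targets a real browser API or token commonly read by
# automation checks. The labels on the left are illustrative names.
PROBES = {
    "webdriver_flag": re.compile(
        r"navigator\s*\.\s*webdriver|\[\s*['\"]webdriver['\"]\s*\]"
    ),
    "plugin_count": re.compile(r"navigator\s*\.\s*plugins\s*\.\s*length"),
    "canvas_readback": re.compile(r"\.toDataURL\s*\(|\.getImageData\s*\("),
    "webgl_renderer": re.compile(
        r"WEBGL_debug_renderer_info|UNMASKED_RENDERER_WEBGL"
    ),
    "headless_token": re.compile(r"HeadlessChrome", re.IGNORECASE),
}


def find_automation_probes(script_text):
    """Return the sorted names of probe patterns found in a script."""
    return sorted(
        name for name, pattern in PROBES.items() if pattern.search(script_text)
    )
```

A hit is a reason to look closer, not a verdict. Legitimate anti-bot and analytics scripts read the same properties, and obfuscated kit code will not match plain-text patterns at all, so a scan like this is best run after deobfuscation or alongside dynamic analysis.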

The strongest client-side gate is not a fingerprint at all. It is a demand for interaction. A growing share of kits put a CAPTCHA in front of the phishing form, frequently a real Cloudflare Turnstile or hCaptcha widget rather than a fake one. The point is not to verify the human in the anti-bot sense. The point is that an automated crawler arriving at the URL hits the challenge and stops, because rendering and solving it is exactly the work the crawler was built to avoid, while a motivated victim clicks through it without a second thought. This is the same borrowed-anti-bot-tech pattern seen in CAPTCHA-gated malware delivery, and it is now common enough that Microsoft’s own threat reporting calls out CAPTCHA gates as a routine cloaking layer in phishing-as-a-service kits. The challenge wall converts the defender’s biggest strength, the ability to fetch and parse millions of URLs cheaply, into a liability, because the one thing it cannot do cheaply is sit through an interactive puzzle on every one of them.
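Because the challenge wall is usually a genuine widget, its embed markers are easy to spot, and a crawler can use them to decide which URLs deserve the expensive interactive treatment. The sketch below shows one way to do that routing. The marker list covers only the two widgets named above, and the queue names are hypothetical.

```python
import re

# Embed markers for the two widgets discussed: the script host each one
# loads from and the container class each one renders into.
CHALLENGE_MARKERS = {
    "turnstile": re.compile(
        r"challenges\.cloudflare\.com/turnstile/"
        r"|class=[\"'][^\"']*\bcf-turnstile\b"
    ),
    "hcaptcha": re.compile(
        r"hcaptcha\.com/1/api\.js|class=[\"'][^\"']*\bh-captcha\b"
    ),
}


def challenge_widgets(html):
    """Return the sorted names of challenge widgets embedded in the page."""
    return sorted(
        name for name, pattern in CHALLENGE_MARKERS.items() if pattern.search(html)
    )


def route(html):
    """Pick an analysis queue: pages behind a challenge need a real session."""
    return "interactive-queue" if challenge_widgets(html) else "static-queue"
```

The design choice here is to spend interaction only where the page demands it. A challenge on an otherwise unknown URL that arrived by email is also a weak signal in its own right, though many legitimate sites use the same widgets.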

Client-side cloaking also watches for the things a scanner does not do. A sandbox that loads a page, waits a few seconds, and screenshots it produces no mouse movement, no keystrokes, no scroll. Some kits hold the payload until they observe a real interaction event, on the reasoning that a crawler will time out and leave before a human would have finished reading the first line. Timing delays do the same job from the other direction: the kit defers loading the phishing content for long enough that an automated visit, which budgets a fixed and short time per URL, gives up before the form appears.
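A sandbox can check for this kind of deferred reveal by taking several DOM snapshots over a longer dwell and noting when a credential field first appears. The sketch below assumes the snapshots have already been collected by a browser-driving harness, which is not shown, and uses the presence of a password input as a hypothetical stand-in for "the form has loaded."

```python
from html.parser import HTMLParser


class _PasswordFieldFinder(HTMLParser):
    """Sets found=True when an <input type="password"> tag is seen."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "input" and (dict(attrs).get("type") or "").lower() == "password":
            self.found = True


def has_password_field(html):
    finder = _PasswordFieldFinder()
    finder.feed(html)
    return finder.found


def first_reveal(snapshots):
    """Return the earliest snapshot time that shows a password field.

    snapshots is an iterable of (seconds_since_load, html) pairs.
    Returns None if no snapshot contains a password field.
    """
    for seconds, html in sorted(snapshots, key=lambda item: item[0]):
        if has_password_field(html):
            return seconds
    return None
```

If first_reveal returns a time well beyond the crawler's usual per-URL budget, or only for snapshots taken after simulated input, that is a hint the page is holding its payload back from short, passive visits.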

The kit ecosystem: 16Shop, BlackForce, and phishing-as-a-service

The cloaking did not get sophisticated because every phisher became an expert. It got sophisticated because a small number of kit authors built it once and sold it to everyone else. Phishing-as-a-service turned anti-analysis into a product feature, listed on the spec sheet next to the templates and the credential dashboard.

16Shop is the clearest case study, partly because Akamai’s researchers and later Trend Micro pulled it apart in detail, and partly because the story has an ending. The kit, attributed to an Indonesian actor first dubbed DevilScream, targeted PayPal, Apple, Amazon, and others, and sold as a packaged service with a license-validation system: each time index.php or /account/index.php loaded, a valid_file function in load.php phoned home to check that the deployment was authorized, which also let the author meter and monetize installs. The anti-analysis features were a selling point. The version Akamai analyzed could filter or block by user-agent at device, OS, and browser granularity, by IP range, by hostname, by proxy headers, by GET parameters, and against non-target users (it rejected non-Apple traffic on the Apple-themed pages). It used HTTP_REFERER and $_SESSION to drop repeat visitors. On top of CrawlerDetect, it integrated with a third-party service that would take a submitted user-agent and answer “bot or not,” outsourcing the classification entirely. Akamai also found, in a pirated copy of the kit, a hidden backdoor that exfiltrated the pirate’s stolen credentials back to the original author, which tells you everything about the trust model inside this economy. The platform compromised an estimated 70,000 users across 43 countries before INTERPOL coordinated arrests in Indonesia and Japan in the first week of August 2023, following the developer’s earlier 2022 apprehension.

BlackForce is the more recent shape of the same thing. Zscaler’s ThreatLabz analysis describes a kit that has pushed cloaking further client-side, using cache-busting filenames so each load pulls fresh anti-analysis JavaScript, sessionStorage to persist its decision across refreshes, and obfuscated client-side code in later versions to make the cloaking logic itself harder to read. Its server side keeps the familiar blocklists keyed on hostname keywords, ISP, and country, plus the desktop-rejecting mobile-only policy. The newer kits, Bluekit and FishXProxy among them, advertise their anti-bot and CAPTCHA modules as standard inclusions, and the adversary-in-the-middle kits that proxy a live login session to defeat multi-factor authentication carry the same cloaking front end, because a real-time proxy is even more sensitive to being scanned than a static credential page.

The thing to notice across all of them is that the cloaking is not bespoke. It is a stack of imported libraries, public IP feeds, and third-party API calls, assembled the way any web developer assembles a project. The author’s edge is not cryptographic cleverness. It is maintenance: keeping the blocklists current, keeping the user-agent regexes matching the latest scanners, keeping the fingerprint checks ahead of the crawlers. And maintenance is exactly the thing that can be sold as a subscription.

Anti-bot as a service: selling the cloak

The logical end point of that subscription model is to strip the cloaking out of the kit entirely and sell it on its own. By late 2024 that market existed in the open. SlashNext documented a cluster of dark-web services (Otus Anti-Bot, Remove Red, and Limitless Anti-Bot among them) that do nothing but the cloaking layer, sitting in front of whatever phishing page the customer already has.

The pitch is explicitly framed against Google’s “red page,” the full-screen Safe Browsing interstitial that Chrome shows when it has flagged a URL as deceptive. That warning is one of the most effective phishing deterrents in existence, because it reaches the victim at the moment of the click, and the entire value proposition of these services is to keep a phishing domain off the list that produces it. Otus advertises behavioral analysis, challenge-response, bot-signature detection, and threat-intel integration, with deployment in under two minutes. Limitless offers tiers: a standard mode that waves through low-risk bots and devices to avoid suspicion, and an advanced mode that blocks anything even slightly anomalous. Remove Red works the other end of the pipeline, claiming to get an already-flagged domain removed from the red-page list and then to hold a temporary whitelist so it does not immediately reappear. SlashNext called this the latest turn in the cat-and-mouse game and warned, correctly, that security teams can no longer treat Safe Browsing’s red page as a reliable first line.

*The same capability moves from a static file the operator maintains, to a library bundled in a kit, to a hosted subscription that any phishing page can point at. Each step lowers the skill needed to deploy good cloaking.*

This is the part that should worry defenders most, and it rhymes with the legitimate anti-bot industry’s own economics. Once cloaking is a hosted service with a maintained IP feed and a behavioral engine, the marginal phisher gets vendor-grade evasion without understanding any of it, the same way a marginal scraper rents a residential proxy network without understanding TLS fingerprints. The detection quality that used to separate a sophisticated 16Shop operator from a script-kiddie is now a checkout flow. And because the service sees traffic from thousands of phishing campaigns at once, it can build a shared, cross-customer view of which IPs are scanners (a collective signal network pointed at the defenders) in the same spirit as the collective networks the anti-bot vendors run against bots.

How the defenders crawl back

Cloaking is a bet about the request, and the defenders win by changing the request until the bet loses. The whole reason cloaking works is that scanners historically looked like scanners: datacenter IPs, default user-agents, no JavaScript execution, no interaction, a fixed and short time budget per URL. Every one of those is a tell, and the modern anti-phishing crawler exists to erase them.

The straightforward countermoves invert the cloak. Fetch from residential IP space so the ASN gate sees a consumer ISP instead of a cloud range. Send a real browser’s full header set, including a plausible referrer, so the user-agent and referer checks pass. Run an actual rendering engine, headless but patched against the obvious automation tells, so client-side JavaScript that probes navigator.webdriver and friends comes up empty. Solve or sit through the CAPTCHA when one appears. Introduce interaction and dwell so the timing and behavior gates do not trip. This is the same toolkit, point for point, that a scraper uses to get past commercial anti-bot, which is the quiet irony of the whole field: the anti-phishing crawler and the credential-stealing kit are running the identical arms race from opposite chairs, and the stealth techniques age out on both sides at the same rate.
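The countermoves above amount to a checklist, and it can help to write it down as one. The sketch below models a crawler profile and reports which of the gates discussed in this post it would still trip. The field names, the list of self-announcing tokens, and the ten-second dwell threshold are all assumptions chosen for illustration, not measured values.

```python
from dataclasses import dataclass

# Substrings that mark a user-agent as announcing itself (illustrative list).
SELF_ANNOUNCING = ("bot", "crawler", "spider", "curl", "python-requests", "headless")


@dataclass(frozen=True)
class CrawlerProfile:
    egress: str              # "datacenter" or "residential"
    user_agent: str
    sends_referer: bool
    runs_javascript: bool
    simulates_input: bool
    dwell_seconds: float


def remaining_tells(profile, min_dwell=10.0):
    """List the cloaking gates this profile would still trip."""
    ua = profile.user_agent.lower()
    checks = [
        ("ip_asn", profile.egress != "residential"),
        ("user_agent", not ua or any(token in ua for token in SELF_ANNOUNCING)),
        ("referer", not profile.sends_referer),
        ("no_javascript", not profile.runs_javascript),
        ("no_interaction", not profile.simulates_input),
        ("short_dwell", profile.dwell_seconds < min_dwell),
    ]
    return [name for name, tripped in checks if tripped]
```

A plain HTTP client on a cloud host trips all six. A profile that clears the list is not guaranteed to pass a real kit, since the kit's actual checks are unknown in advance, but an empty list is the minimum bar before the more subtle fingerprint checks even come into play.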

The research frontier pushes on the cost of doing this at scale. PhishParrot, published in 2025 by Nakano, Koide, and Chiba, uses an LLM to drive adaptive crawling: rather than fetch every URL with one fixed crawler profile, it adjusts headers, interaction patterns, and timing per target based on what the site appears to be checking for, and reports being able to surface cloaked phishing content at a 96.52 percent rate where a static crawler would have collected the decoy or been blocked. The premise is that cloaking is a moving target, so the crawler has to move too, and an LLM is a cheap way to generate the per-site variation that a hand-tuned crawler cannot.

The deeper answer abandons the crawl entirely. If cloaking is a property of the server deciding what to send a remote scanner, then the one observer it cannot lie to is the victim’s own browser, because to steal the credentials the page must show the credential form to exactly that browser. Client-side detection runs the phishing check on the end-user’s device, at the moment the real page is rendering, against the actual content the victim is looking at. Eric Lawrence’s account of Edge’s approach makes the logic explicit: server-side detonation is expensive and cloakable, but client-side detection cannot be cloaked, because the attacker has to display the real attack to the person being attacked. In 2025 Edge shipped client-side detection for scareware that interrupts the attack in the browser and presents a block page the user can report from. SmartScreen and Safe Browsing still run their server-side pipelines, but the cloaking-proof layer is the one running where the lie has to be told.

What the cloak finally protects

Strip the layers away and the phishing kit’s anti-analysis is one decision, made before the page is built, about whether the visitor can be robbed or will report the crime. Everything above (the IP blocklists, the user-agent regexes, the referer checks, the one-time tokens, the headless fingerprints, the CAPTCHA walls, the hosted anti-bot subscriptions) is machinery in service of getting that one classification right more often than the defender gets it right. The cheap server-side gates handle the dumb scanners. The client-side fingerprinting handles the browser-driving ones. The service economy handles the maintenance so the marginal phisher does not have to.

The defenders’ counter is not better crawling, though they do that too. It is moving the verdict to the one vantage point the cloak cannot reach: the victim’s browser at render time, where the attacker is forced to show the real page or get nothing. That shift quietly resets the board. A kit can detect a datacenter IP, a default user-agent, a missing referrer, a navigator.webdriver flag, an unsolved challenge. It cannot detect, and cannot cloak from, the actual human it is trying to rob, because that human is the one audience it must let through. Every gate in the kit is built to keep the two audiences apart. Client-side detection wins by being the audience that is allowed in.

The economic detail worth sitting with: 16Shop, with its hand-tuned blocklists and imported CrawlerDetect and bot-or-not API calls, took years of maintenance and an INTERPOL operation across three countries to dismantle. The anti-bot services that replaced its cloaking layer sell the same evasion as a two-minute deployment, to anyone, with the IP feeds kept fresh by someone else. The capability did not get harder to build. It got easier to rent.

Sources & further reading

Akamai / Larry Cashdollar (2019), 16Shop: Commercial Phishing Kit Has A Hidden Backdoor — teardown of 16Shop v1.9.7 covering CrawlerDetect, the bot-or-not API, user-agent/IP/referer filtering, the valid_file license check, and the pirated-copy backdoor.
Trend Micro (2023), Revisiting 16shop Phishing Kit, Trend-Interpol Partnership — attribution of 16Shop to DevilScream/RNS and the anti-detection feature set, tied to the takedown.
INTERPOL (2023), Notorious phishing platform shut down, arrests in international police operation — the official account of the 16Shop seizure, the arrests in Indonesia and Japan, and the 70,000-victim, 43-country impact figure.
Zscaler ThreatLabz (2025), Technical Analysis of the BlackForce Phishing Kit — the client-side cloaking stack: ua-parser-js parsing, named security-tool regexes, hostname/ISP/country blocklists, cache-busting JS, and the mobile-only policy.
SlashNext / Daniel Kelley (2024), How Dark Web Anti-Bot Services Aid Phishing Campaigns — the Otus, Remove Red, and Limitless anti-bot services and how they sell red-page evasion as a standalone subscription.
Eric Lawrence (2024), Cloaking, Detonation, and Client-side Phishing Detection — why server-side detonation is cloakable and client-side detection is not, from a Microsoft Edge security perspective.
Nakano, Koide & Chiba (2025), PhishParrot: LLM-Driven Adaptive Crawling to Unveil Cloaked Phishing Sites — adaptive per-target crawling against cloaking, with the reported 96.52 percent surfacing rate.
ZeroFox (2024), Phishing Kits with Cloaked Techniques: The Next Generation of Phishing Attacks — the five-technique cloaking taxonomy and the share of attacks now using cloaking.
JayBizzle (project), CrawlerDetect — the open-source PHP library, built for legitimate use, that phishing kits import to identify and divert crawlers from the user-agent.
Spoofguard (2024), User Agent Cloaking in Phishing Websites: How Attackers Evade Detection — defender-side walkthrough of the user-agent inspection flow and the benign content served to crawlers.
Microsoft Edge team (2022), Website typo protection defends against fraud including phishing, malware, and other scams — context on SmartScreen and the client-side direction of browser anti-phishing.