Fingerprint-based fraud scoring: how device intelligence flags risky sessions
A returning customer logs in from a new city, on a browser they have used before, and checks out with a card that matches their billing address. A fraudster logs in to a stolen account from a residential proxy two timezones away, on a browser whose canvas hash has touched forty other accounts this week, and checks out with a card that does not match anything. Both requests look identical at the HTTP layer. Both pass the password check. The difference between approving one and declining the other is not in the request. It is in everything the platform already knows about the device behind it.
That knowledge has a name in the fraud-prevention industry: device intelligence. The mechanism is a device or browser fingerprint, joined to a pile of side signals (proxy reputation, velocity counters, the device’s own history across a network of merchants), fed into a model that emits a single number. Sift calls it a Sift Score. SEON calls it a fraud score. Fingerprint calls its building blocks Smart Signals and leaves the final number to you. The number answers one question: how risky is this session, right now, for this specific action? This post traces how that number gets built.
The road map: first, what a fingerprint actually is and how much identifying power it carries. Then the side signals that ride alongside it: proxy and VPN detection, velocity, and the device’s cross-merchant history. Then the scoring layer itself, walked through three vendors that publish enough documentation to reason about. Then the part most people get wrong, which is the difference between fraud scoring and bot detection. They share plumbing. They answer different questions.
What a fingerprint is, and how much it tells you
A device fingerprint is a hash. The inputs are dozens of attributes a browser or app exposes without asking permission: the canvas rendering of a known string, the WebGL renderer name, the audio stack’s oscillator output, installed fonts, screen geometry, timezone, language list, hardware concurrency, platform string. None of these identifies a device on its own. The screen resolution 1920×1080 is shared by a huge slice of the population. The trick is the joint distribution. The probability that two unrelated devices share all of the same attribute values is small, and it gets smaller with every attribute you add.
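In its naive form the whole mechanism fits in a few lines of Python: serialize the attributes canonically, so that key order cannot change the result, then hash the serialization. This is a minimal sketch. The attribute names and values are hypothetical stand-ins for what a collection script would gather. Real libraries collect far more surfaces and do not necessarily serialize them this way.

```python
import hashlib
import json


def naive_fingerprint(attributes: dict) -> str:
    """Hash a dict of device attributes into a hex fingerprint.

    Sorting keys and fixing separators makes the serialization canonical,
    so identical attribute values always produce the identical hash.
    """
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values; a real collector gathers dozens more.
device = {
    "canvas_hash": "a3f9c2",
    "webgl_renderer": "ANGLE (Apple, Apple M1, OpenGL 4.1)",
    "screen": "1920x1080",
    "timezone": "Europe/Berlin",
    "languages": ["de-DE", "en-US"],
    "hardware_concurrency": 8,
}

fp = naive_fingerprint(device)
# Same values inserted in a different order: same fingerprint.
reordered = dict(reversed(list(device.items())))
# One attribute changed: an entirely different fingerprint.
new_monitor = dict(device, screen="2560x1440")
```

No single entry narrows the field much on its own, and the 1920×1080 screen narrows it least of all. The identifying power lives in the full tuple. The last line also shows the weakness discussed below: one changed value produces an unrelated hash.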
The foundational measurement of this is fifteen years old and still the cleanest way to think about it. In 2010 Peter Eckersley ran the EFF’s Panopticlick experiment, collecting fingerprints from 470,161 browsers. The distribution carried at least 18.1 bits of entropy. Among browsers that ran Flash or Java, the average carried at least 18.8 bits, and 94.2 percent of those were unique in the sample. At 18.1 bits, a browser picked at random is expected, at best, to share its fingerprint with only one in 286,777 others. That study drew on a short list of attributes: the User-Agent string, HTTP Accept headers, the plugin list, system fonts, screen resolution, timezone, whether cookies were enabled, and a partial supercookie test. Modern fingerprinting adds canvas, WebGL, audio, and a dozen more surfaces, and the entropy budget has grown accordingly. The mechanics of each surface, canvas in particular, are covered in canvas fingerprinting and WebGL fingerprinting; the point here is the aggregate.
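The conversion between bits and “one in N” is a base-2 logarithm. It is worth running once, because it also shows why per-attribute bit counts cannot simply be added. The following is a short Python sketch. The per-attribute values and the toy sample are hypothetical, not Panopticlick’s measurements.

```python
import math
from collections import Counter


def bits_from_one_in(n: float) -> float:
    """Surprisal, in bits, of a fingerprint shared by 1 in n browsers."""
    return math.log2(n)


def one_in_from_bits(bits: float) -> float:
    """Anonymity-set size implied by a given number of bits."""
    return 2.0 ** bits


def entropy_bits(values) -> float:
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# "One in 286,777" is about 18.13 bits, reported (rounded) as 18.1.
headline_bits = bits_from_one_in(286_777)

# Hypothetical per-attribute entropies. Adding them assumes independence,
# so the sum is only an upper bound on the entropy of the joint fingerprint.
per_attribute_bits = {"user_agent": 10.0, "fonts": 13.0, "screen": 4.5, "timezone": 3.0}
independence_upper_bound = sum(per_attribute_bits.values())

# Toy sample in which the OS perfectly predicts the default font set.
sample = [("macos", "helvetica_neue")] * 4 + [("windows", "segoe_ui")] * 4
h_joint = entropy_bits(sample)
h_summed = entropy_bits(os_name for os_name, _ in sample) + entropy_bits(font for _, font in sample)
```

In the toy sample, the OS and the font set each carry one bit. Because one predicts the other, the pair together carries one bit, not two. Real attributes are correlated in the same way, which is why a measured joint figure like Panopticlick’s is the one to quote.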
*The illustrative bit values are per-attribute estimates, not measured constants; Panopticlick measured ~18.1 bits for the full joint distribution.*
The bits-of-entropy number is the easy part. The hard part is that fingerprints drift. A browser update changes the User-Agent. A new monitor changes screen geometry. A font install shifts the font hash. If your identifier is a naive hash of all attributes, it breaks the moment any single input changes, and a fraud system that re-identifies the same device as a new one every week is worthless. This is the problem the commercial vendors actually sell a solution to. The open-source FingerprintJS library combines browser attributes into a hash and stops there. It has no way to distinguish two devices whose attributes all match, so it can assign the same visitor ID to different machines. Fingerprint Pro, the paid product, layers server-side signals and a machine-learning matcher on top, so that a fingerprint with one drifted attribute still resolves to the same visitorId it had last month.
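To make the drift problem concrete, here is a toy contrast between exact-hash matching and a tolerant matcher that scores weighted attribute agreement. This is not Fingerprint Pro’s matcher, which the vendor describes only as machine-learning assisted. The weights, the attribute names, and the 0.8 threshold are hypothetical, chosen only to show why one drifted attribute should not mint a new identity.

```python
import hashlib
import json
from typing import Optional

# Hypothetical weights: stable, high-entropy surfaces count for more than
# attributes that change with every browser update.
WEIGHTS = {
    "canvas_hash": 3.0,
    "webgl_renderer": 2.0,
    "fonts_hash": 2.0,
    "screen": 1.0,
    "timezone": 1.0,
    "user_agent": 0.5,
}


def exact_id(attrs: dict) -> str:
    """Naive identifier: any changed attribute yields a different ID."""
    blob = json.dumps(attrs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]


def similarity(a: dict, b: dict) -> float:
    """Weighted fraction of tracked attributes present in both and equal."""
    agree = sum(w for k, w in WEIGHTS.items() if k in a and k in b and a[k] == b[k])
    return agree / sum(WEIGHTS.values())


def match_visitor(observation: dict, known: dict, threshold: float = 0.8) -> Optional[str]:
    """Best-matching stored visitor ID, or None if no profile clears the threshold."""
    best_id, best_score = None, 0.0
    for visitor_id, profile in known.items():
        score = similarity(observation, profile)
        if score > best_score:
            best_id, best_score = visitor_id, score
    return best_id if best_score >= threshold else None


last_month = {
    "canvas_hash": "a3f9c2", "webgl_renderer": "ANGLE (Apple M1)",
    "fonts_hash": "77be01", "screen": "1920x1080",
    "timezone": "Europe/Berlin", "user_agent": "Chrome/123",
}
today = dict(last_month, user_agent="Chrome/124")  # a browser update

naive_breaks = exact_id(today) != exact_id(last_month)   # True
matched = match_visitor(today, {"v_123": last_month})    # "v_123" (9.0 of 9.5 weight agrees)
```

The same scoring also reproduces the open-source library’s blind spot: two distinct machines with identical attributes score a perfect 1.0 and merge. Telling them apart requires signals beyond the attributes themselves, which is what the server-side layer adds.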
Fingerprint publishes its accuracy framing in some detail. The confidence score that ships with each identification runs 0 to 1 and is defined as confidenceScore = 1 - falsePositiveProbability. A brand-new browser, or one identified deterministically through a stored signal, scores 1.0. Confidence drops below 1.0 when identification falls back to probabilistic matching, because, in the docs’ own words, “non-deterministic properties change over time and their values could be the same between two identical browsers.” Private browsing, cleared storage, and third-party cookie restrictions all knock out the deterministic signals and push identification onto the probabilistic path. The company’s marketing puts identification accuracy at 99.5 percent and claims the identifier stays stable for months; the technical documentation is more careful, talking about true-positive rates and drift rather than a single headline figure. Their own comparison material claims Fingerprint Pro holds 98 percent accuracy at month four while a weaker competitor “may be missing nearly 90 percent of returning devices” by then. Treat the 99.5 as a vendor number. The mechanism underneath it, ML-assisted matching to fight drift, is the real story, and it is the same problem every device-intelligence vendor is solving. FingerprintJS internals digs into the open-source-versus-Pro signal split.
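Because the confidence score is defined as one minus a false-positive probability, it converts directly into an expected error count. The following is a Python sketch. It assumes the scores are calibrated (a 0.99 really does mean a 1 percent chance of a wrong match), which is a property the vendor asserts rather than one you can take for granted. The 0.9 cutoff in the hypothetical inherit_history helper is a policy choice, not a documented threshold.

```python
def false_positive_probability(confidence: float) -> float:
    """Invert the documented definition: confidenceScore = 1 - falsePositiveProbability."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return 1.0 - confidence


def expected_false_matches(confidences: list) -> float:
    """Expected number of identifications attributed to the wrong visitor,
    assuming every confidence score is calibrated."""
    return sum(false_positive_probability(c) for c in confidences)


def inherit_history(confidence: float, min_confidence: float = 0.9) -> bool:
    """Hypothetical policy: below the cutoff, treat the visitor as unrecognized
    instead of inheriting the stored visitor's history."""
    return confidence >= min_confidence


# 10,000 probabilistic matches at confidence 0.99 imply roughly 100 wrong merges.
expected = expected_false_matches([0.99] * 10_000)
```

At fraud-platform volumes, even a high per-match confidence leaves a steady count of merged visitors, and each merge hands one device’s history to another.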
The signals that ride alongside the fingerprint
A fingerprint answers “is this the same device as before?” It does not answer “is this device suspicious?” For that you need signals that describe the device’s environment and behavior, not just its identity. These are the inputs that turn a stable identifier into a risk verdict.
The first family is network reputation. Where is the connection coming from, and is that origin trying to hide? Fingerprint’s Smart Signals expose this as a set of named fields. proxy is a boolean with proxy_confidence (high, medium, low) and proxy_details that distinguish residential from data_center proxy types. vpn is separate, carrying vpn_confidence and a vpn_methods field whose flags spell out how the VPN was caught: timezone_mismatch when the device’s local timezone disagrees with the IP’s geolocation, public_vpn when the IP matches a known commercial VPN range, os_mismatch when the network-level signature disagrees with the claimed OS, and relay for anonymizing services like iCloud Private Relay. There is an ip_blocklist object with tor_node, attack_source, and email_spam booleans. Datacenter origin shows up in the IP geolocation block as an organization-type classification. None of these is conclusive on its own. A real customer on a corporate VPN trips public_vpn. The signal is an input, not a verdict. How vendors actually build these reputation lists, and why residential proxies are the hard case, is the subject of residential proxy and ASN detection.
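The following sketch shows how a customer might fold these network signals into a single sub-score. The NetworkSignals dataclass is a hypothetical flattening that borrows the field names above; it is not the Server API’s wire format, and every weight is illustrative. The shape of the logic is the point: each signal contributes evidence, and none decides alone.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Map the documented high/medium/low confidence levels to hypothetical multipliers.
CONFIDENCE_SCALE = {"high": 1.0, "medium": 0.6, "low": 0.3}

# Hypothetical points per VPN detection method. A lone public_vpn hit is
# weak evidence (corporate VPNs trip it); a timezone or OS mismatch is stronger.
VPN_METHOD_POINTS = {"timezone_mismatch": 15, "os_mismatch": 15, "public_vpn": 5, "relay": 2}

# Hypothetical points per proxy type; residential is weighted higher here on the
# assumption that it was chosen to pass as a home connection.
PROXY_TYPE_POINTS = {"residential": 25, "data_center": 15}


@dataclass
class NetworkSignals:
    """Hypothetical flattening of the fields named above; not the API's wire format."""
    proxy: bool = False
    proxy_confidence: str = "low"
    proxy_type: Optional[str] = None
    vpn: bool = False
    vpn_confidence: str = "low"
    vpn_methods: Set[str] = field(default_factory=set)
    tor_node: bool = False
    attack_source: bool = False
    email_spam: bool = False


def network_risk_points(s: NetworkSignals) -> int:
    """Sum weighted evidence from network signals; no single signal is decisive."""
    points = 0.0
    if s.proxy:
        base = PROXY_TYPE_POINTS.get(s.proxy_type, 10)
        points += base * CONFIDENCE_SCALE.get(s.proxy_confidence, 0.0)
    if s.vpn:
        method_points = sum(VPN_METHOD_POINTS.get(m, 0) for m in s.vpn_methods)
        points += method_points * CONFIDENCE_SCALE.get(s.vpn_confidence, 0.0)
    if s.tor_node:
        points += 30
    if s.attack_source:
        points += 40
    if s.email_spam:
        points += 5
    return round(points)


corporate_vpn = NetworkSignals(vpn=True, vpn_confidence="high", vpn_methods={"public_vpn"})
evasive = NetworkSignals(proxy=True, proxy_confidence="high", proxy_type="residential",
                         vpn=True, vpn_confidence="high", vpn_methods={"timezone_mismatch"})
```

The corporate-VPN customer collects 5 points from public_vpn alone. The session on a high-confidence residential proxy that also shows a timezone mismatch collects 40. Both tripped a VPN signal, which is exactly why the signal is an input and not a verdict.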
The second family is velocity. A single fingerprint is interesting; a single fingerprint that touched 200 accounts in an hour is an attack. Fingerprint’s velocity signals count distinct entities per visitorId across three windows (5 minutes, 1 hour, and 24 hours): distinct_ip, distinct_country, distinct_linked_id, and a raw events count. There are inverse counters too, like distinct_visitor_id_by_linked_id, which counts how many devices have touched a single account ID you supplied. A legitimate user generates one device per account and a handful of IPs. A credential-stuffing run generates one device across hundreds of accounts, or hundreds of devices per account if it is rotating fingerprints. The velocity counters catch both shapes. There is a documented ceiling: for visitors exceeding 10,000 events in 24 hours, the platform stops returning the short windows and caps the 24-hour count at 10,000, which is itself a strong signal that something automated is happening. The relationship between velocity signals and the broader attack pattern is the whole subject of credential stuffing mechanics.
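The counters themselves are sliding-window distinct counts. The following is a minimal in-memory sketch in Python. The class and method names are hypothetical, and the output keys mirror the field names above. A production system would keep this state in a shared store rather than in per-process deques. The 10,000-event cap is not modeled.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

# The three documented windows, in seconds.
WINDOWS = {"5m": 5 * 60, "1h": 60 * 60, "24h": 24 * 60 * 60}


@dataclass(frozen=True)
class Event:
    ts: float          # seconds since epoch
    visitor_id: str
    ip: str
    country: str
    linked_id: str     # the account ID the customer attaches to the event


class VelocityCounters:
    """In-memory sliding-window counters. Assumes events arrive in time order per key."""

    def __init__(self) -> None:
        self.by_visitor = defaultdict(deque)
        self.by_linked_id = defaultdict(deque)

    def record(self, e: Event) -> None:
        for q in (self.by_visitor[e.visitor_id], self.by_linked_id[e.linked_id]):
            q.append(e)
            # Drop anything older than the largest window; it can never count again.
            while e.ts - q[0].ts > WINDOWS["24h"]:
                q.popleft()

    def visitor_counts(self, visitor_id: str, now: float) -> dict:
        """events / distinct_ip / distinct_country / distinct_linked_id per window."""
        events = self.by_visitor.get(visitor_id, ())
        out = {}
        for name, span in WINDOWS.items():
            recent = [e for e in events if now - e.ts <= span]
            out[name] = {
                "events": len(recent),
                "distinct_ip": len({e.ip for e in recent}),
                "distinct_country": len({e.country for e in recent}),
                "distinct_linked_id": len({e.linked_id for e in recent}),
            }
        return out

    def distinct_visitors_for_account(self, linked_id: str, now: float, window: str = "1h") -> int:
        """Inverse counter: how many devices touched one account within the window."""
        events = self.by_linked_id.get(linked_id, ())
        return len({e.visitor_id for e in events if now - e.ts <= WINDOWS[window]})


# Shape one: one device, 200 accounts, one every 10 seconds.
vc = VelocityCounters()
t0 = 1_700_000_000.0
for i in range(200):
    vc.record(Event(t0 + i * 10, "v_bad", f"203.0.113.{i % 50}", "US", f"acct_{i}"))
stuffing = vc.visitor_counts("v_bad", now=t0 + 2000)

# Shape two: 40 rotated fingerprints against one account.
for i in range(40):
    vc.record(Event(t0 + i, f"v_rot_{i}", "198.51.100.7", "DE", "acct_victim"))
rotated = vc.distinct_visitors_for_account("acct_victim", now=t0 + 60)
```

The two runs trip different counters. The first inflates distinct_linked_id for a single device. The second inflates the per-account device count while every individual fingerprint looks quiet.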
The third family is history, and it is the one that needs scale to work. The question is whether this exact device has been seen committing fraud before, anywhere. A vendor that only sees one merchant’s traffic can answer “has this device defrauded us.” A vendor with a network across thousands of merchants can answer “has this device defrauded anyone.” Sift markets this directly: device fingerprinting, in their words, reveals “if other members of the Sift network have seen this device before and how likely it is to be associated with fraud,” and a device flagged as fraudulent at one customer can be banned across all of them. SEON takes the opposite stance on data provenance and makes a point of it, advertising over 900 “first-party” risk signals across email, phone, IP, and device modules as of January 2025, where every lookup runs in real time against live sources rather than a shared consortium pool. Two philosophies. One bets on the network effect of pooled fraud history; the other bets on fresh first-party enrichment. Both feed the same kind of model.
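The difference between the two questions comes down to what the lookup is scoped to. The following is a toy sketch of a shared outcome ledger. The class, its methods, and the idea of keying it by a single cross-merchant device ID are simplifying assumptions. This is not a description of how Sift or any other vendor stores consortium data.

```python
from collections import defaultdict


class OutcomeLedger:
    """Toy cross-merchant record of confirmed fraud, keyed by a shared device ID."""

    def __init__(self) -> None:
        self._fraud_at = defaultdict(set)   # device_id -> merchants that confirmed fraud

    def report_fraud(self, device_id: str, merchant: str) -> None:
        self._fraud_at[device_id].add(merchant)

    def defrauded_us(self, device_id: str, merchant: str) -> bool:
        """Single-merchant view: only this merchant's own confirmed outcomes count."""
        return merchant in self._fraud_at.get(device_id, set())

    def defrauded_anyone(self, device_id: str) -> int:
        """Network view: how many distinct merchants confirmed fraud from this device."""
        return len(self._fraud_at.get(device_id, set()))


ledger = OutcomeLedger()
ledger.report_fraud("dev_42", "shoe_store")
ledger.report_fraud("dev_42", "ticket_site")

seen_here = ledger.defrauded_us("dev_42", "electronics_shop")   # False
seen_anywhere = ledger.defrauded_anyone("dev_42")               # 2
```

The electronics shop sees this device for the first time and learns nothing from its own records, while the network view already counts two confirmed incidents. That asymmetry is the whole case for pooling.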
Turning signals into a score: three vendors
The signals are inputs. The score is what a customer’s fraud team actually acts on. Here the three vendors diverge in instructive ways.
Sift is the clearest case of a learned score. The Sift Score is a number from 0 to 100, where higher means riskier, and there is a separate score per abuse type: payment fraud, account takeover, content abuse, promotion abuse. A user can score 9 for payment fraud and 88 for content abuse in the same session, because the models are distinct. The scores are produced in real time and move as more data arrives; sending Sift another event can push a score up or down. Under the hood, Sift’s engineering team has described an ensemble. A sequence model, specifically an LSTM built in Keras and TensorFlow with a masking layer for variable-length event sequences, processes the timestamped stream of a user’s actions directly, and its output is merged with logistic regression, decision forests, and Naive Bayes classifiers. That deep-learning path went to production in January 2018. The features fall into three buckets that map neatly onto everything above: identity features (email, device fingerprint), behavioral features (“posts written in the last hour”), and similarity features (shared IPs or billing addresses across accounts). The model is per-customer, trained on that customer’s labeled outcomes, but it draws on signals “linked to a fraudster elsewhere in our customer network.” Sift says the platform protects more than 34,000 websites, which is the asset that makes the network signal worth anything.
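Sift has not published how its ensemble members are weighted or merged, so the following is a generic stand-in. It computes a weighted average in log-odds space, maps the result onto a 0-to-100 scale, and does this separately for each abuse type. The model names, probabilities, and weights are all hypothetical. What the sketch does show is why one user can carry very different scores at once: each abuse type has its own models and its own blend.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability, clamped so the result stays finite."""
    p = min(max(p, 1e-6), 1 - 1e-6)
    return math.log(p / (1 - p))


def blend_to_score(model_probs: dict, weights: dict) -> int:
    """Weighted log-odds average of model probabilities, mapped to 0-100."""
    total_w = sum(weights[m] for m in model_probs)
    z = sum(weights[m] * logit(p) for m, p in model_probs.items()) / total_w
    return round(100 / (1 + math.exp(-z)))


# Hypothetical weights for the four model families the Sift post names.
WEIGHTS = {"sequence_lstm": 2.0, "logistic_regression": 1.0,
           "decision_forest": 1.5, "naive_bayes": 0.5}

# Hypothetical outputs for one user, from separate per-abuse-type models.
session = {
    "payment_fraud": {"sequence_lstm": 0.05, "logistic_regression": 0.10,
                      "decision_forest": 0.08, "naive_bayes": 0.20},
    "content_abuse": {"sequence_lstm": 0.92, "logistic_regression": 0.80,
                      "decision_forest": 0.88, "naive_bayes": 0.70},
}

scores = {abuse: blend_to_score(probs, WEIGHTS) for abuse, probs in session.items()}
```

The same user lands in the single digits for payment fraud and in the high 80s for content abuse. Averaging in log-odds rather than in probability space lets a confident model move the result more than a hesitant one. That is one common choice, not necessarily Sift’s.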
*Sift documents the 0–100 scale and the per-abuse-type score; the threshold positions are illustrative, since each customer sets their own accept/review/decline cutoffs.*
SEON splits the job in two and is unusually explicit about it. There is a rule-based fraud score, computed by summing the values of every rule that fired on a transaction, and a separate machine-learning output SEON calls the AI Insights score, trained first on SEON’s global dataset and then adapted to the customer’s own labeled transactions. The rule score is the primary decision metric, the thing that drives approve, review, and block; the ML score is a second opinion that surfaces patterns the rules miss. SEON’s own documentation does not pin the rule score to a fixed 0-to-100 range or publish hard threshold numbers, describing scores as “high” or “moderate” and leaving the cutoffs to the customer. The design point is that a fraud analyst can read why a transaction scored the way it did, because the score is a sum of named rules, each tied to a signal. That auditability is the selling point of a rule-based core, and it is why SEON keeps one even as it adds ML on top.
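The auditability argument is easiest to see in code. The following sketch shows a rule-sum scorer in the spirit SEON describes: every rule has a name, a predicate, and a point value, the score is the sum over fired rules, and the list of fired rules comes back with the score. The rule names, point values, and review/block cutoffs are hypothetical, since SEON leaves thresholds to the customer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    points: int
    fires: Callable[[dict], bool]


def _differs(t: dict, a: str, b: str) -> bool:
    """True only when both fields are present and disagree."""
    return a in t and b in t and t[a] != t[b]


# Hypothetical rules and point values over a hypothetical transaction dict.
RULES = [
    Rule("ip_is_proxy", 15, lambda t: bool(t.get("proxy"))),
    Rule("device_tz_differs_from_ip_tz", 10, lambda t: _differs(t, "device_tz", "ip_tz")),
    Rule("email_domain_younger_than_30d", 20, lambda t: t.get("email_domain_age_days", 10_000) < 30),
    Rule("billing_shipping_country_differ", 10,
         lambda t: _differs(t, "billing_country", "shipping_country")),
    Rule("device_seen_on_5plus_accounts", 25, lambda t: t.get("accounts_on_device", 0) >= 5),
]


def score_transaction(txn: dict, rules=RULES) -> tuple:
    """Fraud score = sum of points of every rule that fired; names kept for the analyst."""
    fired = [r for r in rules if r.fires(txn)]
    return sum(r.points for r in fired), [r.name for r in fired]


def decide(score: int, review_at: int = 30, block_at: int = 60) -> str:
    """Hypothetical customer-chosen cutoffs."""
    if score >= block_at:
        return "block"
    if score >= review_at:
        return "review"
    return "approve"


txn = {
    "proxy": True, "device_tz": "Asia/Manila", "ip_tz": "America/New_York",
    "email_domain_age_days": 12, "billing_country": "US", "shipping_country": "US",
    "accounts_on_device": 1,
}
score, reasons = score_transaction(txn)
verdict = decide(score)
```

The reasons list is what the analyst reads: the transaction was held for review because of a proxy, a device clock that disagrees with the IP, and an email domain less than a month old. A learned score does not come with that explanation built in.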
Fingerprint sits at the other end. It does not ship a fraud score at all. It ships the identifier and the Smart Signals (bot, vpn, proxy, tampering, incognito, virtual_machine, the velocity counters, and a suspect_score that is “a weighted integer value based on the global probability” of the combined signals) and expects you to write the scoring logic. The Smart Signals are deliberately not returned to the browser; they are available only server-side, through the Server API, webhooks, or sealed results, so an attacker cannot read which signals tripped and adjust. That is the right call. A fingerprint vendor that told the client “we think you are using a residential proxy and an anti-detect browser” would be handing the evasion roadmap to exactly the people it is trying to catch. Fingerprint’s tampering signal is itself worth a look: it carries an anomaly_score from 0 to 1 (actionable above 0.5), a separate tampering_ml_score (actionable above 0.8), and an anti_detect_browser flag that names tools like AdsPower, DolphinAnty, OctoBrowser, and GoLogin. Those are the products built to defeat fingerprinting, and detecting their use is itself a high-value fraud signal. Anti-detect browsers compared covers what those tools do and why their countermeasures leak.
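Because Fingerprint leaves the scoring to you, the decisions happen in the customer’s server-side code. The following sketch assumes the relevant Smart Signals have already been retrieved server-side (through the Server API, a webhook, or a decrypted sealed result) and flattened into a plain dict. The flattened key names, such as tampering_anomaly_score, are hypothetical. Only the 0.5 and 0.8 tampering cutoffs come from the documentation described above; the point values are illustrative.

```python
def fingerprint_risk(signals: dict) -> tuple:
    """Toy customer-side scoring over Smart Signals already retrieved server-side.

    `signals` is a hypothetical flat dict, not the Server API's response shape.
    """
    points, reasons = 0, []

    # Documented actionability cutoffs for the tampering signal.
    if signals.get("tampering_anomaly_score", 0.0) > 0.5:
        points += 30
        reasons.append("tampering_anomaly")
    if signals.get("tampering_ml_score", 0.0) > 0.8:
        points += 30
        reasons.append("tampering_ml")
    if signals.get("anti_detect_browser", False):
        points += 40
        reasons.append("anti_detect_browser")

    # Illustrative weights for the remaining signals.
    if signals.get("bot") == "bad":
        points += 40
        reasons.append("bad_bot")
    if signals.get("virtual_machine", False):
        points += 10
        reasons.append("virtual_machine")
    if signals.get("vpn", False) or signals.get("proxy", False):
        points += 10
        reasons.append("anonymized_network")
    if signals.get("incognito", False):
        points += 5
        reasons.append("incognito")
    return points, reasons


points, reasons = fingerprint_risk(
    {"tampering_anomaly_score": 0.72, "anti_detect_browser": True, "proxy": True}
)
```

None of this should leave the server. The response to the browser should carry only the outcome and never the reasons list, for the evasion-roadmap reason given above.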
Three architectures, then. Sift learns the whole score end to end and leans on a 34,000-site network. SEON keeps a transparent rule sum as the decision metric and bolts ML alongside it. Fingerprint refuses to score and sells you the cleanest possible signals to score with. The common shape is identity plus environment plus history into a model, but the question of who writes the final decision logic, the vendor or the customer, splits them cleanly.
Why this is not bot detection
Here is where most write-ups blur, so it is worth being precise. Fraud scoring and bot detection use overlapping signals. They are not the same product, and confusing them leads to the wrong tool deployed against the wrong threat.
Bot detection answers a binary-ish question: is there a human at the keyboard, or is this automation? It cares about headless Chrome tells, CDP traces, the Runtime.enable leak, TLS and HTTP/2 fingerprints that do not match the claimed browser, mouse paths that violate Fitts’s law, sensor payloads that arrive too clean. The output drives a challenge or a block at the edge, often before the request reaches the application. DataDome, Akamai Bot Manager, Cloudflare, Kasada, HUMAN: these score traffic, and they do it on the first request, because their job is to keep automation off the origin entirely. The signal set is the network stack and the runtime environment.
Fraud scoring answers a different question: is this action fraudulent, regardless of whether a human performed it? A human manually testing stolen cards from a clean residential browser is a fraudster and not a bot. Every bot-detection signal says “human,” and the transaction is still fraud. Fraud scoring cares about the device’s history across merchants, the velocity of accounts touched, the mismatch between billing and shipping geography, the proxy that hides the true origin, the account that was created six minutes before its first high-value purchase. The output does not block at the edge. It feeds a decision deeper in the stack: approve the payment, hold it for manual review, decline it, step up to additional verification. The signal set is identity, money, and history.
*The two systems overlap on fingerprint and proxy signals but answer different questions, fire at different points, and emit different actions.*
The overlap is real and it is the shared plumbing. Both systems fingerprint the device. Both detect proxies and VPNs. Both run velocity counters. Fingerprint even ships a bot signal (values not_detected, good, and bad, with a bot_type field whose value can be, for example, selenium) right next to its fraud-oriented signals, because the same SDK collects both. The difference is what sits downstream. A bot-detection vendor turns those signals into an edge verdict measured in milliseconds. A fraud-scoring vendor turns them into a risk score that a payments flow consults. You can run both, and large platforms do: bot management at the edge to shed automation, fraud scoring at checkout to catch the humans and the surviving bots that the edge let through. They are layers, not alternatives. The clearest tell that you are looking at a fraud system and not a bot system is the output vocabulary. If it says approve, review, or decline, it is scoring fraud. If it says allow, challenge, or block, it is detecting bots.
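The layering can be sketched as two checks that run at different points and answer in different vocabularies. Everything in the following sketch is hypothetical. The bot probability and the fraud score stand in for whatever a bot-management product and a fraud-scoring product actually return, and the cutoffs are placeholders.

```python
from enum import Enum


class EdgeVerdict(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"
    BLOCK = "block"


class FraudDecision(Enum):
    APPROVE = "approve"
    REVIEW = "review"
    DECLINE = "decline"


def edge_bot_check(bot_probability: float) -> EdgeVerdict:
    """Runs before the origin and asks one question: is this automation?"""
    if bot_probability >= 0.9:
        return EdgeVerdict.BLOCK
    if bot_probability >= 0.5:
        return EdgeVerdict.CHALLENGE
    return EdgeVerdict.ALLOW


def checkout_fraud_check(fraud_score: int, review_at: int = 40, decline_at: int = 80) -> FraudDecision:
    """Runs inside the payment flow and asks: is this action fraud, human or not?"""
    if fraud_score >= decline_at:
        return FraudDecision.DECLINE
    if fraud_score >= review_at:
        return FraudDecision.REVIEW
    return FraudDecision.APPROVE


def handle_checkout(bot_probability: float, fraud_score: int) -> str:
    edge = edge_bot_check(bot_probability)
    if edge is not EdgeVerdict.ALLOW:
        # Stopped at the edge; the application never sees it. A solved challenge
        # would re-enter the flow, which this sketch does not model.
        return edge.value
    return checkout_fraud_check(fraud_score).value


scripted = handle_checkout(bot_probability=0.97, fraud_score=5)        # "block"
human_carder = handle_checkout(bot_probability=0.02, fraud_score=91)   # "decline"
regular = handle_checkout(bot_probability=0.05, fraud_score=12)        # "approve"
```

The human card tester passes the edge untouched, because nothing about the session is automated, and is caught only by the second check.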
There is one more distinction that matters operationally. Bot detection is adversarial on a fast clock. The moment a vendor ships a new headless tell, the bot operators patch around it, and the stealth-patch lifecycle turns over in weeks. Fraud scoring is adversarial on a slower clock, because the fraudster cannot easily fake the thing that hurts them most, which is the device’s accumulated history. You can rotate a fingerprint. You cannot rotate the fact that a freshly minted fingerprint has no history, and a no-history device hitting a high-value action is itself a signal. The fraudster who buys an anti-detect browser to look unique succeeds in looking unique and thereby fails to look trusted, because trust is built from a track record the system has watched accumulate. Carders and account-takeover crews live with exactly this tension, and the cat-and-mouse around card testing is covered in carding and the bot economy.
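The asymmetry can be written as a penalty that shrinks as verified history accumulates. The following Python sketch uses hypothetical constants: the 30-day and 5-transaction scales, the 500 high-value cutoff, and the 30-point maximum. It is one simple way to encode the idea, not any vendor’s formula.

```python
import math


def history_trust(days_seen: float, clean_transactions: int) -> float:
    """Trust in [0, 1) that grows with observed clean history and saturates.

    The 30-day and 5-transaction scales are hypothetical.
    """
    age_term = 1 - math.exp(-days_seen / 30)
    activity_term = 1 - math.exp(-clean_transactions / 5)
    return age_term * activity_term


def no_history_penalty(days_seen: float, clean_transactions: int, action_value: float,
                       high_value_at: float = 500.0, max_penalty: int = 30) -> int:
    """Extra risk points for a thinly historied device attempting a high-value action."""
    if action_value < high_value_at:
        return 0
    return round(max_penalty * (1 - history_trust(days_seen, clean_transactions)))


fresh_fingerprint = no_history_penalty(0, 0, action_value=900)       # full penalty
long_time_customer = no_history_penalty(365, 40, action_value=900)   # effectively none
```

An anti-detect browser can make the fingerprint unique, but a unique fingerprint starts at days_seen=0 and clean_transactions=0, which is exactly where the penalty is largest.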
Where the score breaks, and what that costs
Every layer described here has a failure mode, and they compound. The fingerprint drifts, and a returning good customer looks new. The proxy signal fires on a legitimate corporate VPN, and a real purchase gets held for review. The velocity counter trips on a family sharing one device, or on a popular library computer. The network history is poisoned when a real device was once linked to a fraudster through a shared IP. Each false positive has a cost that does not show up in the fraud-caught metric: the abandoned cart, the support ticket, the customer who never comes back. This is why none of these vendors ships a universal threshold. Sift tells customers to find the score “where the majority of the entities above it are fraud,” and that line sits in a different place for a marketplace than for a bank, because the cost of a wrong decline is different.
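Sift’s rule of thumb translates into a search over labeled history. The following sketch assumes you have (score, is_fraud) pairs from past decisions. It returns the lowest score at which more than half of everything scoring at or above it was fraud. The sample data is invented for illustration.

```python
from typing import Optional


def majority_fraud_threshold(labeled: list, min_precision: float = 0.5) -> Optional[int]:
    """Lowest score t such that, among entities scoring >= t, the fraud share
    exceeds `min_precision`. Returns None if no threshold qualifies."""
    for t in sorted({s for s, _ in labeled}):
        above = [is_fraud for s, is_fraud in labeled if s >= t]
        if sum(above) / len(above) > min_precision:
            return t
    return None


# Invented history of (score, confirmed_fraud) pairs.
history = (
    [(10, False)] * 6
    + [(35, False)] * 7 + [(35, True)]
    + [(60, False)] * 2 + [(60, True)] * 3
    + [(85, True)] * 4
)

threshold = majority_fraud_threshold(history)                  # 60
cautious = majority_fraud_threshold(history, min_precision=0.8)  # 85
```

Making wrong declines costlier is a matter of raising min_precision. Demanding 80 percent fraud above the line pushes the threshold from 60 to 85 and sends fewer good customers down the decline branch.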
The deeper limit is that a fraud score is a probability, dressed up as a number that an automated flow can branch on. The 0-to-100 scale invites the fantasy of a clean cutoff: below 30 approve, above 70 decline, review the middle. Reality is a smeared distribution with good users in the high band and fraudsters in the low band, and the band you choose to review is where the system is admitting it does not know. A residential proxy plus a clean fingerprint plus a card that matches a stolen identity’s billing address can produce a low score for a fraudulent transaction, and the system will approve it with confidence, because every signal it can see looks ordinary. Device intelligence narrows that band. It does not close it.
What has actually changed over the past few years is less the scoring math and more the signal supply. The ML matchers that hold a fingerprint stable through drift, the anti-detect-browser detectors that name AdsPower and GoLogin by tool, the move to keep every fraud-relevant signal server-side and out of the client’s reach, the consortium histories spanning tens of thousands of merchants: these are what raised the floor. The score is still a probability. But the inputs feeding it are richer, fresher, and harder to spoof than they were when a fingerprint was just a hash of a User-Agent and a font list. The fraudster’s problem is no longer hiding a single request. It is manufacturing a history, and history is the one input that takes real time to forge.
Sources & further reading
- Eckersley, P. (2010), How Unique Is Your Web Browser? — the EFF Panopticlick study; 470,161 browsers, 18.1 bits of entropy, the origin of measured fingerprint uniqueness.
- Fingerprint (2025), Smart Signals reference — exact field names for bot, vpn, proxy, tampering, velocity, and the suspect_score across browser and mobile SDKs.
- Fingerprint (2025), Identification, accuracy, and confidence score — the 0-to-1 confidence score definition and why probabilistic matching lowers it.
- Fingerprint (2024), FingerprintJS and Fingerprint Pro: identification accuracy explained — open-source versus Pro, server-side signals, and the drift-over-time accuracy argument.
- Fingerprint (2025), Device fingerprinting: what it is and how it works — the 100-signal hashing model and browser-versus-device fingerprinting distinction.
- Sift (2018), Deep learning for fraud detection — the LSTM sequence model, the ensemble it joins, and the January 2018 production launch.
- Sift, Sift Score: predict risk and approve more good users — the 0-to-100 per-abuse-type score, real-time recomputation, and the 34,000-site network claim.
- Sift, What is a Sift Score? — the official definition of the score and the guidance on setting block/review/accept thresholds.
- SEON (2025), How to interpret SEON’s risk scores — the rule-based fraud score versus the ML AINSIGHTS score, and how they divide the decision.
- SEON (2025), SEON now delivers 900+ first-party risk signals — the January 2025 expansion to 900+ signals across email, phone, IP, and device, with the first-party data argument.
- SEON, Device intelligence — how the JavaScript and mobile SDKs collect device data and feed it to the Fraud API.
- Feedzai (2024), What is device intelligence? — a vendor-neutral framing of device intelligence as risk assessment beyond identity, and where bot detection fits within it.