Detecting automation via timing: how event latency reveals a bot

A human who fills a login form leaves a mess in the time domain. The first keystroke lands somewhere between a third of a second and several seconds after the field gets focus, because the eye has to find the field and the hand has to follow. The gaps between keystrokes wander. A double letter goes faster than a reach across the keyboard. The mouse arrives at the submit button on a curve, slows down near the target, overshoots a little, corrects. None of this is decoration. It is the unavoidable signature of a nervous system driving a pointing device through a perception loop, and it is very hard to fake because faking it means simulating the loop, not just the output.

A bot, by contrast, tends to keep good time. Events arrive when the code says to fire them, not when a retina and a motor cortex agree to act. That is the opening a whole class of detectors walks through. This post is about timing as a detection signal: the latency between an input event and the action it triggers, the cadence of continuous events like mousemove against the browser’s own animation clock, the gap between a page finishing paint and the first interaction, and the subtler tell that synthetic interaction has an unnaturally clean clock even when every other field has been spoofed correctly.

We will start with the cleanest binary signal the platform hands detectors for free, isTrusted, and why it is not the whole story. Then the timing distributions that behavioral systems actually model: keystroke dwell and flight, reaction latency after paint, the human floor set by visual perception. Then the part most evasion misses, the cadence of continuous events against requestAnimationFrame and the coalescing pipeline. Then the clock itself, how reduced timer resolution both helps and hurts a detector. We close on why timing is the signal that ages well, and why the clean clock is the last thing to fix.

isTrusted: the free signal, and its limits

Every DOM event carries a read-only boolean called isTrusted. The platform sets it to true when the event was generated by the user agent in response to real user action (and a few programmatic methods like HTMLElement.focus()), and false when the event was dispatched from script through dispatchEvent(). The MDN reference is blunt about one corner case worth memorizing: an event fired through HTMLElement.click() sets isTrusted to false, despite click() being a “real” method call. Synthetic events built with new Event() or new InputEvent() and pushed through target.dispatchEvent() are false by construction.

For a detector this is the cheapest signal in the building. A single listener on mousedown, keydown, or submit that reads event.isTrusted separates the laziest automation from everything else with one branch and no math. Tools that drive a page by injecting JavaScript and calling dispatchEvent light up here immediately. The browser-use project hit exactly this: a watchdog that filled inputs by constructing new InputEvent('input', {...}) and dispatching it left isTrusted: false on every synthetic event, which the maintainers flagged as a signal that “leaks automation.” The fix is not to spoof the flag (a page cannot, from its own JavaScript, mint a trusted event) but to stop dispatching synthetic events at all and drive input through a lower layer.
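A minimal sketch of that one-branch check follows. The classification is a plain function and the DOM wiring is guarded, so the file also loads outside a page. The names trustVerdict and watchTrust are hypothetical, chosen for illustration; only isTrusted and addEventListener come from the platform.

```javascript
// Classify an event (or event-like object) by its isTrusted flag.
// trustVerdict and watchTrust are illustrative names, not a real library API.
function trustVerdict(event) {
  // isTrusted is read-only on real DOM events; script-dispatched events carry false.
  return event.isTrusted === true ? "trusted" : "synthetic";
}

// Attach one listener per event type and report the verdict for each event.
function watchTrust(target, report) {
  for (const type of ["mousedown", "keydown", "submit"]) {
    // Capture phase, so the check runs before handlers lower in the tree can stop propagation.
    target.addEventListener(type, (e) => report(type, trustVerdict(e)), true);
  }
}

// Browser-only wiring, skipped when there is no document (for example under Node).
if (typeof document !== "undefined") {
  watchTrust(document, (type, verdict) => {
    if (verdict === "synthetic") console.warn(`untrusted ${type} event`);
  });
}
```

A real detector would send the verdict to a scoring backend rather than log it; the console call is only a placeholder.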

That lower layer is where isTrusted stops being decisive. When a Chrome DevTools Protocol client calls Input.dispatchMouseEvent or Input.dispatchKeyEvent, the event enters Chromium’s input pipeline as if it came from the OS. It is isTrusted: true. The same is true of OS-level injection (synthesizing input at the operating-system layer below the browser) and of the chrome.debugger API. So the modern automation stack, Playwright and Puppeteer and the CDP-native tools, produces trusted events. isTrusted filters out the script-kiddie tier and nothing above it. Which is precisely why a detector that stops at isTrusted is a detector that loses, and why the interesting work moved into the time domain. For the adjacent question of how detectors notice the CDP channel itself rather than its output, see the Chrome DevTools Protocol as a detection vector.

The other thing a synthetic event drops, even when isTrusted survives, is the incidental metadata a real input path fills in. Transmit Security’s writeup on input-method analysis points out that a genuine paste produces an InputEvent with inputType: "insertFromPaste", while a script-dispatched input event arrives with no inputType at all, and a password manager leaves its own fingerprints in the element’s dataset. These are not timing signals, but they sit next to the timing signals in the same event object, and a detector collecting one is collecting the other.

What a trusted event still cannot fake: the timestamp distribution

Once isTrusted is true, the detector’s attention moves to event.timeStamp and the intervals between events. event.timeStamp is a DOMHighResTimeStamp: milliseconds elapsed since the document’s time origin, the same clock performance.now() reads. Subtract consecutive timestamps and you get the inter-event intervals, and those intervals are where human and machine diverge no matter how trusted each individual event is.

Keystroke timing is the textbook case. Two intervals matter. Dwell time is how long a key is held, the gap from its keydown to its keyup. Flight time is the gap from one key’s release to the next key’s press. Real typing produces wide, structured distributions in both: dwell varies with finger and key, flight collapses for letters typed by alternating hands and stretches for awkward same-finger reaches, and the second of a doubled letter almost always comes faster than the first. Tschacher’s behavioral-analysis writeup makes the same observation from the detector’s chair, that the keydown-to-keyup interval “depends heavily on the writing skills of the human” and that repeated letters speed up. A naive bot that types by firing keydown/keyup pairs on a fixed delay produces a flat line where a human produces a cloud.

*Identical inter-event gaps are the signature a fixed-delay typing loop leaves in the timestamp stream. The detector does not need to see the keys, only the deltas.*
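To make the two intervals concrete, here is a small sketch that derives dwell and flight from a recorded key log. The log shape (plain objects with type, code, and timeStamp) and the function name dwellAndFlight are assumptions for illustration; in a page the entries would be copied from keydown and keyup events.

```javascript
// Derive dwell (keydown -> keyup of the same key) and flight
// (keyup of one key -> keydown of the next) from a key event log.
// Assumed entry shape: { type: "keydown" | "keyup", code: string, timeStamp: number }.
function dwellAndFlight(log) {
  const pending = new Map(); // code -> stroke still waiting for its keyup
  const strokes = [];        // strokes in keydown order

  for (const e of log) {
    if (e.type === "keydown" && !pending.has(e.code)) {
      const stroke = { down: e.timeStamp, up: null };
      pending.set(e.code, stroke);
      strokes.push(stroke);
    } else if (e.type === "keyup" && pending.has(e.code)) {
      pending.get(e.code).up = e.timeStamp;
      pending.delete(e.code);
    }
    // A keydown for a key that is already held is auto-repeat and is ignored.
  }

  const complete = strokes.filter((s) => s.up !== null);
  const dwell = complete.map((s) => s.up - s.down);
  const flight = [];
  for (let i = 1; i < complete.length; i++) {
    // Negative values are legitimate: fast typists press the next key
    // before releasing the previous one.
    flight.push(complete[i].down - complete[i - 1].up);
  }
  return { dwell, flight };
}

// A fixed-delay loop: every dwell and every flight comes out identical.
const botLog = [0, 100, 200].flatMap((t) => [
  { type: "keydown", code: "KeyA", timeStamp: t },
  { type: "keyup", code: "KeyA", timeStamp: t + 10 },
]);
console.log(dwellAndFlight(botLog)); // { dwell: [ 10, 10, 10 ], flight: [ 90, 90 ] }
```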

Transmit Security puts approximate numbers on it: a bare bot types at roughly 8ms between keystrokes, far below any human floor, and even a deliberately throttled “human-like” bot that targets, say, 300ms between keys gives itself away by being too regular, lacking the irregularity and the corrections a real typist produces. The first failure is being too fast. The second, subtler failure is being too even. Adding a fixed delay fixes the first and makes the second worse, because now the variance is artificially near zero. This is the recurring shape of the whole topic: the obvious fix moves the tell rather than removing it.
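The two failures map onto two statistics: the mean of the inter-key intervals (too fast) and their spread relative to the mean (too even). The sketch below uses the coefficient of variation for the second. The thresholds minMeanMs and minCv and the minimum sample count are hypothetical placeholders, not figures from any vendor; a production system would fit them to real traffic.

```javascript
// Mean, standard deviation, and coefficient of variation of a list of intervals (ms).
function summarize(intervals) {
  const n = intervals.length;
  const mean = intervals.reduce((a, b) => a + b, 0) / n;
  const variance = intervals.reduce((a, b) => a + (b - mean) ** 2, 0) / n;
  const sd = Math.sqrt(variance);
  return { mean, sd, cv: mean === 0 ? 0 : sd / mean };
}

// Flag a keystroke cadence as too fast and/or too even.
// minMeanMs, minCv, and minSamples are illustrative assumptions.
function cadenceFlags(intervals, { minMeanMs = 40, minCv = 0.1, minSamples = 5 } = {}) {
  if (intervals.length < minSamples) {
    // Too little data to say anything: the early-session blind spot.
    return { enoughData: false, tooFast: false, tooEven: false };
  }
  const { mean, cv } = summarize(intervals);
  return { enoughData: true, tooFast: mean < minMeanMs, tooEven: cv < minCv };
}

console.log(cadenceFlags([8, 8, 8, 8, 8, 8]));             // fast and even
console.log(cadenceFlags([300, 300, 300, 300, 300, 300])); // throttled: only even
console.log(cadenceFlags([180, 95, 240, 130, 310, 160]));  // neither flag set
```

Throttling the bot from 8ms to 300ms clears tooFast and leaves tooEven set, which is the "fix moves the tell" pattern in miniature.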

Reaction latency after a state change is the other timestamp signal, and it has a hard physical floor. A human cannot click a button that appeared 5 milliseconds ago, because seeing the button and moving to it takes time the body does not have. The perception literature puts simple visual reaction time around 250ms on average, with elite outliers reaching into the low 200s, and saccadic eye-movement latencies (just moving the eyes to a target, not acting on it) clustering around 200 to 240ms in overlap tasks and bottoming out near 140ms in the easiest gap conditions. Those are floors for the perception step alone, before the hand moves. So an interaction that fires within a few tens of milliseconds of the element becoming clickable is not a fast human. It is code that did not wait, because it had no perception loop to wait for. Detectors encode this as a minimum plausible latency between a paint or DOM mutation and the first input that targets the new element.
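A sketch of that minimum-latency rule, under stated assumptions: shownAtMs is whatever timestamp the detector recorded when the element became clickable (for instance performance.now() read in a MutationObserver callback or the frame after insertion), and firstInputAtMs is the event.timeStamp of the first input targeting it. The 100ms default floor is a deliberately conservative placeholder below the perception figures quoted above, and the function name is hypothetical.

```javascript
// Compare the gap between "element became clickable" and "first input on it"
// against a minimum plausible human latency.
// floorMs = 100 is an illustrative assumption, not a published threshold.
function reactionFlag(shownAtMs, firstInputAtMs, floorMs = 100) {
  const latency = firstInputAtMs - shownAtMs;
  return {
    latency,
    // A negative latency means the input predates the element, so the rule does not apply.
    implausible: latency >= 0 && latency < floorMs,
  };
}

console.log(reactionFlag(1000, 1005)); // { latency: 5, implausible: true }
console.log(reactionFlag(1000, 1320)); // { latency: 320, implausible: false }
```

On its own this is a weak signal. A pointer that was already resting where the button appeared can produce a short latency legitimately, which is one more reason to treat it as an input to a score.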

The flip side, noted in the same behavioral writeups, is that the first few seconds after page load are the detector’s blind spot. There simply is not enough interaction yet to build a distribution, so a bot that does its damage in the first zero-to-five seconds gives the classifier little to chew on. Which is part of why so much detection is paired with a challenge that forces the session to live longer, and why timing signals are usually one input to a score rather than a standalone verdict. The vendors who run this at scale, DataDome among them, describe the behavioral layer (mouse movement and jitter, scroll velocity, click timing, keystroke cadence, hover durations) as the layer that catches the bots that already passed the fingerprint and network checks.

The cadence the platform imposes: requestAnimationFrame alignment

Here is the part most synthetic-input code gets wrong, because it is not about any single event but about the rhythm of a stream of them, and that rhythm is dictated by the browser, not the application.

Since Chrome 60, continuous input events (mousemove, pointermove, touchmove, wheel, and mousewheel) are frame-aligned. The browser does not dispatch each one the instant the OS reports it. It holds them and flushes them to the page right before the requestAnimationFrame callback fires, once per frame. On a 60Hz display that is roughly one batch every 16.7ms. The motivation was performance: pointing devices sample faster than the screen refreshes (Chrome’s own writeup notes mice around 100Hz, touch panels at 60 to 120Hz, against a 60Hz monitor), so dispatching every raw sample meant redundant hit-testing and layout work the frame would never show. Aligning to rAF cut hit tests by about a third in their measurements. Discrete events, the keydown, mousedown, mouseup, touchstart and friends, are exempt; they dispatch immediately so ordering is preserved.

The detection consequence is precise and easy to check. On a real browser driven by a real hand, the timeStamp deltas between consecutive mousemove events cluster around the frame interval, near 16.7ms on a 60Hz panel, because that is the cadence the platform enforces on the dispatch side regardless of how fast the mouse is sampled underneath. A detector that records mousemove timestamps and histograms the gaps expects a spike at the frame period. Synthetic input that pushes mousemove events on a different schedule, faster, slower, or on a perfectly fixed delay that does not match any display refresh, produces the wrong histogram. A stream of mousemove events spaced exactly 10ms apart, or all sharing a single timestamp, did not come through the rAF-aligned pipeline that a genuine cursor traverses.

*Genuine cursor movement reaches page script in frame-aligned batches. Synthetic mousemove that ignores the frame clock prints the wrong inter-event histogram.*
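One way to turn the histogram idea into a number is to measure what fraction of consecutive mousemove gaps sit near a whole multiple of the frame period. The sketch below assumes a 60Hz display by default, a 3ms tolerance, and a cutoff that treats gaps longer than about three frames as pauses. All three are illustrative choices, and frameAlignment is a hypothetical name. A real detector would first estimate the actual refresh interval, for example from requestAnimationFrame callbacks.

```javascript
// Fraction of mousemove gaps that land near 1x, 2x, or 3x the frame period.
// framePeriodMs and toleranceMs are assumptions; real displays run at many refresh rates.
function frameAlignment(timestamps, framePeriodMs = 1000 / 60, toleranceMs = 3) {
  let considered = 0;
  let aligned = 0;
  for (let i = 1; i < timestamps.length; i++) {
    const delta = timestamps[i] - timestamps[i - 1];
    const frames = Math.round(delta / framePeriodMs);
    if (frames > 3) continue; // a pause between strokes, not part of a continuous movement
    considered++;
    // frames === 0 means events arrived faster than the frame clock allows,
    // so they are counted but can never be aligned.
    if (frames >= 1 && Math.abs(delta - frames * framePeriodMs) <= toleranceMs) aligned++;
  }
  return { considered, alignedFraction: considered ? aligned / considered : 0 };
}

// Frame-paced stream with one dropped frame at the end.
console.log(frameAlignment([0, 16.6, 33.4, 50.1, 66.7, 100.0]).alignedFraction); // 1
// Fixed 10ms timer.
console.log(frameAlignment([0, 10, 20, 30, 40]).alignedFraction); // 0
```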

The platform even hands the page a way to inspect the cadence directly. PointerEvent.getCoalescedEvents() returns the sequence of raw pointermove samples that the browser merged into the single frame-aligned event the listener received. Each coalesced sample carries its own timeStamp, so a detector can ask: how many raw samples sit inside this frame, and what is their sub-frame spacing? A real high-polling-rate mouse during a fast drag yields several coalesced samples per frame with realistic sub-frame timing. Synthetic input that fires one flat pointermove per dispatch yields an empty or trivial coalesced list, which is itself anomalous for a movement that covered real distance. The drawing-app use case the API was built for (recovering the full trajectory for smooth curves) doubles as a timing oracle for the detector.
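A rough sketch of how a listener might read that buffer. getCoalescedEvents() and timeStamp are the real API; coalescedProfile, sparseForDistance, and the 80-pixel distance threshold are hypothetical. The functions accept any event-like object, so they can be exercised without a browser.

```javascript
// Summarize the raw samples merged into one pointermove event.
function coalescedProfile(event) {
  const samples =
    typeof event.getCoalescedEvents === "function" ? event.getCoalescedEvents() : [];
  const gaps = [];
  for (let i = 1; i < samples.length; i++) {
    gaps.push(samples[i].timeStamp - samples[i - 1].timeStamp); // sub-frame spacing in ms
  }
  return { sampleCount: samples.length, gaps };
}

// True when the pointer covered real distance since the previous event but the
// coalesced list is empty or holds a single sample.
// minDistancePx = 80 is an illustrative assumption.
function sparseForDistance(event, previous, minDistancePx = 80) {
  const distance = Math.hypot(
    event.clientX - previous.clientX,
    event.clientY - previous.clientY
  );
  return distance >= minDistancePx && coalescedProfile(event).sampleCount <= 1;
}

// In a page this would be wired roughly as:
//   let prev = null;
//   el.addEventListener("pointermove", (e) => {
//     if (prev && sparseForDistance(e, prev)) { /* record the anomaly */ }
//     prev = e;
//   });
```

A low-polling-rate device on a high-refresh display can legitimately produce one sample per frame, so a sparse list is better treated as a contributing signal than as proof.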

Scroll is the same story in a different event. wheel is on the frame-aligned list, so a real scroll arrives as a sequence of wheel events paced to the animation clock, each with a deltaY that ramps up and tails off as the wrist accelerates and releases. Programmatic scrolling through window.scrollTo or Element.scrollIntoView moves the viewport without emitting any wheel event at all, so a session that jumps from the top of a long article to a form near the bottom with zero intervening wheel events, and zero mousemove along the way, has skipped the entire physical act of getting there. Detectors that watch for content becoming visible without the scroll telemetry that should precede it read that gap directly. A bot that does emit synthetic wheel events to paper over it then faces the cadence test again: are they frame-aligned, do the deltas ramp like a real flick, or are they uniform steps on a fixed timer?
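The missing-telemetry check can be sketched as a pass over a session log: flag any large change in scroll position that no recent scroll-capable input explains. The log shape, the set of input types counted as explanations, and both thresholds are assumptions for illustration. The scrollY field is not part of the scroll event itself; a logger would have to copy window.scrollY into the log entry when the event fires.

```javascript
// Input types that can legitimately move the viewport (wheel, touch, keyboard,
// scrollbar drag). This list is an illustrative assumption.
const SCROLL_INPUTS = new Set(["wheel", "touchmove", "keydown", "mousedown", "pointerdown"]);

// Return scroll jumps of at least minJumpPx that had no scroll-capable input
// within windowMs before them.
// Assumed entry shape: { type, timeStamp, scrollY? }.
function unexplainedScrolls(log, { minJumpPx = 400, windowMs = 500 } = {}) {
  const flagged = [];
  let lastInputAt = -Infinity;
  let lastY = 0;
  for (const e of log) {
    if (SCROLL_INPUTS.has(e.type)) {
      lastInputAt = e.timeStamp;
    } else if (e.type === "scroll") {
      const jump = Math.abs(e.scrollY - lastY);
      if (jump >= minJumpPx && e.timeStamp - lastInputAt > windowMs) {
        flagged.push({ at: e.timeStamp, jump });
      }
      lastY = e.scrollY;
    }
  }
  return flagged;
}

const session = [
  { type: "wheel", timeStamp: 100 },
  { type: "scroll", timeStamp: 116, scrollY: 120 },
  { type: "wheel", timeStamp: 133 },
  { type: "scroll", timeStamp: 150, scrollY: 260 },
  { type: "scroll", timeStamp: 5000, scrollY: 4200 }, // viewport jumps with no input
];
console.log(unexplainedScrolls(session)); // [ { at: 5000, jump: 3940 } ]
```

In-page anchor links and "back to top" buttons also scroll without wheel events, so a hit here needs the surrounding context before it means anything.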

This is the layer where stealth patches that focus on static properties miss entirely. Spoofing navigator.webdriver, fixing the user-agent token, patching the chrome object: none of that touches the dynamics of how events arrive over time. The two stealth philosophies, source-patching Chromium versus injecting at runtime, both have to solve the static surface, but neither automatically produces a frame-correct event stream unless the input is generated to respect the animation clock. Generating input that survives this layer is its own discipline, covered in synthesizing human-like input events and, for the spatial half of the problem, why a real mouse path is hard to fake.

The clock itself: reduced timer resolution cuts both ways

There is a twist that makes the timing game less straightforward than “measure the deltas,” and it comes from a security mitigation unrelated to bots.

After Spectre, browsers deliberately coarsened the high-resolution clock, because precise timers let speculative-execution attacks measure cache behavior and leak memory. Chrome clamped performance.now() and the other DOMHighResTimeStamp sources to 100 microseconds in ordinary contexts starting around version 91 (coarsened from the previous 5-microsecond resolution). Firefox went coarser still, clamping to about 1ms by default, with privacy.reduceTimerPrecision rounding to a 2ms multiple and privacy.resistFingerprinting pushing it all the way to 100ms. event.timeStamp reads the same clock and inherits the same clamping; MDN spells out that it is “accurate to 5 microseconds” in principle but reduced in practice to prevent fingerprinting. The interval a detector computes between two events is therefore quantized to whatever grid the running browser applies, which is the whole reason the clamping matters here at all.

Sites that want the fine-grained 5-microsecond clock back have to opt into cross-origin isolation by sending Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp, which proves the page cannot pull in unwilling cross-origin resources and therefore is not a useful Spectre vector. So the resolution available to a behavioral detector depends on whether the protected page bothered to become cross-origin isolated, and most do not, because the headers break a lot of third-party embeds.
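For reference, the opt-in amounts to two response headers plus a runtime check. The sketch below assumes a Node-style response object with a setHeader method; applyIsolation and timerIsFineGrained are hypothetical helper names. crossOriginIsolated is the platform's own boolean for confirming that isolation took effect.

```javascript
// The two headers named above, as a reusable constant.
const ISOLATION_HEADERS = Object.freeze({
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
});

// Add the isolation headers to a response object exposing setHeader(name, value),
// such as Node's http.ServerResponse.
function applyIsolation(res) {
  for (const [name, value] of Object.entries(ISOLATION_HEADERS)) {
    res.setHeader(name, value);
  }
  return res;
}

// In page script, crossOriginIsolated reports whether the opt-in worked.
// The scope parameter exists only so the check can be exercised outside a browser.
function timerIsFineGrained(scope = globalThis) {
  return scope.crossOriginIsolated === true;
}
```

A detector script that finds timerIsFineGrained() returning false knows it is working with the coarse grid and can widen its tolerances accordingly.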

For a detector this is a double-edged tool. The coarse clock blunts the very measurement that catches a too-even bot: if every event.timeStamp is already rounded to a 100-microsecond or 1ms grid, the artificial near-zero variance of a fixed-delay typing loop is partly masked by the grid, and small human jitter below the grid resolution disappears too. But the same coarseness hands the detector a different, cleaner signal. Real human intervals are large and noisy, hundreds of milliseconds with tens of milliseconds of spread, so the clamp grid barely touches them. Synthetic intervals that were computed to land on suspiciously round numbers (a 50ms delay, a 100ms delay) survive the clamp as suspiciously round numbers. And the clamp behavior is itself a fingerprint: which grid the timestamps fall on reveals whether the page is cross-origin isolated, which browser family is running, and whether anti-fingerprinting preferences are active, all of which a detector cross-checks against the user-agent the client claims. A client reporting Chrome while producing Firefox-style 2ms-grid timestamps has a contradiction to explain.
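The grid cross-check can be sketched as follows, with heavy caveats. The candidate grids and the per-family table are taken from the figures quoted in this post and are not an authoritative map of browser behavior. The code assumes timestamps land exactly on a grid, and every function and constant name is hypothetical. It also needs many samples: a handful of timestamps can fit a coarser grid by coincidence.

```javascript
// Candidate clamp grids in milliseconds, coarsest first (values quoted in the text).
const CANDIDATE_GRIDS_MS = [100, 2, 1, 0.1, 0.005];

// Return the coarsest candidate grid that every timestamp is a multiple of, or null.
// The tolerance absorbs floating-point error in the division.
function inferGridMs(timestamps, tolerance = 1e-3) {
  if (timestamps.length === 0) return null;
  for (const grid of CANDIDATE_GRIDS_MS) {
    const fits = timestamps.every((t) => {
      const steps = t / grid;
      return Math.abs(steps - Math.round(steps)) < tolerance;
    });
    if (fits) return grid;
  }
  return null;
}

// Illustrative expectation table built from the numbers in this post.
const EXPECTED_GRIDS_MS = {
  chrome: [0.1, 0.005],
  firefox: [1, 2, 100],
};

// True when the observed grid is not one the claimed browser family is expected to use.
function gridContradictsClaim(gridMs, claimedFamily) {
  const expected = EXPECTED_GRIDS_MS[claimedFamily];
  if (!expected || gridMs === null) return false; // unknown family or no inference
  return !expected.includes(gridMs);
}

const observed = inferGridMs([1002, 1148, 1310, 1526]);
console.log(observed, gridContradictsClaim(observed, "chrome")); // 2 true
```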

There is a research wrinkle worth knowing because it tells you the clamp is not a hard ceiling. The coarse clock can be partially defeated. A counting thread spinning on a SharedArrayBuffer, or interpolation that watches when the clamped performance.now() value flips, can recover sub-clamp resolution; one analysis recovered tens of microseconds on modern hardware despite the official 100-microsecond floor. That matters less for an attacker building a faster bot and more for a detector that wants finer behavioral measurement than the platform nominally offers, though SharedArrayBuffer itself needs the same cross-origin isolation, so the path is not free.

Where timing sits in a real detection stack

No serious vendor decides bot-or-human on a single inter-event histogram. Timing is one signal in a score, weighted alongside the network and fingerprint layers, and it earns its weight by being expensive to fake correctly rather than by being decisive alone. The behavioral layer is explicitly described, by the vendors who sell it, as the catch for sophisticated bots that already cleared the cheaper checks. A scraper can present a perfect TLS fingerprint, a clean residential IP, a plausible canvas hash, and still produce a mousemove stream that does not respect the frame clock or a keystroke cadence with no human variance. That is the residue timing detection is built to read.

A few proof-of-work systems weaponize timing from the other direction. Rather than measuring how human your input looks, they measure how long your environment takes to run a calibrated computation, and flag a token that comes back faster than a real browser executing the challenge could manage. Kasada’s client VM is described this way: the work is tuned to human-browser execution timing, and solving it too fast (the tell of a native reimplementation rather than a real browser) flags the result. The detail is in Kasada’s anti-instrumentation, but the principle is the other side of the same coin. Either the clock catches you moving too fast, or it catches you computing too fast.

*Timing is one weighted input, not a verdict. Its job is to be the signal a scraper has not fixed yet after fixing the cheaper ones.*
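As a toy illustration of "one weighted input", the sketch below combines per-layer suspicion values into a single score by weighted average. The layer names, the weights, and the function name are invented for the example. Real vendors do not publish their scoring, and production systems are more likely to use trained models than a fixed linear blend.

```javascript
// Weighted average of per-layer suspicion values, each clamped to [0, 1].
// Layers missing from `signals` are skipped rather than counted as clean.
function combineSignals(signals, weights) {
  let total = 0;
  let weightSum = 0;
  for (const [name, weight] of Object.entries(weights)) {
    const value = signals[name];
    if (typeof value !== "number") continue; // signal not collected this session
    total += weight * Math.min(1, Math.max(0, value));
    weightSum += weight;
  }
  return weightSum === 0 ? 0 : total / weightSum;
}

// Hypothetical weights: timing counts, but cannot decide alone.
const WEIGHTS = { network: 3, fingerprint: 3, timing: 4 };

// A scraper with clean network and fingerprint layers but a bad clock.
console.log(combineSignals({ network: 0, fingerprint: 0, timing: 1 }, WEIGHTS)); // 0.4
```

With these made-up weights a perfect score on the cheaper layers still leaves the session at 0.4, which is the sense in which timing reads the residue the other checks miss.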

The clean clock is the last thing to fix

The reason timing detection ages so well is that it does not depend on a property the platform might rename or a header order a library might shuffle. It depends on a structural fact about how the two kinds of agents produce events. A human runs a perception-action loop with hard biological floors and irreducible noise; a program runs a scheduler. You can spoof a static field with a one-line patch and you can match a fingerprint by harvesting a real one, but matching the time-domain signature means reproducing the loop: generating input that respects the frame clock, carries human-shaped variance in dwell and flight, waits a plausible perception latency after each state change, and fills the coalesced-event buffer with realistic sub-frame samples. That is not a patch. That is a simulation, and a simulation that is wrong in any one of those dimensions leaves a residue in the deltas.

This is why the order of operations in evasion is so consistent, and so telling. The static surface gets fixed first because it is cheapest. The fingerprint gets matched next. The network layer gets a residential exit. The clock gets fixed last, if at all, because it is the only one that cannot be copied from a real session. It has to be manufactured, and manufacturing it correctly is most of the work of pretending to be a person. The bot that has fixed everything else and still keeps perfect time is the most common shape a sophisticated scraper takes, which is exactly the shape the timing layer was built to read. A detector that has run out of static tells still has the clock, and the clock is the hardest thing in the building to lie to.

Sources & further reading

MDN Web Docs, Event: isTrusted property — defines isTrusted, the dispatchEvent=false rule, and the HTMLElement.click() exception.
MDN Web Docs, Event: timeStamp property — event timestamps are DOMHighResTimeStamps subject to the same reduced-precision clamping as performance.now().
MDN Web Docs, PointerEvent: getCoalescedEvents() method — recovering the raw sub-frame pointer samples merged into one frame-aligned event.
Chrome for Developers (2017), Aligned input events — Chrome 60 frame-aligns continuous input events to fire right before the requestAnimationFrame callback.
Chrome for Developers (2021), Aligning timers with cross-origin isolation restrictions — 100µs default clamp versus 5µs under cross-origin isolation, with the COOP/COEP requirement.
Nikolai Tschacher / incolumitas.com (2021), Bot Detection with Behavioral Analysis — reaction-latency and keystroke-interval signals, and the early-session blind spot.
Nikolai Tschacher / incolumitas.com (2021), On High-Precision JavaScript Timers — measured timer resolutions per browser and the SharedArrayBuffer/interpolation recovery techniques.
Morel Madmon / Transmit Security (2023), Bot Detection Based on Input Method Analysis — isTrusted, the ~8ms bot-typing tell, inputType metadata, and the too-even “human-like” bot.
DataDome, Why client-side signals are a must-have for detecting sophisticated attacks — the behavioral layer (mouse jitter, click timing, keystroke cadence) as the catch for bots that pass fingerprint and network checks.
browser-use, Synthetic events leak automation with isTrusted (issue #3829) — a real automation tool tripping the isTrusted=false signal by dispatching new InputEvent().
Tobii, How to measure the speed of human visual perception — the perception-step latencies that set the human floor for post-paint reaction time.

Further reading

Kasada's anti-instrumentation: how it detects CDP, Playwright, and patched runtimes

Headless Chrome detection: every tell from navigator.webdriver to missing codecs

Detecting CDP in the wild: the Runtime.enable leak and the V8 patch war