
Why a real mouse path is hard to fake: trajectory, jitter, and Fitts's law

Copyright: MIT

Move your cursor to a button and click it. You just produced a few hundred milliseconds of motion data that almost no automation reproduces correctly. The path was not straight. It curved, slightly, in a direction your arm prefers. It went too far near the end and corrected back. The speed rose to a peak somewhere around the first third of the distance and then fell off, and the faster you tried to go, the more the cursor missed and the more you had to fix it. None of that was a decision. It is what a limb attached to a nervous system does when it aims at a target, and it has been measured, modelled, and formalized for seventy years.

That body of motor-control research is now load-bearing infrastructure for bot detection. A behavioral biometrics engine does not need to read your fingerprint or your cookie to suspect you are a script. It watches the cursor. A human hand obeys a small set of laws that are hard to violate and, as it turns out, equally hard to fake convincingly, because the obvious ways to fake them (a straight line, a Bezier curve, a jittered Bezier curve) each break a different one of those laws. This post is about which laws, why they hold, and what a synthetic path gets wrong.

The sections below build up the model in the order a detector reasons about it. First Fitts’s law, the relationship between how far and how precise a movement is and how long it takes. Then the shape of the motion underneath that timing: the bell-shaped velocity profile, the minimum-jerk trajectory, and the two-thirds power law that ties speed to curvature. Then the small stuff, the sub-pixel jitter and the overshoot-and-correct that a steady hand produces without trying. Then what detectors actually extract from a raw event stream, and finally why the standard ways of synthesizing a path (straight lines, Beziers, jittered Beziers) leave signatures, and what closing that gap actually requires.

Fitts’s law: the speed-accuracy contract

In June 1954, in the Journal of Experimental Psychology, Paul Fitts published a result that has survived every attempt to overturn it. The time it takes to move to a target depends on two things, the distance to the target and how wide it is, and it depends on them through a logarithm. Move twice as far, you do not take twice as long. Move to a target half as wide, you pay a fixed increment of time, not a proportional one. Fitts borrowed the form directly from Shannon’s information theory: distance is the signal, target width is the noise, and the difficulty of the movement is measured in bits.

His original index of difficulty was ID = log2(2D / W), with D the distance and W the target width. Movement time then follows a straight line in that quantity, MT = a + b * ID, where a and b are fit per person and per device. The variant most detection literature and HCI testing actually uses is Scott MacKenzie’s Shannon formulation, ID = log2(D / W + 1), which behaves better at small distances and never goes negative. Throughput, the rate of information the pointing channel carries, is TP = ID / MT in bits per second. For a normal person on a normal mouse that number lands somewhere around four to nine bits per second, and it is remarkably stable for a given individual.

Figure: movement time against index of difficulty. Horizontal axis: ID = log2(D / W + 1), in bits. Vertical axis: movement time. Fitted line: MT = a + b·ID, with slope b = 1/throughput. *Fitts's law is linear in the log-scaled difficulty, not in raw distance. A detector can fit a person's a and b from a handful of movements and then flag motion that does not lie on the line.*

Why does this matter to a detector? Because it is a constraint that real motion obeys and synthetic motion usually ignores. A script that moves the cursor at a fixed velocity, or interpolates over a fixed number of steps regardless of distance, produces movement times that do not track the index of difficulty. Cross a thousand pixels to a tiny target in the same time you cross two hundred pixels to a big one and you have announced that no arm was involved. Worse for the faker, the relationship is not just present on average; it is tight per individual. Two people have different a and b. A detector that has seen a few of your movements can fit your line and then notice when a session’s movements stop landing on it. The information-theoretic reading Fitts gave it back in 1954 is exactly why it is useful here: the pointing act has a channel capacity, and capacities are hard to fake because they emerge from the hardware, not from intention.
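To make the fitting step concrete, here is a minimal sketch of how a detector might estimate a and b from a few observed movements and then measure how far a new movement falls from the line. The function names, the tuple layout, and the sample numbers are assumptions made for illustration; real engines will differ in how they segment movements and estimate target width.

```python
import math

def index_of_difficulty(distance, width):
    """Shannon formulation: ID = log2(D / W + 1), in bits."""
    return math.log2(distance / width + 1)

def fit_fitts(movements):
    """Ordinary least-squares fit of MT = a + b * ID.

    movements: list of (distance_px, width_px, movement_time_s) tuples
    (a hypothetical layout chosen for this sketch). Returns (a, b).
    """
    ids = [index_of_difficulty(d, w) for d, w, _ in movements]
    mts = [mt for _, _, mt in movements]
    n = len(ids)
    mean_id = sum(ids) / n
    mean_mt = sum(mts) / n
    sxx = sum((i - mean_id) ** 2 for i in ids)
    sxy = sum((i - mean_id) * (m - mean_mt) for i, m in zip(ids, mts))
    b = sxy / sxx
    a = mean_mt - b * mean_id
    return a, b

def fitts_residual(a, b, distance, width, movement_time):
    """Observed minus predicted movement time, in seconds."""
    return movement_time - (a + b * index_of_difficulty(distance, width))
```

A movement that crosses 1,000 pixels to a 10-pixel target in 0.2 seconds produces a large negative residual against almost any plausible human fit, which is the "no arm was involved" case described above.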

The speed-accuracy tradeoff has a corollary that synthetic paths almost never get right. When you aim fast at a small target, you miss more often, and the misses are not random. They cluster as overshoots that you then correct. That correction is a second, smaller movement, sometimes a third, each one its own little Fitts’s-law sub-movement aimed at an even smaller residual distance. The whole approach to a target is a sequence, not a single glide. We come back to this under overshoot, because it is one of the most reliable human tells in the entire stack.
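On the detection side, overshoot and sub-movements can be measured with very little machinery. The sketch below is one simple way to do it, not a description of any particular product: it projects the path onto the start-to-target axis to see how far the cursor went past the target, and counts local speed maxima as a crude sub-movement count. The function names and the `min_peak` threshold are assumptions for illustration.

```python
import math

def overshoot_px(points, target):
    """Largest distance travelled past the target along the start-to-target axis.

    points: list of (x, y) samples; target: (x, y) of the aimed-at point.
    """
    sx, sy = points[0]
    ax, ay = target[0] - sx, target[1] - sy
    dist = math.hypot(ax, ay)
    if dist == 0:
        return 0.0
    ux, uy = ax / dist, ay / dist  # unit vector pointing at the target
    furthest = max((x - sx) * ux + (y - sy) * uy for x, y in points)
    return max(0.0, furthest - dist)

def count_speed_peaks(speeds, min_peak):
    """Count local maxima at or above min_peak: a crude sub-movement count."""
    return sum(
        1
        for i in range(1, len(speeds) - 1)
        if speeds[i] >= min_peak
        and speeds[i] > speeds[i - 1]
        and speeds[i] >= speeds[i + 1]
    )
```

A single glide shows one speed peak and zero overshoot. A fast human approach to a small target tends to show a primary peak, a few pixels of overshoot, and one or two much smaller peaks after it.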

The shape underneath the timing

Fitts’s law governs how long. It says nothing about the shape of the motion in between. For that you need the kinematics, and the kinematics are where the synthetic paths really fall apart.

Start with the velocity profile. A point-to-point reaching movement does not run at constant speed and it does not snap between speeds. It accelerates smoothly to a single peak and decelerates smoothly to a stop, and the speed-versus-time curve is a bell. Flash and Hogan formalized why in 1985, in the Journal of Neuroscience: of all the trajectories that start and end at the right places, the one the arm actually picks is the one that minimizes jerk, the third time-derivative of position, integrated over the movement. Minimize jerk and you get a straight-ish path and a symmetric bell-shaped speed profile, which is what experiments show. The nervous system optimizes for smoothness, and smoothness has a precise mathematical signature.

Figure: speed over the course of one movement. Horizontal axis: time. Vertical axis: speed. Curves: human (minimum-jerk bell) and synthetic (flat top, hard edges). *The minimum-jerk model predicts a single smooth peak. Constant-velocity interpolation, the default in naive automation, produces a rectangular profile with discontinuous acceleration at the ends, which is physically impossible for a mass on the end of an arm.*
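For a straight point-to-point movement that starts and ends at rest, the minimum-jerk solution has a well-known closed form: a fifth-order polynomial in normalized time. The sketch below writes it out so the bell shape can be checked numerically. The function names are illustrative, and the one-dimensional treatment (position along the path only) is a simplification.

```python
def min_jerk_position(tau):
    """Fraction of the distance covered at normalized time tau in [0, 1]."""
    return 10 * tau**3 - 15 * tau**4 + 6 * tau**5

def min_jerk_speed(tau, distance, duration):
    """Speed at normalized time tau, in distance units per second.

    This is the time derivative of distance * min_jerk_position(t / duration).
    """
    return (distance / duration) * (30 * tau**2 - 60 * tau**3 + 30 * tau**4)
```

The speed is zero at both ends, symmetric about the midpoint, and peaks at 1.875 times the average speed. A constant-velocity interpolation has a peak-to-average ratio of exactly 1, which is one quick way to tell the two profiles apart.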

The bell shape is not the only kinematic law. There is also the two-thirds power law, reported by Lacquaniti, Terzuolo, and Viviani in 1983 and replicated endlessly since. When a hand traces a curve, its speed and the local curvature are not independent. Angular speed is proportional to curvature^(2/3), which is the same as saying that tangential speed, the speed of the cursor along the path, is proportional to curvature^(-1/3): it falls where the path bends sharply and rises where it straightens. You slow down in the corners. Everybody does, without thinking about it, and the exponent is close to constant across people. A synthetic path drawn as a geometric curve and then sampled at a uniform rate violates this immediately, because a uniform sampling of a Bezier curve runs at roughly constant arc-speed through corners and straights alike. Real hands cannot do that. The corner forces a slowdown that the geometry of a naive interpolation does not produce.
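One way to test a recorded path against the power law is to regress log speed on log curvature and look at the slope. In terms of tangential speed the law predicts a slope near -1/3, the same relation as angular speed proportional to curvature^(2/3). The sketch below assumes the path has already been resampled to a uniform time step `dt`; the function name and the skip thresholds are illustrative choices, and real data would need smoothing before differentiating.

```python
import math

def speed_curvature_exponent(xs, ys, dt):
    """Slope of log(tangential speed) against log(curvature).

    Uses central differences on uniformly sampled coordinates.
    A value near -1/3 is what the two-thirds power law predicts.
    """
    log_k, log_v = [], []
    for i in range(1, len(xs) - 1):
        vx = (xs[i + 1] - xs[i - 1]) / (2 * dt)
        vy = (ys[i + 1] - ys[i - 1]) / (2 * dt)
        ax = (xs[i + 1] - 2 * xs[i] + xs[i - 1]) / dt**2
        ay = (ys[i + 1] - 2 * ys[i] + ys[i - 1]) / dt**2
        speed = math.hypot(vx, vy)
        cross = abs(vx * ay - vy * ax)
        if speed < 1e-9 or cross < 1e-9:
            continue  # stationary or locally straight: curvature is undefined or zero
        log_k.append(math.log(cross / speed**3))  # curvature = |v x a| / |v|^3
        log_v.append(math.log(speed))
    n = len(log_k)
    mean_k = sum(log_k) / n
    mean_v = sum(log_v) / n
    sxx = sum((k - mean_k) ** 2 for k in log_k)
    sxy = sum((k - mean_k) * (v - mean_v) for k, v in zip(log_k, log_v))
    return sxy / sxx
```

An ellipse traced with sinusoidal x and y components obeys the law exactly, so it makes a convenient sanity check. The same ellipse traversed at constant arc-speed would give a slope near zero.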

There is a deeper model behind all of this. Réjean Plamondon’s kinematic theory of rapid human movements treats a movement as the summed output of two opposed neuromuscular systems, agonist and antagonist, each contributing a lognormal impulse response. Add the lognormals up, the Sigma-Lognormal model, and you reconstruct the asymmetric bell-shaped velocity profile of real motion, including the slight skew that pure minimum-jerk misses. The theory was built to synthesize handwriting and signatures, and it works well enough that it is the basis of the most convincing synthetic-mouse research. The point for a detector is that human velocity profiles live in a narrow, parameterizable family, and anything outside that family, anything too symmetric or too flat or too jagged, is suspicious.
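The building block of that family is a single lognormal speed pulse. The sketch below writes one out and sums several of them. It is a deliberate simplification: the full Sigma-Lognormal model adds velocity vectors with their own directions, while this version adds scalar speeds only. The parameter names (`distance`, `t0`, `mu`, `sigma`) follow common usage for the lognormal, but the function names and any particular values are assumptions for illustration.

```python
import math

def lognormal_speed(t, distance, t0, mu, sigma):
    """Speed contribution of one lognormal stroke at time t (seconds).

    distance: total distance the stroke covers (area under the curve)
    t0: onset time; mu, sigma: log-time delay and log-response time
    """
    if t <= t0:
        return 0.0
    elapsed = t - t0
    z = (math.log(elapsed) - mu) / sigma
    return distance * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi) * elapsed)

def summed_speed(t, strokes):
    """Scalar sum of strokes, each given as (distance, t0, mu, sigma)."""
    return sum(lognormal_speed(t, *stroke) for stroke in strokes)
```

Each pulse integrates to its `distance`, peaks at `t0 + exp(mu - sigma**2)`, and rises faster than it falls, which is the skew a symmetric minimum-jerk bell does not have.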

Jitter, tremor, and the impossibility of holding still

Even when you are not moving the cursor, your hand is moving it. Physiological tremor, the small involuntary oscillation every human limb carries, sits in roughly the 8 to 12 Hz band and never switches off. Rest the cursor on a button before you click and the coordinate stream still wobbles by a pixel or two. Micro-corrections, the sub-movements your visuomotor loop fires to keep the pointer where you want it, add their own low-frequency drift on top. The result is that a genuinely stationary human cursor is never truly stationary; it has a texture.

A synthesized path has the opposite problem at both ends of the speed range. At rest it is perfectly still, because nothing in the code is moving it. In motion it is perfectly smooth, because a parametric curve has no noise. Naive automation produces coordinates that are too clean, and “too clean” is a detectable state. The cursor that teleports from the first form field to the second without any intervening samples, the cursor that holds a pixel-exact position for two seconds, the path with not a single redundant point: each of those is a statement that no hand was involved.

The instinct, once you know this, is to add noise. Sprinkle random jitter onto the synthetic path and the dead-clean signature goes away. It does, but it gets replaced by a new one, because human jitter is not white noise. It is band-limited (the tremor lives in that narrow 8 to 12 Hz window), it is correlated across consecutive samples (your hand has mass and momentum, so position errors persist for a few samples rather than flipping sign every frame), and it scales with what the hand is doing (more during fine targeting, less during ballistic transport). Independent uniform noise added per sample has a flat power spectrum and zero autocorrelation, which is the spectral fingerprint of a random number generator, not a nervous system. A detector running a frequency analysis on the residuals sees a difference that is hard to argue with. The faker has to model the noise, not just add it, and modelling it correctly means reproducing the tremor band, the autocorrelation, and the speed-dependence, which is a much taller order than calling a random function.
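The autocorrelation difference is easy to see numerically. The sketch below computes the lag-1 autocorrelation of a residual series and compares per-sample uniform noise against a narrow-band signal produced by a second-order resonator driven by Gaussian noise. The resonator is only a stand-in for "band-limited, correlated" and is not a physiological tremor model; the sample rate, centre frequency, and pole radius are assumed values chosen for illustration.

```python
import math
import random

def lag1_autocorrelation(samples):
    """Correlation between each sample and the next one."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples)
    cov = sum((samples[i] - mean) * (samples[i + 1] - mean) for i in range(n - 1))
    return cov / var

def resonant_noise(n, sample_rate_hz, centre_hz, pole_radius, rng):
    """AR(2) resonator: Gaussian noise filtered into a narrow band around centre_hz."""
    omega = 2 * math.pi * centre_hz / sample_rate_hz
    c1 = 2 * pole_radius * math.cos(omega)
    c2 = -pole_radius ** 2
    out, prev1, prev2 = [], 0.0, 0.0
    for _ in range(n):
        cur = c1 * prev1 + c2 * prev2 + rng.gauss(0.0, 1.0)
        out.append(cur)
        prev2, prev1 = prev1, cur
    return out

rng = random.Random(0)
white = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
narrow_band = resonant_noise(5000, 125.0, 10.0, 0.95, rng)
```

With these settings the uniform noise has a lag-1 autocorrelation close to zero, while the narrow-band series is strongly positive, because consecutive samples of a 10 Hz oscillation sampled at 125 Hz sit close together on the same swing.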

What a detector actually pulls from the stream

The raw material a behavioral engine works with is humble. The browser hands it a sequence of mousemove, mousedown, and mouseup events, each carrying screen coordinates and a timestamp. From that, the engine reconstructs everything above. First differences of position over time give velocity. Second differences give acceleration. Third gives jerk. The geometry of the path gives curvature and the angles between successive segments. The gaps between events give the sampling cadence and the pauses. None of this is exotic; it is the standard feature set across the mouse-dynamics literature.
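The differencing step can be sketched in a few lines. The version below takes a list of `(x, y, t)` samples with `t` in seconds and returns speed, acceleration, and jerk magnitudes by repeated first differences over the actual timestamps. The event layout and function names are assumptions for illustration, and a production pipeline would usually smooth or resample first, since differentiation amplifies noise.

```python
import math

def finite_difference(values, times):
    """First difference of 2-D samples over possibly non-uniform timestamps.

    Returns (derivatives, midpoint_times). Pairs with a non-positive time
    gap are skipped, since duplicate timestamps would divide by zero.
    """
    derivs, mids = [], []
    for i in range(1, len(values)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            continue
        derivs.append(tuple((b - a) / dt for a, b in zip(values[i - 1], values[i])))
        mids.append((times[i] + times[i - 1]) / 2)
    return derivs, mids

def magnitudes(vectors):
    return [math.hypot(vx, vy) for vx, vy in vectors]

def kinematics(events):
    """events: list of (x, y, t_seconds). Returns magnitudes of each derivative."""
    points = [(x, y) for x, y, _ in events]
    times = [t for _, _, t in events]
    vel, t_vel = finite_difference(points, times)
    acc, t_acc = finite_difference(vel, t_vel)
    jerk, _ = finite_difference(acc, t_acc)
    return {
        "speed": magnitudes(vel),
        "acceleration": magnitudes(acc),
        "jerk": magnitudes(jerk),
    }
```

Fed a linearly interpolated path, this returns a flat speed series and near-zero acceleration and jerk everywhere except the endpoints, which is the rectangular profile a detector is looking for.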

The 2023 survey of mouse-dynamics biometrics out of Clarkson University catalogues the feature families in use, drawn from roughly two decades of prior work: kinematic features (velocity components, acceleration, jerk), geometric features (curvature, angular velocity, the angle of each movement segment), and derived efficiency measures like straightness and deviation from the ideal path, plus the statistical moments (mean, variance, skewness, kurtosis) of each of those over a movement. The survey notes explicitly that Fitts’s law itself can be turned into a feature, with effective target width feeding the user model. On the authentication side, where the task is telling one human from another rather than human from bot, the error rates are strikingly low. The survey reports work hitting equal error rates in the fractions of a percent. If mouse motion carries enough signal to tell two people apart at that resolution, it carries more than enough to tell a person from a script.
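Two of those feature families are simple enough to write out. The sketch below computes straightness as chord length over travelled path length, and the four statistical moments of any per-sample series such as speed. These are common textbook definitions rather than the survey's exact formulas, and the function names and the choice of excess kurtosis are assumptions for illustration.

```python
import math

def straightness(points):
    """Chord length divided by travelled path length; 1.0 is perfectly straight."""
    path = sum(
        math.hypot(q[0] - p[0], q[1] - p[1]) for p, q in zip(points, points[1:])
    )
    if path == 0:
        return 1.0  # no movement at all; a convention chosen for this sketch
    chord = math.hypot(points[-1][0] - points[0][0], points[-1][1] - points[0][1])
    return chord / path

def moments(values):
    """Mean, population variance, skewness, and excess kurtosis of a series."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        return {"mean": mean, "variance": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    return {
        "mean": mean,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,
    }
```

A linearly interpolated path scores a straightness of exactly 1.0 and a speed variance of zero, two values a human hand essentially never produces over a movement of any length.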

Figure: from event stream to feature vector. mousemove (x, y, t) samples are differentiated once for velocity, twice for acceleration, and three times for jerk; path geometry gives curvature; gaps between events give pauses and cadence; statistical moments of each feed the score. Every derived feature is a numerical derivative or geometric measure of the same (x, y, t) triples. *The detector never sees velocity or jerk directly. It differentiates the coordinate stream. Which means a synthetic path is judged on its derivatives, and the derivatives of a clean parametric curve look nothing like the derivatives of a hand.*

The numbers vendors publish are coarser but they point the same way. Mouse-dynamics write-ups describe bots moving the cursor at roughly 1,500 pixels per second on average against around 430 for humans, and humans producing far more short displacements than long ones, with a large majority of movements under a few hundred pixels. Treat those specific figures as illustrative rather than universal; they come from particular datasets and bot populations, and they will not hold across every site. But the direction is consistent everywhere. Bots are faster, straighter, more uniform, and more willing to move long distances in one shot than a person ever is. The gap between bot and human in these distributions is exactly the gap the timing and kinematics above predict.

The collection mechanism matters for understanding what gets seen and when. The script doing the watching usually runs inside the page, sometimes inside an iframe or worker context, and the event timestamps it reads can themselves leak automation through their regularity, a thread covered in detecting automation via timing. If the events are injected through the DevTools Protocol rather than produced by the OS input stack, there are separate tells in how they arrive, which connects to the CDP addScriptToEvaluateOnNewDocument trap and the broader question of synthesizing human-like input events without those events carrying a synthetic signature. The mouse path is one signal among many, and it is rarely scored alone.

Why straight lines, Beziers, and jittered Beziers all get caught

Walk up the ladder of synthetic paths and watch a different law break at each rung.

The straight line is the floor. Move the cursor from A to B by linearly interpolating coordinates and you have produced a path with zero curvature, constant or near-constant velocity if the steps are evenly spaced, and acceleration that spikes to infinity at both ends and is zero in between. It violates Fitts’s law on timing, the bell-shaped profile on velocity, and the basic physics of a mass that cannot start or stop instantly. No human movement is straight at the pixel level anyway; the minimum-jerk path is close to straight but the actual hand wanders off it by a few pixels in a consistent, individual way. A straight line is the single easiest path to flag, and naive automation still produces it constantly.

The Bezier curve is the popular fix, and it fixes the wrong thing. A cubic Bezier gives you a smooth, curved path, which looks human in a screenshot. The problem is that it is human-looking only in space, not in time. The curve says where the cursor goes; it says nothing about when. Sample it at a uniform parameter step and the cursor runs at roughly constant arc-speed, which has no bell-shaped velocity profile and violates the two-thirds power law at every bend. The curvature of a hand-drawn arc is coupled to its speed; the curvature of a Bezier is whatever the control points say and the speed is whatever the sampler says, and the two are not linked the way a real arm links them. A detector that looks at the velocity profile, or at the speed-curvature relationship, separates a Bezier from a hand without much trouble. The path passed the eye test and failed the math.
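The space-versus-time gap shows up as soon as the step lengths are measured. The sketch below samples a cubic Bezier at a uniform parameter step and compares its first step to its largest step, then does the same for a straight path timed with the standard minimum-jerk polynomial. The control points, the sample count, and the `onset_ratio` measure are all assumptions chosen for illustration.

```python
import math

def cubic_bezier(p0, p1, p2, p3, u):
    """Point on a cubic Bezier curve at parameter u in [0, 1]."""
    w0 = (1 - u) ** 3
    w1 = 3 * (1 - u) ** 2 * u
    w2 = 3 * (1 - u) * u ** 2
    w3 = u ** 3
    return (
        w0 * p0[0] + w1 * p1[0] + w2 * p2[0] + w3 * p3[0],
        w0 * p0[1] + w1 * p1[1] + w2 * p2[1] + w3 * p3[1],
    )

def step_lengths(points):
    """Distance between consecutive samples; proportional to speed at a fixed frame rate."""
    return [math.hypot(q[0] - p[0], q[1] - p[1]) for p, q in zip(points, points[1:])]

def onset_ratio(steps):
    """First step divided by the largest step; near 0 means a start from rest."""
    return steps[0] / max(steps)

N = 50
bezier_points = [
    cubic_bezier((0, 0), (100, 200), (300, 200), (400, 0), i / N) for i in range(N + 1)
]
# Straight 400 px path timed with the minimum-jerk polynomial 10t^3 - 15t^4 + 6t^5.
min_jerk_points = [
    (400 * (10 * (i / N) ** 3 - 15 * (i / N) ** 4 + 6 * (i / N) ** 5), 0.0)
    for i in range(N + 1)
]

bezier_onset = onset_ratio(step_lengths(bezier_points))
min_jerk_onset = onset_ratio(step_lengths(min_jerk_points))
```

The uniformly sampled Bezier leaves the start point at close to its top speed, while the minimum-jerk timing starts at a small fraction of its peak. The curve looks smooth in space either way; only the second one starts from rest.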

The jittered Bezier is the next attempt: take the smooth curve, add random noise to roughen it up, and you defeat the “too clean” check. As covered above, you defeat it by introducing a new signature. Per-sample uniform noise is white and uncorrelated, and human jitter is neither. Run a spectral analysis and the added noise sits flat across all frequencies instead of concentrating in the tremor band, and the autocorrelation is zero instead of positive across a few samples. You have swapped one tell for another that a frequency-domain detector reads just as easily.
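A frequency-domain check of this kind can be as simple as asking what fraction of the residual's power falls in the tremor band. The sketch below uses a naive discrete Fourier transform so it needs nothing beyond the standard library. It assumes the residuals have already been resampled to a uniform rate, and the 125 Hz rate, the 8 to 12 Hz band edges, and the function name are illustrative choices.

```python
import cmath
import math
import random

def band_power_fraction(samples, sample_rate_hz, low_hz, high_hz):
    """Fraction of non-DC spectral power between low_hz and high_hz.

    Naive O(n^2) DFT over the positive frequencies; fine for a few
    hundred samples, too slow for long recordings.
    """
    n = len(samples)
    mean = sum(samples) / n
    centred = [s - mean for s in samples]
    total = 0.0
    in_band = 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            c * cmath.exp(-2j * math.pi * k * i / n) for i, c in enumerate(centred)
        )
        power = abs(coeff) ** 2
        freq = k * sample_rate_hz / n
        total += power
        if low_hz <= freq <= high_hz:
            in_band += power
    return in_band / total if total > 0 else 0.0

rng = random.Random(1)
rate = 125.0
white = [rng.uniform(-1.0, 1.0) for _ in range(500)]
tone = [math.sin(2 * math.pi * 10.0 * i / rate) for i in range(500)]

white_fraction = band_power_fraction(white, rate, 8.0, 12.0)
tone_fraction = band_power_fraction(tone, rate, 8.0, 12.0)
```

White noise spreads its power evenly, so only a small share lands between 8 and 12 Hz, roughly the band's share of the spectrum up to the Nyquist frequency. A pure 10 Hz oscillation puts essentially all of it there. Real tremor sits between those extremes, but much closer to the second than the first.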

Figure: where each synthetic path breaks down. Straight line: Fitts timing, bell profile, curvature, physics. Bezier curve: bell profile, two-thirds power law. Jittered Bezier: noise spectrum (white, not 8–12 Hz tremor). Neuromotor (Sigma-Lognormal): closes most gaps; still a moving target. *Each rung fixes the previous tell and introduces a new one. Only a path generated from a neuromotor model reproduces the timing, the velocity profile, and the curvature-speed coupling at once.*

The research that makes this concrete is BeCAPTCHA-Mouse, published in 2020 by a group at the Autonomous University of Madrid. They built a bot detector on mouse dynamics and, to stress-test it, generated synthetic trajectories two ways: a function-based method using heuristic functions, and a data-driven method using a generative adversarial network trained on real motion. Against the more realistic synthetic samples their classifier still hit around 93 percent accuracy from a single trajectory, and fusing the neuromotor features with prior state-of-the-art raised accuracy by more than a third in relative terms. The lesson buried in that result is the interesting one. Even when the attacker uses a sophisticated generator, a single movement carries enough signal to expose it most of the time, and the defense improves specifically when it incorporates the neuromotor model of how real motion is produced. The arms race here is between how well the attacker models the hand and how well the defender does, and the defender currently models it better.

That is also why the honest version of a convincing synthetic path is not a clever curve. It is a neuromotor simulation: generate the velocity profile from a Sigma-Lognormal model with parameters sampled from a real distribution, derive the path from the velocity rather than the other way around, inject tremor as band-limited correlated noise in the right frequency range, and add overshoot-and-correct sub-movements scaled to the index of difficulty of each target. Doing all of that is possible. It is also a great deal more work than most automation invests, and every piece of it has to be right at once, because the detector is differentiating the result and checking several laws in parallel. Get the path right and the velocity wrong and you fail on the profile. Get both right and the jitter wrong and you fail on the spectrum.

What the cursor gives away

The thing worth sitting with is how little intention any of this involves. You do not decide to slow down in the corners or to overshoot a small target or to tremor at ten hertz. Those behaviors fall out of having a physical limb driven by a noisy controller, and they were measured and named decades before anyone thought to use them against bots. Fitts published in 1954. Flash and Hogan in 1985. The two-thirds power law in 1983. None of that work was about security. It became security because the laws are stable, individual, and produced below the level of conscious control, which is exactly the property a biometric wants and exactly the property a forger cannot easily borrow.

The practical upshot for anyone building automation is that there is no shortcut shaped like a curve. A Bezier with noise on it is a forgery of the appearance of motion, and the detector does not look at the appearance; it looks at the derivatives and the spectrum. Closing the gap means simulating the production of motion, not its trace, and the moment you are doing that you are reimplementing a piece of motor neuroscience inside your bot. Some operators do exactly that, and the better behavioral engines respond by modelling the hand more precisely still, which is why BeCAPTCHA-Mouse leans on the same Sigma-Lognormal theory the most realistic attackers use. The two sides are reading the same papers.

What does not change is the asymmetry. A human produces a correct mouse path for free, as a side effect of being human, and pays nothing for it. A bot has to reconstruct seventy years of motor-control findings and get all of them right simultaneously, on every movement, while a detector needs only one of them to be wrong to raise a score. That asymmetry is the whole reason pointer motion became a biometric in the first place, and it is the reason a real mouse path stays hard to fake: the human is not trying, and the machine cannot stop trying.

