measured study · 2026-07-02

The scaling law: how many days until a true twin?

The look-alike finder promises a historical day that looks like today. So how big does the history have to be before every day has a near-perfect twin? We measured it on the finder’s own distance and its own 0–100 score, and the answer comes down to one idea: how many independent ways a trading day can differ.

Days differ in ~15 independent ways

Think of every knob a session can turn independently of the others: which way it gaps, how hard the open drives, whether the first pullback holds, what lunch does, whether the afternoon extends or reverses, where it closes in its range, how wide the bars run, where the wicks cluster… If days only varied in two or three ways, near-clones would be everywhere. Measured from how fast best-match quality improves as the pool grows, a full RTH session behaves like it has about 15 independent knobs — its effective dimension. A twin has to agree with today on all of them at once, which is why each doubling of data buys a smaller step: variety multiplies, matches only add.
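To make the "about 15 knobs" figure concrete, here is a minimal sketch of how an effective dimension can be read off a best-twin curve. It assumes the standard nearest-neighbour scaling, in which the best-match distance shrinks like N^(-1/d) as the pool size N grows, so d is minus one over the log-log slope. The function names and the sample curve are hypothetical: the numbers are generated from an exact power law for illustration and are not the study's measurements.

```python
import math

def loglog_slope(pool_sizes, distances):
    """Ordinary least-squares slope of log(distance) against log(pool size)."""
    xs = [math.log(n) for n in pool_sizes]
    ys = [math.log(r) for r in distances]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def effective_dimension(pool_sizes, distances):
    """ASSUMPTION: distance ~ N**(-1/d), hence d = -1 / slope."""
    return -1.0 / loglog_slope(pool_sizes, distances)

def distance_ratio(pool_multiplier, d):
    """Factor by which the best-twin distance shrinks when the pool is
    multiplied by pool_multiplier, under the same power-law assumption."""
    return pool_multiplier ** (-1.0 / d)

# Synthetic, made-up curve built from an exact power law with d = 15.
pool_sizes = [10_000, 100_000, 1_000_000]
distances = [n ** (-1.0 / 15) for n in pool_sizes]

d_hat = effective_dimension(pool_sizes, distances)   # recovers ~15
per_doubling = distance_ratio(2, 15)                 # ~0.955
pool_growth_to_halve = 2 ** 15                       # 32768

print(round(d_hat, 3), round(per_doubling, 4), pool_growth_to_halve)
```

Under that assumption, with d = 15 each doubling of the pool trims the best-twin distance by only about 4.5%, and halving the distance would take a pool roughly 32,768 times larger. Because every doubling removes the same fraction of an ever smaller distance, the absolute gain per doubling keeps shrinking.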

[Figure: vertical axis ticks 70–90; horizontal axis: days in the pool (log scale), 10k–10M. Series: full days (measured), first 2 hours (measured). Annotated point: 87 @ 2.28M.]

Solid = measured (200 seeded query days, nested pool subsets, the engine’s exact distance and score). Dashed = the fitted power law extended — a prediction, not a measurement.

What the curve says

What we do with this

Three levers, in order of what the curve says they’re worth: grow the pool to the full gated universe (the biggest single step available); align smarter, so a day that printed the same path on a different clock counts as the clone it visually is; and match at the scale where clones actually live — mornings and phases, each scored against its own calibration. And when a day genuinely has no twin, the finder says so. “Nothing in millions of days looks like this” is information, not failure.

Method: engine-native distances (z-normalized multi-channel banded DTW, the matcher’s weights) over nested seeded pool subsets; medians over 200 query days; effective dimension from the log-log slope of best-twin distance vs pool size. Reports: twin_quality_curve.json, twin_quality_curve_24.json. Extrapolations are labeled predictions and will be re-measured as the pool actually grows.
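For readers unfamiliar with the distance named in the method note, the following is a rough sketch of a z-normalized, banded DTW with per-channel weights. It is not the engine's implementation: the band width, the channel names, the weights, and the choice to warp each channel independently and sum the results are all assumptions made for illustration.

```python
import math

def z_normalize(series):
    """Rescale a series to zero mean and unit standard deviation."""
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    std = math.sqrt(var)
    if std == 0.0:
        return [0.0 for _ in series]  # flat series carries no shape
    return [(x - mean) / std for x in series]

def banded_dtw(a, b, band):
    """DTW distance with warping limited to |i - j| <= band (Sakoe-Chiba band).
    Returns infinity if the lengths differ by more than the band."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            step = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = step + min(cost[i - 1][j],      # a advances alone
                                    cost[i][j - 1],      # b advances alone
                                    cost[i - 1][j - 1])  # both advance
    return math.sqrt(cost[n][m])

def multichannel_distance(day_a, day_b, weights, band):
    """ASSUMPTION: channels are warped independently and combined as a
    weighted sum; the real matcher's weighting scheme is not shown here."""
    return sum(
        w * banded_dtw(z_normalize(day_a[ch]), z_normalize(day_b[ch]), band)
        for ch, w in weights.items()
    )

# Hypothetical toy "days": the same path printed one bar later.
early = {"close": [0, 0, 1, 2, 1, 0, 0, 0]}
late = {"close": [0, 0, 0, 1, 2, 1, 0, 0]}
weights = {"close": 1.0}

rigid = multichannel_distance(early, late, weights, band=0)    # no warping
aligned = multichannel_distance(early, late, weights, band=1)  # one bar of slack
print(rigid > aligned, aligned)
```

In this toy case a band of zero compares bar against bar and reports a clear gap, while one bar of slack lets the shifted path line up at zero distance. That is the sense in which a day that printed the same path on a different clock can still count as a clone.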