research narrative · 2026-07-03
Shape math: how a chart read becomes a number
Everything the look-alike engine does leans on one move: turn the thing a price-action trader sees on an intraday chart into a number, so ~167,000 historical sessions can be searched for precedent. This page is the story of that math — what ships today, what’s landing now, and what comes after — with every method tied back to a read you already make on the tape.
The bet: a chart read is geometry
When you read a 5-minute chart you’re doing geometry. The drive off the open, the first pullback that holds or doesn’t, the wedge into lunch, the afternoon that never comes back — each of those reads is a statement about the shape of a path. The site’s core bet is that this is measurable: a chart read is geometry, geometry can be measured, and measured shapes can be searched. Get the measurement right and the archive stops being a pile of old charts and becomes an index of precedent. (How far search alone can carry it is itself measured — the scaling law: days vary in ~15 independent ways, so every doubling of data buys a smaller step.)
The honest part comes first, because it frames everything else. The engine is a visual-similarity tool, not a predictor. The site’s own 5-fold cross-validation showed the morning does not predict the afternoon — and the engine’s spec says so in its second sentence instead of burying it. When a match card shows what the closest days did next, that’s a labeled picture of the past, shown beside an any-day baseline so you can see when the matched days are doing nothing a random day wouldn’t. Every method below inherits that framing: instruments for describing the tape, not oracles.
What ships today: twins in warped time · live today
Two days almost never make the same move at the same minute. Yesterday’s twin printed its pullback at 10:50 where today’s came at 11:00; compared bar-for-bar they read as strangers, while your eye calls them the same day. Dynamic time warping (DTW) fixes exactly this: it lets the two paths stretch a little in time so the pullbacks line up, then scores whatever difference remains. The stretch is capped at a ±4-bar band — about 20 minutes of timing slack on 5-minute bars — so a pullback can slide, but a morning can’t masquerade as an afternoon.
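A minimal sketch of banded DTW in Python. It uses closes only, while the shipping cost blends four channels, and it runs on toy 5-minute paths. `znorm`, `banded_dtw`, `lockstep` and `morning` are illustrative names, not the engine's code, and the engine's own distance scaling (the ~0.17 / ~0.24 scale) is not reproduced. The band is the ±4-bar Sakoe-Chiba constraint described above: a pullback two bars early can be lined up, but a shift of more than four bars cannot.

```python
import math


def znorm(xs):
    """Divide out a path's mean and spread so only its outline is left."""
    mu = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    return [0.0] * len(xs) if sd == 0 else [(x - mu) / sd for x in xs]


def banded_dtw(a, b, band=4):
    """DTW restricted to a Sakoe-Chiba band: bar i may only align with bars j where |i - j| <= band."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of: a diagonal step, stretching a, or stretching b.
            d[i][j] = cost + min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1])
    return math.sqrt(d[n][m])


def lockstep(a, b):
    """Bar-for-bar Euclidean distance: no timing slack at all."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def morning(pullback_at, bars=24):
    """Hypothetical toy path: a steady rally with a 3-bar pullback starting at bar `pullback_at`."""
    px, path = 100.0, []
    for i in range(bars):
        px += -0.6 if pullback_at <= i < pullback_at + 3 else 0.3
        path.append(px)
    return path


today = znorm(morning(pullback_at=12))  # pullback starts at bar 12
twin = znorm(morning(pullback_at=10))   # same pullback, two bars (10 minutes) earlier
print(round(lockstep(today, twin), 3), round(banded_dtw(today, twin), 3))
```

With `band=0` the recursion collapses to the bar-for-bar lockstep comparison. That makes the band width the knob that sets how much timing slack the engine tolerates.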
- Whole candles, not a line. The per-bar cost blends close, high, low, and open at weights 1 / 0.35 / 0.35 / 0.35, so wick extent and candle-body direction both count. A bar that opened high and sold off is not the same bar as one that opened low and rallied through the same high–low envelope.
- Pure shape first, size restored separately. Each path is z-normalized (the day’s mean and spread are divided out), so a $735 name can twin a $190 one on outline alone. Then a magnitude penalty 0.10·|ln(range% today / range% match)| adds the size of the move back, so a sleepy 0.3% day can’t twin a violent 2% day just because the outlines rhyme.
- The open counts extra. A front weight w(i) = 1 + 0.9·e^(−i/5), with i counting bars from the open, makes the opening bars weigh nearly double (1.9× on the first bar), decaying over the first half hour. A candidate whose open went the opposite way is penalized where it matters most.
- Agreement at the turns. Bars where today printed a 2-bar swing pivot get a ×1.8 weight boost. The swing skeleton is what a price-action trader actually reads, so a match must agree at the structural turns, not just on average drift.
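The constants in the list above can be sketched as plain functions. This sketch makes several assumptions. Bars arrive as dicts of already z-normalized OHLC values. `i` counts bars from the open. A "2-bar swing pivot" means a high or low that beats the two bars on each side. The way the front weight, pivot boost and per-bar cost combine inside the DTW recursion (here, multiplied together at today's bar index) is also an assumption, not the spec.

```python
import math

# Channel weights from the spec: close 1, high / low / open 0.35 each.
CHANNEL_WEIGHTS = {"close": 1.0, "high": 0.35, "low": 0.35, "open": 0.35}


def bar_cost(a, b):
    """Weighted squared gap across OHLC channels (bars given as dicts of z-normalized values)."""
    return sum(w * (a[k] - b[k]) ** 2 for k, w in CHANNEL_WEIGHTS.items())


def front_weight(i):
    """Opening bars weigh up to 1.9x, decaying toward 1x over roughly the first half hour of 5-minute bars."""
    return 1.0 + 0.9 * math.exp(-i / 5)


def swing_pivots(highs, lows, k=2):
    """Hypothetical pivot rule: bar i is a swing high (low) if its high (low) beats the k bars on each side."""
    pivots = set()
    for i in range(k, len(highs) - k):
        nbrs = [j for j in range(i - k, i + k + 1) if j != i]
        if all(highs[i] > highs[j] for j in nbrs) or all(lows[i] < lows[j] for j in nbrs):
            pivots.add(i)
    return pivots


def bar_weight(i, pivots):
    """Assumed combination: front weight times the 1.8x boost on today's pivot bars."""
    return front_weight(i) * (1.8 if i in pivots else 1.0)


def magnitude_penalty(range_pct_today, range_pct_match):
    """0.10 * |ln(range% today / range% match)|: restores the size that z-normalization divided out."""
    return 0.10 * abs(math.log(range_pct_today / range_pct_match))


# Opened high and sold off vs opened low and rallied, inside the same high-low envelope.
sold_off = {"open": 1.0, "high": 1.0, "low": -1.0, "close": -1.0}
rallied = {"open": -1.0, "high": 1.0, "low": -1.0, "close": 1.0}
print(bar_cost(sold_off, rallied))            # close and open disagree; high and low agree
print(round(magnitude_penalty(0.3, 2.0), 3))  # sleepy 0.3% day vs violent 2% day
```

The section says all of this collapses into one distance. If the magnitude penalty is added straight onto the shape distance, the 0.3% vs 2% pair pays about 0.19 before any shape is compared. That is already past the ~0.17 that reads as a tight twin.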
All of it collapses to one distance. On its scale, ~0.17 reads as a tight twin and ~0.24 as a generic shape hundreds of days share. When today is the latter, the engine says “generic morning” out loud and caps its own grades. The pool it searches: 104 tickers and ~167k sessions (indexes, mega-caps, sector ETFs, FX majors, crypto), every day sliced to the same 78-bar regular-hours window so the comparison is apples to apples.
Shape stays the oracle; three opt-in lenses re-rank its shortlist with context a path can’t carry.
- Structure. Do both days print the same named Brooks structures (wedge, head & shoulders, opening reversal, climax)?
- Levels. Z-norm deliberately erases location, so this lens restores the memory you carry between sessions (yesterday’s high/low/close, the week’s open, unfilled gaps, all in ATR units from today’s open) and favors twins that started the day in the same neighborhood.
- Volume. Did participation arrive the same way: an opening surge or a dead tape?
Each lens is a nudge on top of the pinned distance, not a new oracle, and each is checked against a frozen tail-agreement rig rather than trusted on vibes.
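This sketch shows one way to picture a lens as a "nudge on top of the pinned distance". Express each remembered level in ATR units from today's open. Measure how far a candidate's levels sit from today's. Then add a small multiple of that gap to the shape distance before re-sorting. The helper names, the mean-absolute gap, the level names and the `nudge=0.02` weight are hypothetical choices for illustration. The real lens weights are not given on this page.

```python
def level_features(today_open, atr, levels):
    """Hypothetical helper: each remembered level as a signed distance from today's open, in ATR units."""
    return {name: (price - today_open) / atr for name, price in levels.items()}


def levels_gap(fa, fb):
    """Mean absolute difference over the levels both days carry (0 if none are shared)."""
    shared = fa.keys() & fb.keys()
    if not shared:
        return 0.0
    return sum(abs(fa[k] - fb[k]) for k in shared) / len(shared)


def rerank(shortlist, today, nudge=0.02):
    """Shape distance stays pinned; the lens only adds a small, hypothetical nudge before re-sorting."""
    return sorted(shortlist, key=lambda c: c["distance"] + nudge * levels_gap(today, c["levels"]))


today = level_features(500.0, 5.0, {"prior_high": 502.5, "prior_low": 495.0, "prior_close": 499.0})
shortlist = [
    # Slightly tighter shape, but it opened far above all of its prior-day levels.
    {"date": "A", "distance": 0.170, "levels": {"prior_high": 3.0, "prior_low": 1.5, "prior_close": 2.0}},
    # Slightly looser shape, same neighborhood as today.
    {"date": "B", "distance": 0.175, "levels": {"prior_high": 0.5, "prior_low": -1.0, "prior_close": -0.2}},
]
print([c["date"] for c in rerank(shortlist, today)])
```

With `nudge=0.0` the shape order comes back unchanged. A small weight only reorders candidates whose shape distances are already close, which matches the nudge-not-oracle role described above.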
The map: what your eye sees → what the math measures
Everything on this page in one table. Each row is a read you already make with your eye, and the measurement that turns it into a number you can search, rank, or filter on.
| What your eye sees | What the math measures | Status |
|---|---|---|
| “These two days look the same” | Banded multi-channel DTW distance — the shipping engine | live today |
| “Twins that never really diverged” | Fréchet distance — the worst point on the path | landing now |
| “Same day, except one spike” | LCSS / EDR — matching that skips bad prints | landing now |
| “Same move, shifted in time” | SBD — best whole-path alignment by cross-correlation | landing now |
| “THE opening-reversal day” | Soft-DTW barycenter — align first, then average | live today |
| “The market keeps drawing this shape” | Matrix profile motifs | live today |
| “Tape like nothing I’ve seen” | Matrix profile discords | live today |
| “The day changed character right there” | FLUSS — finding the seam in the session | live today |
| “Trend day vs range day” | Hurst exponent / DFA | live today |
| “Orderly grind vs chop” | Permutation entropy | live today |
| “Stretches keep snapping back” | OU half-life — minutes for a stretch to decay | live today |
| “Price is being accepted at the level” | RQA laminarity — time spent stuck | live today |
| “P-day vs b-day volume profile” | Wasserstein distance between volume-at-price histograms | live today |
| “A day’s fingerprint on a card” | Path signature — the path as a fixed vector | roadmap |
| “Which timescale is doing the work” | Wavelets / EMD — energy by timescale | roadmap |
| “Grep the archive for a chart” | SAX symbols + learned shapelets | roadmap |
| “The regime just changed” | HMM day-states / BOCPD change-point probability | roadmap |
| “These days rhyme, don’t ask why” | Compression distance (NCD) | roadmap |
| “A double top that won’t die” | Persistent homology — long-lived loops | roadmap |
Status chips are edited by hand as work lands: live today is serving on the site, landing now is being built and benchmarked in parallel with this page, roadmap is not started.
Stricter twins, different questions · landing now
What your eye sees. When you call two days “the same day,” you often mean something stricter than DTW checks: they never stopped agreeing — no ten-minute stretch where one panicked and the other didn’t. Other times you mean something looser: “same day except for that one news spike,” or “the same move, it just started an hour late.” Those are three different questions, and one distance can’t answer all of them.
What the math measures. Four metrics, four definitions of “same.” Fréchet distance is the dog-leash test. Walk both paths start to finish, pacing each walker as well as you can, and record the shortest leash that still gets both to the end. The score is the worst point, so a twin that diverged once, anywhere, is caught. LCSS (longest common subsequence) asks how much of the two days can be paired up within a tolerance and simply skips what can’t, so one bad print doesn’t poison the match. EDR (edit distance on real sequences) counts how many bars you’d have to patch to turn one day into the other: an edit distance for tape. SBD (shape-based distance) slides one whole path against the other (cross-correlation) and scores the best alignment. That is pure time-shift matching, with no local warping at all.
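Below are textbook dynamic-programming versions of the four metrics, on 1-D closes for brevity. `eps` is the match tolerance that LCSS and EDR need; the value used here is hypothetical. SBD follows the normalized cross-correlation form from k-Shape, which expects z-normalized inputs. None of this is the site's landing code.

```python
import math


def frechet(a, b):
    """Discrete Fréchet distance: over every monotone pairing of the two paths, the smallest possible worst gap."""
    n, m = len(a), len(b)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            gap = abs(a[i] - b[j])
            if i == 0 and j == 0:
                ca[i][j] = gap
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], gap)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], gap)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), gap)
    return ca[n - 1][m - 1]


def lcss(a, b, eps):
    """Share of bars that can be paired within eps; bars that can't be paired are simply skipped."""
    n, m = len(a), len(b)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= eps:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m] / min(n, m)


def edr(a, b, eps):
    """Edit distance on real sequences: substitutions, insertions and deletions needed to turn a into b."""
    n, m = len(a), len(b)
    E = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        E[i][0] = i
    for j in range(m + 1):
        E[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if abs(a[i - 1] - b[j - 1]) <= eps else 1
            E[i][j] = min(E[i - 1][j - 1] + sub, E[i - 1][j] + 1, E[i][j - 1] + 1)
    return E[n][m]


def sbd(a, b):
    """Shape-based distance: 1 minus the best normalized cross-correlation over every whole-path shift."""
    n = len(a)
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0:
        return 1.0
    best = max(
        sum(a[i] * b[i - s] for i in range(n) if 0 <= i - s < n)
        for s in range(-(n - 1), n)
    )
    return 1.0 - best / norm


clean = [0, 1, 2, 3, 4, 5]
spiked = [0, 1, 2, 50, 4, 5]     # same day, one bad print
early = [0, 0, 1, 2, 1, 0, 0, 0]
late = [0, 0, 0, 0, 1, 2, 1, 0]  # same bump, two bars late
print(frechet(clean, spiked), lcss(clean, spiked, eps=0.5), edr(clean, spiked, eps=0.5), sbd(early, late))
```

On the spiked day the single bad print pins the Fréchet score at 45. LCSS still pairs five of six bars, and EDR charges a single patch. SBD scores the two-bars-late bump as a perfect match.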
How it applies here. These run as second opinions on the DTW shortlist. A match that scores well on DTW and Fréchet is a certified never-diverged twin — a stronger claim than DTW alone can make. LCSS and EDR rescue twins that one spike would have cost. And SBD catches “the same open, an hour late,” which the ±4-bar band deliberately refuses — a different question the engine currently can’t ask.
The canonical curve of a family · live today
What your eye sees. You carry an idealized template of each day type in your head — the opening-reversal day, the trend-from-the-open bull. No single date is it; every real instance is a noisy copy with its own timing. The template is the thing you actually compare against, and until now it only existed as intuition.
What the math measures. Averaging 200 opening-reversal days point-by-point destroys the pattern: one reverses at 10:00, another at 10:20, and the Euclidean average cancels the out-of-phase moves — the turn that defines the family flattens into mush. A soft-DTW barycenter aligns first and averages second: it finds the one curve whose total warped distance to all 200 members is smallest, so the reversals get lined up before they’re averaged and the turn survives.
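Here is a compact soft-DTW barycenter on toy data. It assumes squared-error bar costs and smoothing `gamma=1.0`, and it uses plain gradient descent with step halving. That optimizer is a simplification and is not the site's. The soft-DTW value and its gradient follow the forward and backward recursions from Cuturi and Blondel's soft-DTW paper. The barycenter starts at the pointwise mean and only accepts steps that lower the mean soft-DTW distance to the members. `v_day` is a hypothetical toy reversal.

```python
import math


def softmin(values, gamma):
    """Smooth minimum -gamma * log(sum(exp(-v / gamma))), shifted by the hard min for stability."""
    lo = min(values)
    if lo == math.inf:
        return math.inf
    return lo - gamma * math.log(sum(math.exp(-(v - lo) / gamma) for v in values))


def soft_dtw_and_grad(z, y, gamma):
    """Soft-DTW value between z and y, and its gradient with respect to z."""
    n, m = len(z), len(y)
    inf = math.inf
    D = [[0.0] * (m + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = (z[i - 1] - y[j - 1]) ** 2
    R = [[inf] * (m + 2) for _ in range(n + 2)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i][j] = D[i][j] + softmin((R[i - 1][j - 1], R[i - 1][j], R[i][j - 1]), gamma)
    value = R[n][m]
    # Backward pass: E[i][j] is the soft (expected) weight with which bar i of z aligns to bar j of y.
    for i in range(1, n + 1):
        R[i][m + 1] = -inf
    for j in range(1, m + 1):
        R[n + 1][j] = -inf
    R[n + 1][m + 1] = value
    E = [[0.0] * (m + 2) for _ in range(n + 2)]
    E[n + 1][m + 1] = 1.0
    for i in range(n, 0, -1):
        for j in range(m, 0, -1):
            a = math.exp((R[i + 1][j] - R[i][j] - D[i + 1][j]) / gamma)
            b = math.exp((R[i][j + 1] - R[i][j] - D[i][j + 1]) / gamma)
            c = math.exp((R[i + 1][j + 1] - R[i][j] - D[i + 1][j + 1]) / gamma)
            E[i][j] = a * E[i + 1][j] + b * E[i][j + 1] + c * E[i + 1][j + 1]
    grad = [sum(2.0 * (z[i - 1] - y[j - 1]) * E[i][j] for j in range(1, m + 1)) for i in range(1, n + 1)]
    return value, grad


def objective(z, series, gamma):
    """Mean soft-DTW distance from candidate curve z to every family member, with its gradient."""
    results = [soft_dtw_and_grad(z, s, gamma) for s in series]
    value = sum(v for v, _ in results) / len(series)
    grad = [sum(g[i] for _, g in results) / len(series) for i in range(len(z))]
    return value, grad


def soft_dtw_barycenter(series, gamma=1.0, lr=0.1, steps=50):
    """Start from the pointwise mean, then descend the objective, halving the step until it improves."""
    z = [sum(col) / len(col) for col in zip(*series)]
    value, grad = objective(z, series, gamma)
    for _ in range(steps):
        step = lr
        while step > 1e-8:
            cand = [zi - step * gi for zi, gi in zip(z, grad)]
            cand_value, cand_grad = objective(cand, series, gamma)
            if cand_value < value:
                z, value, grad = cand, cand_value, cand_grad
                break
            step /= 2
        else:
            break  # no step size improves: treat as converged
    return z, value


def v_day(low_at, bars=12):
    """Hypothetical toy opening reversal: flat, a V-shaped dip to -1 at bar `low_at`, flat again."""
    return [-1.0 if t == low_at else -0.5 if abs(t - low_at) == 1 else 0.0 for t in range(bars)]


days = [v_day(3), v_day(5), v_day(7)]  # same reversal, different timing
mean = [sum(col) / len(col) for col in zip(*days)]
init_value, _ = objective(mean, days, 1.0)
bary, bary_value = soft_dtw_barycenter(days, gamma=1.0)
print(round(min(mean), 3), round(min(bary), 3))
```

Printing the two minima lets you compare how deep each curve's turn sits. How much of the turn is recovered depends on `gamma`: larger values tend to smooth the alignment toward plain averaging.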
How it applies here. Take every session the archive files under a pattern family and render the canonical opening-reversal day as an actual curve — not a description, a path. Archetypes become objects today can be scored against (“today sits 0.19 from the canonical opening reversal”), and each family in the day-type gallery gets a definitive picture at the top. The first two are below — computed from every session the archive’s own Brooks detectors file under the family, event-aligned on the reversal extreme so the turn survives the averaging, beside the three most textbook real members so the average can be judged against its inputs. An archetype describes its family; it forecasts nothing.
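The shipped archetypes are described as event-aligned on the reversal extreme. The sketch below shows that simpler move. It assumes the extreme is the session low (a bullish reversal; a bearish one would anchor on the high) and uses a fixed window of bars around it. `v_day` and the window sizes are hypothetical.

```python
def v_day(low_at, bars=12):
    """Hypothetical toy opening reversal: flat, a V-shaped dip to -1 at bar `low_at`, flat again."""
    return [-1.0 if t == low_at else -0.5 if abs(t - low_at) == 1 else 0.0 for t in range(bars)]


def pointwise_average(days):
    """Average bar-by-bar on the clock: out-of-phase reversals cancel."""
    return [sum(col) / len(col) for col in zip(*days)]


def event_aligned_average(days, before=3, after=3):
    """Align every member on its reversal extreme (here the session low), then average bars relative to it."""
    anchors = [min(range(len(d)), key=d.__getitem__) for d in days]
    out = []
    for offset in range(-before, after + 1):
        vals = [d[k + offset] for d, k in zip(days, anchors) if 0 <= k + offset < len(d)]
        out.append(sum(vals) / len(vals) if vals else float("nan"))
    return out


days = [v_day(3), v_day(5), v_day(7)]  # one reversal, three different clock times
print(min(pointwise_average(days)), min(event_aligned_average(days)))
```

The three toy days share one V-shaped reversal at different bars. The clock-time average bottoms at about -0.33, while the event-aligned average keeps the full -1.0 turn.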
The shapes the market keeps drawing · live today
What your eye sees. Three related reads. The market re-drawing the same picture — the flag that keeps printing, the midday coil you’ve seen a hundred times. Tape that looks like nothing you’ve seen. And the bar where the day changed character — where the morning’s two-sided chop became the afternoon’s one-way trend.
What the math measures. The matrix profile computes, for every window of the tape, the distance to its nearest neighbor everywhere else — in one pass. Windows with unusually close neighbors are motifs: the shapes the market keeps drawing. Windows whose nearest neighbor is still far away are discords: the weirdest stretches of tape in the archive. FLUSS then reads where each window’s neighbors live in time: while a session stays one regime, neighbors sit nearby; count the neighbor-arcs crossing each bar and the deepest valley marks the bar where the day changed character.
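This is a naive sketch of the matrix profile and FLUSS. It assumes z-normalized Euclidean distance and an exclusion zone of half a window around each window. FLUSS's idealized arc curve is taken as 2k(n − k)/n, and 5 window-lengths are ignored at each edge (an assumed convention). The quadratic loop is for clarity only; real systems use much faster exact algorithms. The series is synthetic.

```python
import math


def znorm(xs):
    mu = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    return [0.0] * len(xs) if sd == 0 else [(x - mu) / sd for x in xs]


def matrix_profile(ts, m):
    """Naive matrix profile: distance from each window to its nearest non-trivial neighbor, plus that neighbor's index."""
    wins = [znorm(ts[i:i + m]) for i in range(len(ts) - m + 1)]
    excl = m // 2  # trivial-match exclusion zone around each window
    prof, idx = [], []
    for i, a in enumerate(wins):
        best, best_j = math.inf, -1
        for j, b in enumerate(wins):
            if abs(i - j) <= excl:
                continue
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
            if d < best:
                best, best_j = d, j
        prof.append(best)
        idx.append(best_j)
    return prof, idx


def fluss(idx, m, edge_factor=5):
    """Corrected arc curve: nearest-neighbor arcs crossing each position over the idealized count."""
    n = len(idx)
    marks = [0] * (n + 1)
    for i, j in enumerate(idx):
        marks[min(i, j)] += 1
        marks[max(i, j)] -= 1
    cac, running = [], 0
    for k in range(n):
        running += marks[k]  # arcs with start <= k < end
        ideal = 2.0 * k * (n - k) / n
        cac.append(min(running / ideal, 1.0) if ideal > 0 else 1.0)
    edge = min(edge_factor * m, n // 2)
    for k in list(range(edge)) + list(range(n - edge, n)):
        cac[k] = 1.0  # arc counts near the ends are unreliable
    return cac


# Toy session: a fast 10-bar rhythm that switches to a slower 25-bar rhythm at bar 100.
ts = [math.sin(2 * math.pi * t / 10) for t in range(100)] + [math.sin(2 * math.pi * t / 25) for t in range(100)]
prof, idx = matrix_profile(ts, m=10)
motif = min(range(len(prof)), key=prof.__getitem__)    # a shape with a near-exact repeat
discord = max(range(len(prof)), key=prof.__getitem__)  # the window least like anything else
cac = fluss(idx, m=10)
print(motif, discord, min(range(len(cac)), key=cac.__getitem__))
```

On the toy series the near-zero profile values are motifs, since every window has an exact repeat within its own rhythm. The largest value sits on a window straddling the switch. The corrected arc curve drops below 0.1 at the seam because almost no nearest-neighbor arcs cross it.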
How it applies here. Motif tables — “this 45-minute shape has printed 312 times across the pool, here are the dates.” Discord surfacing — when today has no precedent, a precedent tool should say that plainly instead of serving its least-bad match. And honest session-splitting: instead of assuming the day flips at noon, FLUSS finds the actual seam, which is where “the morning doesn’t predict the afternoon” stops being a fixed clock boundary and becomes a measured one. The first mined set is below — every two-hour window of every session self-joined, occurrence counts measured, weirdest tape included.
The character of a day, in single numbers · live today
What your eye sees. Within the first hour you’re already categorizing: trend day or range day; clean grind or chop; whether a stretch away from VWAP fades or extends; whether price is being accepted at a level or just visiting it. It’s the Brooks vocabulary — always-in, breakout mode, level acceptance, trapped traders — and it’s all about the day’s character rather than any one pattern.
What the math measures. One number each. The Hurst exponent (via DFA): do moves tend to extend (above 0.5) or fold back (below)? — trend-day vs range-day as a single number. Permutation entropy: how orderly the sequence of ups and downs is — a grinding microchannel scores low, a chop-fest scores near the maximum. OU half-life: on a range day, fit the pull back toward the mean and read off how many minutes a stretch takes to decay halfway — “fade it, it comes back in ~20 minutes” as a fitted parameter. RQA laminarity: from recurrence analysis, the fraction of time the path spends stuck revisiting the same state — time spent sitting at a level instead of moving through it.
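Here is a sketch of the first two numbers, under stated assumptions. The first is DFA-1 applied to returns, with non-overlapping boxes of 8 to 128 bars (a hypothetical scale set), so 0.5 means memoryless. The second is permutation entropy of order 3, normalized to [0, 1]. Synthetic noise stands in for real sessions. A 78-bar session is shorter than the largest box here, so an intraday version would need smaller boxes or finer bars.

```python
import math
import random
from itertools import accumulate


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)


def dfa_alpha(returns, scales=(8, 16, 32, 64, 128)):
    """DFA-1 exponent: ~0.5 for memoryless moves, above 0.5 when moves extend, below when they fold back."""
    mean = sum(returns) / len(returns)
    profile = list(accumulate(r - mean for r in returns))  # integrate the demeaned returns into a walk
    log_s, log_f = [], []
    for s in scales:
        xs = list(range(s))
        sq, count = 0.0, 0
        for start in range(0, len(profile) - s + 1, s):
            seg = profile[start:start + s]
            slope = ols_slope(xs, seg)
            xbar, ybar = (s - 1) / 2, sum(seg) / s
            # squared residuals around this box's straight-line trend
            sq += sum((y - ybar - slope * (x - xbar)) ** 2 for x, y in zip(xs, seg))
            count += s
        log_s.append(math.log(s))
        log_f.append(0.5 * math.log(sq / count))  # log of the RMS fluctuation F(s)
    return ols_slope(log_s, log_f)


def permutation_entropy(xs, order=3):
    """Shannon entropy of ordinal up/down patterns, normalized to [0, 1] by log(order!)."""
    counts = {}
    for i in range(len(xs) - order + 1):
        pattern = tuple(sorted(range(order), key=lambda k: xs[i + k]))
        counts[pattern] = counts.get(pattern, 0) + 1
    total = sum(counts.values())
    h = -sum(c / total * math.log(c / total) for c in counts.values())
    return h / math.log(math.factorial(order))


rng = random.Random(7)
noise = [rng.gauss(0.0, 1.0) for _ in range(2048)]  # memoryless returns
walk = list(accumulate(noise))                      # fed in as if it were returns: extreme persistence
grind = [0.1 * i for i in range(100)]               # a perfectly orderly microchannel
print(round(dfa_alpha(noise), 2), round(dfa_alpha(walk), 2))
print(round(permutation_entropy(noise), 3), permutation_entropy(grind))
```

Expect the noise's exponent near 0.5 and its entropy near 1. The walk, fed in as returns, should score far above 0.5 (around 1.5 in theory). The straight grind has zero permutation entropy.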
How it applies here. These become the day’s character sheet — labels on every session in the archive, so twin search can be filtered (“range days with fast mean-reversion only”) and a shape twin from a day with the opposite character can be flagged as the weaker precedent it is. Laminarity in particular puts a number on level acceptance: high laminarity pinned under yesterday’s high reads very differently — for who’s paid and who’s trapped — than a quick tag and rejection, even when the two days share an outline. The first join is live: the day-types page now carries these numbers per type over ~180k labeled sessions — reversion half-life separates trend days from ranges cleanly, and Hurst, honestly measured, barely separates them at all.
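The other two labels can be sketched as hypothetical helpers. The half-life comes from regressing each bar's change on the prior level, a discrete AR(1) stand-in for the OU fit, converted to minutes at 5 minutes per bar. Laminarity is computed on the raw 1-D series with a fixed tolerance `eps` and no delay embedding, which simplifies standard RQA. Neither helper is the day-types page's code.

```python
import math


def ou_half_life(prices, minutes_per_bar=5):
    """Regress each bar's change on the prior level; half-life = ln 2 / -ln(1 + slope) bars, in minutes."""
    x = prices[:-1]
    dx = [b - a for a, b in zip(prices[:-1], prices[1:])]
    xbar, dbar = sum(x) / len(x), sum(dx) / len(dx)
    slope = sum((a - xbar) * (d - dbar) for a, d in zip(x, dx)) / sum((a - xbar) ** 2 for a in x)
    if not -1.0 < slope < 0.0:
        return math.inf  # no pull back toward a mean: nothing to take a half-life of
    return minutes_per_bar * math.log(2) / -math.log(1.0 + slope)


def laminarity(xs, eps, vmin=2):
    """Share of recurrence points (|x_i - x_j| <= eps) that sit in vertical runs of at least vmin bars."""
    n = len(xs)
    total = in_lines = 0
    for j in range(n):
        run = 0
        for i in range(n + 1):  # one extra step flushes the final run
            if i < n and abs(xs[i] - xs[j]) <= eps:
                run += 1
                total += 1
            else:
                if run >= vmin:
                    in_lines += run
                run = 0
    return in_lines / total if total else 0.0


fade = [100 + 8 * 0.5 ** t for t in range(20)]   # gives back half its distance to 100 every bar
pinned = [10.0] * 6 + [11.0, 12.0, 13.0]         # sits at a level, then leaves
tagged = [8.0, 9.0, 10.0, 11.0, 10.0, 9.0, 8.0]  # tags 11 once and rejects it
print(ou_half_life(fade), laminarity(pinned, eps=0.5), laminarity(tagged, eps=0.5))
```

The fading stretch gives back half its distance every bar, so its half-life is one bar, or 5 minutes. The pinned day scores about 0.92 laminarity, and the tag-and-reject day scores 0.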
Volume profiles as earth to move · live today
What your eye sees. The volume profile’s shape: a P-day (rally, then business done up top — short covering), a b-day (liquidation, then business done down low), a double-distribution day (two value areas and a thin seam where price only traveled). Where volume piled up is where traders agreed on value; the profile’s shape is a map of who did business where.
What the math measures. A volume profile is a histogram — volume at price. Comparing histograms bin-by-bin fails the way lockstep path comparison fails: two profiles with humps one bin apart read as totally different. The Wasserstein (earth-mover’s) distance instead asks: how much dirt must move, and how far, to turn one profile into the other? Two P-days with humps at slightly different prices are near; a P-day vs a b-day is far — the full cost of hauling the hump from the top of the range to the bottom.
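In one dimension, the earth-mover's cost has a closed form: it is the area between the two cumulative volume profiles. The sketch below assumes both histograms share the same price bins, ordered lowest price first. It uses single-bin toy profiles so the contrast with a bin-by-bin comparison is exact; the helper names are hypothetical.

```python
def wasserstein_1d(p, q, bin_width=1.0):
    """Earth mover's distance between two volume-at-price histograms on the same bins (lowest price first)."""
    total_p, total_q = sum(p), sum(q)
    gap, cost = 0.0, 0.0
    for vp, vq in zip(p, q):
        gap += vp / total_p - vq / total_q  # difference in cumulative volume share up to this bin
        cost += abs(gap) * bin_width        # that much volume must be hauled across this bin
    return cost


def binwise_l1(p, q):
    """Bin-by-bin comparison of volume shares: blind to how far apart two humps sit."""
    total_p, total_q = sum(p), sum(q)
    return sum(abs(vp / total_p - vq / total_q) for vp, vq in zip(p, q))


def single_bin_profile(at, bins=10):
    """Hypothetical toy profile with all of the day's volume in one price bin."""
    return [1.0 if k == at else 0.0 for k in range(bins)]


p_day = single_bin_profile(8)         # business done near the top
p_day_nearby = single_bin_profile(7)  # same P-day shape, hump one bin lower
b_day = single_bin_profile(1)         # business done near the bottom
print(binwise_l1(p_day, p_day_nearby), binwise_l1(p_day, b_day))
print(wasserstein_1d(p_day, p_day_nearby), wasserstein_1d(p_day, b_day))
```

Bin by bin, the neighboring P-day and the b-day look equally far away (2.0 each). The earth-mover's distance separates them: 1 bin of hauling versus 7.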
How it applies here. A fourth opt-in lens — volume profile — beside structure, levels, and volume. The existing volume lens compares how volume arrived over time; the profile lens compares where it piled up in price. With it, twins agree not just on the path but on where the day’s business got done — and P-shape / b-shape / double-distribution become searchable labels of their own.
The roadmap · roadmap
None of this is built. Each item earns a slot only if it survives the treatment the engine already gets — frozen benchmarks, honest nulls published when something doesn’t help.
- Path signatures. A path’s fingerprint as a short, fixed list of coordinates — warp-tolerant like DTW, but the output is a vector instead of a pairwise score. That turns archive search into nearest-neighbor lookup instead of 167k DTW runs, and gives any future learned model a principled input.
- Wavelets / EMD. Which timescale the action is on. Two days can share an hour-scale arc yet differ completely in 5-minute texture; a wavelet or empirical-mode decomposition splits the tape by timescale and scores each band separately.
- SAX + shapelets. SAX turns a path into a short string, so a chart becomes searchable text — grep the archive for a shape. Shapelets go the other way: algorithmically learned short shapes that best separate one day family from another — the data’s own candidate for “the tell.”
- HMM / BOCPD. Day types as latent states with transition probabilities, and Bayesian online change-point detection for a live “the regime just changed” probability that updates every bar — FLUSS’s job, but online and with calibrated uncertainty.
- Compression distance. If two days share structure, their concatenation compresses better than the sum of their parts — a universal, parameter-free similarity. Useful as a cheap cross-check that anything the tuned metrics call a twin still passes.
- Topological data analysis. Persistent homology tracks the features of a shape that survive across scales — a double top is a long-lived loop. Counting loops and their lifetimes fingerprints a day by its topology while ignoring everything else.
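For the first roadmap item, here is a level-2 path signature in pure Python, for illustration only. Practical use would truncate deeper (level 3 or 4), and the two channels here are abstract; which channels to use is an open, hypothetical choice. The sketch shows the claim made above: re-pacing a path (a repeated bar, a segment split in two) leaves the vector unchanged, so a warped twin lands on the same coordinates. Search then becomes a plain nearest-neighbor lookup.

```python
def signature_level2(path):
    """Level-1 and level-2 signature terms of a piecewise-linear path, accumulated segment by segment (Chen's identity)."""
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for p, q in zip(path, path[1:]):
        delta = [b - a for a, b in zip(p, q)]
        for i in range(d):
            for j in range(d):
                # cross term with everything before this segment, plus the segment's own term
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            s1[i] += delta[i]
    return s1 + [v for row in s2 for v in row]


def nearest(query, archive):
    """Plain nearest-neighbor lookup over precomputed signature vectors (name -> vector)."""
    return min(archive, key=lambda name: sum((x - y) ** 2 for x, y in zip(query, archive[name])))


today = [(0, 0), (1, 0), (1, 1)]                      # channel one moves, then channel two
warped = [(0, 0), (0.5, 0), (1, 0), (1, 0), (1, 1)]  # same route, different pacing
other = [(0, 0), (0, 1), (1, 1)]                      # same endpoints, opposite order
archive = {"warped": signature_level2(warped), "other": signature_level2(other)}
print(signature_level2(today), nearest(signature_level2(today), archive))
```

The level-2 terms are what tell `today` and `other` apart: both paths have the same net move, but they differ in which channel moved first.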
The through-line: every method here formalizes a read you already make. And the boundary never moves: numbers describe the tape; they don’t predict it. What the math buys is honest precedent: measured, benchmarked, and labeled as a picture of the past. The read is still yours.
Sources: every shipping constant on this page (channel weights, band width, front weight, swing boost, magnitude penalty, twin/generic scale, pool size) comes from the engine’s living spec, src/lib/analogs/LOOKALIKES.md, which changes in the same PR as the code it describes and is pinned by a frozen trust bench in CI. The cross-validation finding (the morning does not predict the afternoon) is the site’s own, and is why every “what happened next” view ships with an any-day baseline beside it.