A plain-language tour of what's happening when you press Record.
Pick a sound. Hear it. Make the same sound with your voice in three seconds. Get a score from 0 to 10. Three rounds make a daily puzzle.
Every round score is the average of two things, each rated 0 to 10: timbre (the colour of the sound) and rhythm (its shape over time).
They're equal-weighted. Together they catch the two things humans actually judge when they say "you nailed it" or "not quite."
If you tap "Show details" under a score, you'll see both halves separately. A high rhythm score with a low timbre score means you got the timing right but the colour of your voice is different. That's normal and counts as a real attempt.
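As a small worked example of the equal weighting described above, here is a sketch of how the two halves could combine. The function name `roundScore` is made up for illustration and is not taken from the game's code.

```typescript
// Hypothetical helper: combine the two 0-10 halves with equal weight.
function roundScore(timbre: number, rhythm: number): number {
  return (timbre + rhythm) / 2;
}

// Good timing but a different vocal colour: rhythm 9, timbre 4.
const example = roundScore(4, 9); // 6.5
```

So a strong rhythm half can still lift a round where the timbre half is low.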
Timbre is what makes a violin and a flute playing the same note sound different. It's the spectrum of frequencies present, the way they evolve over time, the harshness or smoothness of the texture.
We use a pretrained neural network — Google's YAMNet, trained on AudioSet — to convert each clip (the reference and your recording) into a 1024-dimensional vector that captures its acoustic fingerprint. We compare the two vectors using cosine similarity. Identical clips give 1.0; unrelated clips give close to 0. We map that to 0–10.
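The comparison step above can be sketched roughly like this. It assumes the two embeddings are already available as plain number arrays. The text does not say exactly how the similarity is mapped to 0–10, so `toTenPointScale` (clamp to 0..1, then multiply by 10) is an assumption, and both function names are illustrative.

```typescript
// Cosine similarity: the dot product divided by the product of the lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must be non-empty and the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // an all-zero vector has no direction to compare
  }
  return dot / Math.sqrt(normA * normB);
}

// ASSUMPTION: a simple linear mapping; the real mapping may differ.
function toTenPointScale(similarity: number): number {
  const clamped = Math.min(1, Math.max(0, similarity));
  return clamped * 10;
}
```

Cosine similarity only looks at the direction of the two vectors, not their length, so a quieter recording is not penalised at this comparison step just for having smaller values.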
The model runs entirely in your browser. The weights (about 16 MB) are downloaded once on your first visit and cached. Your audio never leaves your device.
Rhythm is everything timbre throws away: when does the sound start? When does it fade? Are there gaps? Is it a sharp tap or a long swell?
We compute the loudness envelope (RMS over short windows) of both clips, then compare them with dynamic time warping, a way of measuring how similar two patterns are even if they don't line up perfectly in time. The alignment cost is then converted to a similarity: identical envelopes give 1.0; mismatched ones give close to 0. As with timbre, we map that to 0–10.
The rhythm dimension is what separates a deliberate "wooop" from a flat hum, or a quick "tk" from a sustained drone — even when the timbre dimension can't really tell them apart.
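To make the rhythm comparison more concrete, here is a minimal sketch of an RMS envelope and a textbook dynamic time warping distance. The window size, the absolute-difference step cost, and the way `rhythmSimilarity` turns a cost into a 0..1 similarity are all assumptions for illustration; the game may use different choices.

```typescript
// RMS loudness envelope: one value per non-overlapping window of samples.
function rmsEnvelope(samples: number[], windowSize: number): number[] {
  const envelope: number[] = [];
  for (let start = 0; start + windowSize <= samples.length; start += windowSize) {
    let sumSquares = 0;
    for (let i = start; i < start + windowSize; i++) {
      sumSquares += samples[i] * samples[i];
    }
    envelope.push(Math.sqrt(sumSquares / windowSize));
  }
  return envelope;
}

// Textbook DTW: total cost of the cheapest monotonic alignment of a with b,
// where pairing a[i] with b[j] costs |a[i] - b[j]|.
function dtwDistance(a: number[], b: number[]): number {
  const n = a.length;
  const m = b.length;
  if (n === 0 || m === 0) {
    throw new Error("envelopes must be non-empty");
  }
  // cost[i][j] = best cost of aligning the first i values of a
  // with the first j values of b.
  const cost: number[][] = [];
  for (let i = 0; i <= n; i++) {
    const row: number[] = [];
    for (let j = 0; j <= m; j++) {
      row.push(Infinity);
    }
    cost.push(row);
  }
  cost[0][0] = 0;
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const step = Math.abs(a[i - 1] - b[j - 1]);
      const best = Math.min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]);
      cost[i][j] = step + best;
    }
  }
  return cost[n][m];
}

// ASSUMPTION: normalise the cost by the combined length, then squash it
// so that a cost of 0 gives 1.0 and large costs approach 0.
function rhythmSimilarity(a: number[], b: number[]): number {
  const normalised = dtwDistance(a, b) / (a.length + b.length);
  return 1 / (1 + normalised);
}
```

Because the alignment is allowed to stretch, an envelope like `[0, 1, 0]` matches `[0, 0, 1, 0]` at zero cost: the same shape, just starting a little later.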
The bell reference is a synthesised pure tone at 880 Hz with a sharp attack and a long decay. No human voice can reproduce that timbre: a pure sine wave has no overtones, while a human voice has lots. As a result, the timbre dimension caps out around 6 or 7 on the bell even on a perfect attempt.
That's why the daily puzzle picks one easy, one medium, one hard sound: the hard one is honest about the ceiling. The friendlier sounds — Hum, Boing, Whoop — are vocally reachable and can score in the high 8s or 9s with a good attempt.
Every day, three sounds are picked deterministically from the date: same date, same set, everywhere in the world. There's no server doing the picking; the date is the seed. The puzzle resets at your local midnight, the same way Wordle and other NYT Games do.
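Here is one way date-seeded picking like this could work, as a sketch only. The sound pools, their difficulty tiers, the string hash, and the Park-Miller style generator are all assumptions chosen for illustration; only the sound names Hum, Boing, Whoop and the bell appear in this page, and every function name below is hypothetical.

```typescript
// HYPOTHETICAL pools: the tier assignments and the "placeholder" entries
// are invented for this example.
const POOLS: { easy: string[]; medium: string[]; hard: string[] } = {
  easy: ["Hum", "Whoop"],
  medium: ["Boing", "Medium placeholder"],
  hard: ["Bell", "Hard placeholder"],
};

const MODULUS = 2147483647; // 2^31 - 1, a prime

// Local calendar date as YYYY-MM-DD, so the puzzle flips at local midnight.
function localDateKey(d: Date): string {
  const month = d.getMonth() + 1;
  const day = d.getDate();
  return d.getFullYear() + "-" + (month < 10 ? "0" : "") + month +
    "-" + (day < 10 ? "0" : "") + day;
}

// Turn the date string into a non-zero seed with a simple polynomial hash.
function seedFromDate(dateKey: string): number {
  let h = 7;
  for (let i = 0; i < dateKey.length; i++) {
    h = (h * 31 + dateKey.charCodeAt(i)) % MODULUS;
  }
  return (h % (MODULUS - 1)) + 1;
}

// Pick one sound per tier, advancing a small deterministic generator.
function dailySounds(dateKey: string): string[] {
  const tiers = [POOLS.easy, POOLS.medium, POOLS.hard];
  let state = seedFromDate(dateKey);
  const picks: string[] = [];
  for (let t = 0; t < tiers.length; t++) {
    state = (state * 48271) % MODULUS;
    picks.push(tiers[t][state % tiers[t].length]);
  }
  return picks;
}
```

Nothing here depends on the network or on `Math.random`, so two devices that agree on the date string always agree on the three sounds.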
Your progress for the day is stored on your device — refreshing or coming back later resumes you at the right round. Different days have separate slots, so a friend's link to yesterday's puzzle won't clobber today's progress.
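The "separate slots" idea can be pictured as one storage key per date. The sketch below is an assumption about how that might look: the key format `progress:YYYY-MM-DD`, the stored shape (an array of round scores), and the function names are all hypothetical. `KeyValueStore` is a stand-in for the two methods of the browser's `localStorage` that the sketch needs, so the example also runs outside a browser.

```typescript
// Stand-in for the parts of window.localStorage used here.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory store so the sketch can run anywhere.
function createMemoryStore(): KeyValueStore {
  const data: { [key: string]: string } = {};
  return {
    getItem: function (key: string): string | null {
      return Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null;
    },
    setItem: function (key: string, value: string): void {
      data[key] = value;
    },
  };
}

// HYPOTHETICAL key format: one slot per calendar date.
function progressKey(dateKey: string): string {
  return "progress:" + dateKey;
}

function saveProgress(store: KeyValueStore, dateKey: string, scores: number[]): void {
  store.setItem(progressKey(dateKey), JSON.stringify(scores));
}

// Returns the scores recorded so far for that date (empty if none).
function loadProgress(store: KeyValueStore, dateKey: string): number[] {
  const raw = store.getItem(progressKey(dateKey));
  if (raw === null) {
    return [];
  }
  const parsed: unknown = JSON.parse(raw);
  return Array.isArray(parsed) ? (parsed as number[]) : [];
}
```

With this layout, opening yesterday's puzzle reads and writes only yesterday's key, which is why it cannot overwrite today's rounds.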
When you tap Share, the URL contains a 14-digit code:
the date as YYYYMMDD (8 digits)
each of the three scores as round(score × 10) capped at 99, two digits each (6 digits total)
So ?s=20260507859879 means "2026-05-07, scores 8.5 / 9.8 / 7.9". The recipient lands on the same puzzle and sees your scores in a challenge banner. They can't see your audio (we never have it) — they just see the numbers and the date.
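The share code format described above can be sketched as a pair of encode and decode functions. The function names and the choice to return `null` for a malformed code are assumptions; the digit layout follows the description and the example on this page.

```typescript
// Left-pad a value in 0..99 to two digits.
function twoDigits(n: number): string {
  return n < 10 ? "0" + n : "" + n;
}

// dateKey is "YYYY-MM-DD"; scores are the three round scores (0 to 10).
function encodeShareCode(dateKey: string, scores: number[]): string {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(dateKey)) {
    throw new Error("dateKey must look like YYYY-MM-DD");
  }
  if (scores.length !== 3) {
    throw new Error("expected exactly three scores");
  }
  let code = dateKey.split("-").join("");
  for (let i = 0; i < 3; i++) {
    // round(score x 10), capped at 99 so it always fits in two digits
    const tenths = Math.min(99, Math.max(0, Math.round(scores[i] * 10)));
    code += twoDigits(tenths);
  }
  return code;
}

// Returns null when the code is not exactly 14 digits.
function decodeShareCode(code: string): { date: string; scores: number[] } | null {
  if (!/^\d{14}$/.test(code)) {
    return null;
  }
  const date = code.slice(0, 4) + "-" + code.slice(4, 6) + "-" + code.slice(6, 8);
  const scores: number[] = [];
  for (let i = 0; i < 3; i++) {
    scores.push(parseInt(code.slice(8 + 2 * i, 10 + 2 * i), 10) / 10);
  }
  return { date: date, scores: scores };
}
```

Because of the cap at 99, a perfect 10 is shared as 99 and decodes back to 9.9.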
Your audio never leaves your device. The microphone opens for the 3-second recording window and closes immediately after. The recording is processed locally to compute your score; it is never uploaded.
When you complete a day's three rounds, we send your three scores and the date to our server so we can understand whether the scoring is fair across players. Nothing else: no audio, no IP-address logging on disk, no fingerprints. There is a per-device opt-out toggle in the About panel, and the privacy page has the full detail.
honk & whistle is in alpha. Things may change, break, or improve. If you want to follow along or have ideas, drop a line at thegreatsuperbobo@gmail.com.