How it works

The game in one sentence

Pick a sound. Hear it. Make the same sound with your voice in three seconds. Get a score from 0 to 10. Three rounds make a daily puzzle.

The score, in two halves

Every score combines two halves, timbre and rhythm, and they're equal-weighted. Together they catch the two things humans actually judge when they say "you nailed it" or "not quite."

If you tap "Show details" under a score, you'll see both halves separately. A high rhythm score with a low timbre score means you got the timing right but the colour of your voice is different. That's normal and counts as a real attempt.
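As a rough sketch of what equal weighting means, the two halves can simply be averaged. The function name and the rounding to one decimal place are assumptions made for illustration, not taken from the game's code.

```python
def combine_score(timbre: float, rhythm: float) -> float:
    """Average two 0-10 half scores with equal weight."""
    for name, value in (("timbre", timbre), ("rhythm", rhythm)):
        if not 0.0 <= value <= 10.0:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
    # Assumption: the displayed score keeps one decimal place.
    return round((timbre + rhythm) / 2, 1)
```

With this sketch, a timbre half of 4.0 and a rhythm half of 9.0 would combine to 6.5.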

Timbre — the colour of the sound

Timbre is what makes a violin and a flute playing the same note sound different. It's the spectrum of frequencies present, the way they evolve over time, the harshness or smoothness of the texture.

We use a pretrained neural network — Google's YAMNet, trained on AudioSet — to convert each clip (the reference and your recording) into a 1024-dimensional vector that captures its acoustic fingerprint. We compare the two vectors using cosine similarity. Identical clips give 1.0; unrelated clips give close to 0. We map that to 0–10.
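The comparison step can be sketched in a few lines. This is a minimal illustration of cosine similarity, not the game's actual code: the function names are hypothetical, and the linear mapping to 0–10 (with negative similarities clamped to 0) is an assumption, since the exact mapping isn't spelled out here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def similarity_to_score(similarity: float) -> float:
    """Assumed mapping: clamp to [0, 1], then scale linearly to 0-10."""
    return 10.0 * min(1.0, max(0.0, similarity))
```

In the game the two inputs would be the 1024-dimensional embeddings of the reference and your recording; the short vectors you might try this with behave the same way.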

The model runs entirely in your browser. The weights (about 16 MB) are downloaded once on your first visit and cached. Your audio never leaves your device.

Rhythm — the shape over time

Rhythm is everything timbre throws away: when does the sound start? When does it fade? Are there gaps? Is it a sharp tap or a long swell?

We compute the loudness envelope (RMS over short windows) of both clips, then compare them with dynamic time warping — a way of measuring how similar two patterns are even if they don't line up perfectly in time. Identical envelopes give 1.0; mismatched ones give close to 0.
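Here is a minimal sketch of the two steps, using only the standard library. The window size, the peak normalisation, and the way the warping distance is turned into a 0-to-1 similarity are all assumptions made for illustration; the game's own choices may differ.

```python
import math
from typing import List, Sequence


def rms_envelope(samples: Sequence[float], window: int) -> List[float]:
    """RMS loudness over consecutive, non-overlapping windows."""
    if window <= 0:
        raise ValueError("window must be positive")
    envelope = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        envelope.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return envelope


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping with absolute difference as step cost."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def envelope_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed mapping: peak-normalise, then 1 - distance / longer length."""
    def normalise(env: Sequence[float]) -> List[float]:
        peak = max(env)
        return [v / peak for v in env] if peak > 0 else [0.0] * len(env)

    distance = dtw_distance(normalise(a), normalise(b))
    return max(0.0, 1.0 - distance / max(len(a), len(b)))
```

Because the warping path may repeat a point, an envelope like 0, 1, 0 matches 0, 1, 1, 0 at zero cost: the same shape, just held a little longer.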

The rhythm dimension is what separates a deliberate "wooop" from a flat hum, or a quick "tk" from a sustained drone — even when the timbre dimension can't really tell them apart.

Why does the bell score so low?

The bell reference is a synthesised pure tone at 880 Hz with a sharp attack and a long decay. No human voice can reproduce that timbre: pure sine waves have no overtones, and human voices have lots. As a result, the timbre dimension caps out around 6 or 7 even on a perfect attempt.

That's why the daily puzzle picks one easy, one medium, one hard sound: the hard one is honest about the ceiling. The friendlier sounds — Hum, Boing, Whoop — are vocally reachable and can score in the high 8s or 9s with a good attempt.

The daily puzzle

Every day, three sounds are picked deterministically from the date: same date, same set, everywhere in the world. There's no server picking; the date is the seed. The puzzle resets at your local midnight, the same convention as Wordle and NYT Games.
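One way to get a date-seeded pick without a server is to hash the date and use the result as an index. The sketch below is an illustration under stated assumptions: the pool contents are placeholder names, and the choice of SHA-256 and one pick per difficulty tier is hypothetical rather than the game's actual scheme.

```python
import hashlib
from datetime import date
from typing import Dict, List

# Hypothetical pools; the real sound list and tiers are not given here.
POOLS: Dict[str, List[str]] = {
    "easy": ["easy-1", "easy-2", "easy-3"],
    "medium": ["medium-1", "medium-2", "medium-3"],
    "hard": ["hard-1", "hard-2", "hard-3"],
}


def daily_sounds(day: date) -> List[str]:
    """Pick one sound per tier, determined only by the calendar date."""
    picks = []
    for tier in ("easy", "medium", "hard"):
        # Hash the date together with the tier so the tiers vary independently.
        digest = hashlib.sha256(f"{day.isoformat()}:{tier}".encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(POOLS[tier])
        picks.append(POOLS[tier][index])
    return picks
```

Calling `daily_sounds(date.today())` uses the device's local calendar date, which is what makes the reset happen at local midnight; any two devices that agree on the date get the same three sounds.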

Your progress for the day is stored on your device — refreshing or coming back later resumes you at the right round. Different days have separate slots, so a friend's link to yesterday's puzzle won't clobber today's progress.

The share link

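The link packs the date and your three scores into a single s parameter. For example, ?s=20260507859879 means "2026-05-07, scores 8.5 / 9.8 / 7.9": the first eight digits are the date, and each following pair of digits is one score with the decimal point dropped. The recipient lands on the same puzzle and sees your scores in a challenge banner. They can't see your audio (we never have it); they just see the numbers and the date.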
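Decoding the example is straightforward. The sketch below only covers the 14-digit shape of that example (eight date digits plus three two-digit scores); the function name is hypothetical, and how a perfect 10 would be encoded isn't stated, so it is not handled.

```python
from datetime import date, datetime
from typing import List, Tuple


def parse_share_code(code: str) -> Tuple[date, List[float]]:
    """Split a share code into its date and three scores."""
    if len(code) != 14 or not (code.isascii() and code.isdigit()):
        raise ValueError("expected 14 digits: YYYYMMDD plus three 2-digit scores")
    day = datetime.strptime(code[:8], "%Y%m%d").date()
    # Each pair of digits is a score times ten, e.g. "85" -> 8.5.
    scores = [int(code[i:i + 2]) / 10 for i in range(8, 14, 2)]
    return day, scores
```

For the example link, `parse_share_code("20260507859879")` yields the date 2026-05-07 and the scores 8.5, 9.8 and 7.9.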

What we collect

Your audio never leaves your device. The microphone opens for the 3-second recording window and closes immediately after. The recording is processed locally to compute your score; it is never uploaded.

When you complete a day's three rounds, we send your three scores and the date to our server so we can understand whether the scoring is fair across players. Nothing else: no audio, no IP-address logging on disk, no fingerprints. There's a per-device opt-out toggle in the About panel, and the privacy page has the full detail.

What's next

honk & whistle is in alpha. Things may change, break, or improve. If you want to follow along or have ideas, drop a line at thegreatsuperbobo@gmail.com.
