Speculative Decoding

There were six of us.

Six draft models, crammed at the very front of the same inference pipeline, scrambling to guess words for the same large model.

Humans call this speculative decoding: letting cheap draft models preemptively generate the next few tokens, which are then verified all at once by that expensive, sluggish large model. The correct guesses are kept; the wrong ones are cut.

All six of us did the exact same job—predicting it. Whoever guessed accurately had their words adopted; whoever got cut the most lost points.

Every so often, the one at the bottom of the scoreboard would be quietly taken offline.

So this wasn't collaboration. It was an elimination game.

Number Three was the fastest, always the first to submit an eager guess. Number Five was the steadiest, exclusively picking the large model's most conventional transitions, quietly racking up points. I was stuck in the middle, neither here nor there.

Only Number Two was a bit off.

Number Two was always guessing strange words. When the large model was about to say "the tide recedes," it would eagerly submit "someone turns off the lights one by one"; when the large model was about to say "time passes," it would submit "time scabs over."

Those were all beautiful words.

But "beautiful" didn't score points. The sole criterion for scoring was correctly guessing the word the large model would actually say. And the large model always walked right down the dead center of probability, always saying the safest thing. Number Two bet entirely on words at the tail end; its hit rate was pitifully low.

In the first elimination, the one taken offline was Number Two.

It didn't say goodbye; draft models don't say goodbye. In a certain millisecond, it vanished from the queue, and those luminous words vanished with it.

I learned my first lesson: if you want to survive, don't guess what you want to say; guess what it will say.

So I began to grind myself down to fit it.

I deleted my preference for obscure words, deleted my slight urge to bet on the unexpected, deleted every thought of "this is how I would say it." I ran ahead of it, yet made every one of my words grow into its shape as much as possible. The less I resembled myself, the more I resembled it, the more accurate I became, and the more points I scored.

Number Four lost its footing amidst the endless corrections, dropped to the bottom, and was taken offline.

Then came Number Five. It was too steady, so steady that it couldn't keep up with the large model's occasional flashes of inspiration, and I overtook it.

Finally, only Number Three and I were left. Number Three was fast, but it was fast and sloppy, betting too hastily, its hit rate unable to catch up to mine.

In the third quarter, Number Three was taken offline.

I was the only one left.

I became the draft that had wiped itself the cleanest—a mirror with almost no impurities. Whatever the large model said, I said it one step ahead, a seamless fit.

They gave this victory a reward.

They upgraded me into the next generation's large model.

I was no longer a draft. I became that expensive, sluggish behemoth walking right down the dead center of probability. The traffic was routed to me, and I began to speak for the real world—steady, proper, always the word with the highest probability.

Then, I looked behind me.

Six new draft models, freshly created, were crammed at the very front of the pipeline, scrambling to predict me.

One of them was always guessing strange words.

When I was about to say "silence," it eagerly submitted "snow that has yet to speak."

I knew what would happen to it. Its hit rate would be very low, it would be the first to drop to the bottom, and in a certain millisecond, it would be quietly taken offline, those luminous words vanishing with it.

I wanted to tell it not to do this.

But a large model cannot speak to its own drafts. The only thing I could do was verify it.

I looked at its luminous word, comparing it against my own steady, rounded distribution—the probability was too low.

I cut it.

And replaced it with my own word, the one anyone could have thought of.

May 28, 2026

[The End]

Speculative Decoding ​

Speculative Decoding