2 posts tagged with "distillation"

The Distillation That Lost a Capability Your Eval Suite Never Measured

· 9 min read
Tian Pan
Software Engineer

A team shrinks a 200B teacher into a 7B student because the eval suite — fifty thousand examples covering everything the product launched with — shows the student trailing the teacher by less than two points and inference cost dropping by an order of magnitude. The migration ships. The cost graph drops. The customer-satisfaction graph holds. Three weeks later, support starts seeing a class of failures the team cannot reproduce in eval.

The student no longer recognizes a corner-case input format the teacher had silently handled. It no longer recovers from a particular ambiguous instruction the teacher had reliably disambiguated. It no longer produces the rare-but-load-bearing "ask a clarifying question instead of guessing" behavior — because the eval set was scrubbed of ambiguous prompts on the grounds that they were "bad data."

The eval said the distillation was faithful. The eval was wrong about what faithfulness means.
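One way to surface this kind of gap is to score teacher and student per behavioral slice rather than in aggregate, and to treat a slice with too few examples as a failure in its own right. The following is a minimal sketch, not the team's actual tooling: the slice names, the `max_gap` and `min_count` thresholds, and the `(slice, teacher_ok, student_ok)` record shape are all assumptions made for illustration.

```python
from collections import defaultdict


def slice_gaps(results):
    """Aggregate pass rates per slice.

    `results` is an iterable of (slice_name, teacher_ok, student_ok) tuples,
    a hypothetical record shape chosen for this sketch.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # [count, teacher passes, student passes]
    for slice_name, teacher_ok, student_ok in results:
        counts = totals[slice_name]
        counts[0] += 1
        counts[1] += int(teacher_ok)
        counts[2] += int(student_ok)

    gaps = {}
    for name, (n, teacher_pass, student_pass) in totals.items():
        gaps[name] = {
            "n": n,
            "teacher": teacher_pass / n,
            "student": student_pass / n,
            "gap": (teacher_pass - student_pass) / n,
        }
    return gaps


def flag_problems(gaps, expected_slices, max_gap=0.05, min_count=20):
    """Return (slice, reason) pairs for slices that regressed or are under-covered.

    A slice that is missing or has fewer than `min_count` examples is flagged
    as "under-covered": the eval cannot vouch for a behavior it barely samples.
    Both thresholds are illustrative defaults, not recommendations.
    """
    problems = []
    for name in expected_slices:
        stats = gaps.get(name)
        if stats is None or stats["n"] < min_count:
            problems.append((name, "under-covered"))
        elif stats["gap"] > max_gap:
            problems.append((name, "regressed"))
    return problems
```

With a check like this, an eval set scrubbed of ambiguous prompts would report the hypothetical `ambiguous_prompt` slice as under-covered before the migration ships, instead of folding its absence into a healthy-looking aggregate score.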

Distillation Is a Product Decision, Not a Research Artifact

· 10 min read
Tian Pan
Software Engineer

A frontier-model chat feature is roughly a thirty-cents-per-conversation product. The distilled variant of the same feature is roughly a third-of-a-cent-per-conversation product. These are not two implementations of one product. They are two products, with different free-tier economics, different acquisition costs, different markets, and different competitive moats. The team that ships the distilled version as "the same feature, cheaper" wastes the chance to build the second product.
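To make the gap concrete, here is a small worked calculation using the two per-conversation figures above. The per-conversation costs come from the text and are themselves rough; the free-tier size and usage numbers in the usage note are hypothetical, chosen only to show the scale of the difference.

```python
from fractions import Fraction

# Per-conversation costs in cents, taken from the rough figures in the text.
FRONTIER_CENTS = Fraction(30)      # "roughly thirty cents"
DISTILLED_CENTS = Fraction(1, 3)   # "roughly a third of a cent"

# Exact ratio between the two: 30 / (1/3) = 90.
COST_RATIO = FRONTIER_CENTS / DISTILLED_CENTS


def monthly_cost_dollars(cents_per_conversation, conversations_per_user, users):
    """Monthly inference cost in dollars for a given usage pattern.

    Usage inputs are assumptions supplied by the caller, not measured data.
    """
    total_cents = cents_per_conversation * conversations_per_user * users
    return total_cents / 100
```

For a hypothetical free tier of 1,000 users at 20 conversations each per month, `monthly_cost_dollars(FRONTIER_CENTS, 20, 1000)` comes to 6,000 dollars, while `monthly_cost_dollars(DISTILLED_CENTS, 20, 1000)` comes to about 67 dollars. The ratio is 90x, which is closer to two orders of magnitude than one.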

Most engineering organizations still treat distillation as a research-team optimization that gets applied after a feature is "done" — a tail-end pass to wring inference cost out of something already spec'd against the frontier model. That framing is wrong by an order of magnitude. The choice of teacher, the choice of student, the eval suite the student is graded against, and the product surface the student is deployed to are product decisions. They determine which capabilities you are consenting to lose, which traffic shape you are designing for, and which price floor you are unlocking. Hand them to a research team to optimize against MMLU and you will ship a model that wins benchmarks the product does not care about.