
Knowledge Distillation Economics: When Compressing a Frontier Model Actually Pays Off

11 min read
Tian Pan
Software Engineer

Most teams that reach for knowledge distillation do it for the wrong reasons and at the wrong time. They see a 70B model blowing their inference budget, read that distillation can produce a 7B student that's "just as good," and start distilling immediately. Six weeks later they have a distilled model that scores well on their validation set, ships to production, and begins producing confident nonsense at scale. The validation set was drawn from the same distribution as the teacher's synthetic training data. Real traffic was not.
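The failure mode above is a distribution mismatch, and it is cheap to check for before launch: score the student on the synthetic validation set and on a labeled sample of real traffic, then compare. A minimal sketch of that check, assuming toy data; the function names, the example prompts, and the 10-point threshold are all illustrative, not from the post:

```python
def accuracy(predict, examples):
    """Fraction of (prompt, expected) pairs the model answers correctly."""
    correct = sum(1 for prompt, expected in examples if predict(prompt) == expected)
    return correct / len(examples)

def distribution_gap(predict, synthetic_val, real_traffic_sample, max_gap=0.10):
    """Return (synthetic_acc, real_acc, ok).

    ok is False when the student's score drops by more than max_gap on real
    traffic -- the silent failure the paragraph above describes, caught
    before users catch it for you.
    """
    syn = accuracy(predict, synthetic_val)
    real = accuracy(predict, real_traffic_sample)
    return syn, real, (syn - real) <= max_gap

# Toy student that only recognizes the synthetic phrasing of a question.
student = {"capital of France?": "Paris"}.get

synthetic_val = [("capital of France?", "Paris")]
real_sample = [("whats france's capital", "Paris")]  # real users phrase it differently

syn, real, ok = distribution_gap(student, synthetic_val, real_sample)
# syn is perfect, real is not, so ok is False: the gap check fires.
```

The point of the sketch is only that the real-traffic sample must be labeled independently of the teacher's synthetic pipeline; otherwise both numbers measure the same distribution and the check is vacuous.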

Distillation is an optimization tool, not a capability upgrade. The economics only work under specific conditions — and the failure modes are subtle enough that teams often don't detect them until users do.
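For concreteness, the standard objective being optimized is worth having in front of you: the student matches the teacher's temperature-softened output distribution via KL divergence, blended with the ordinary hard-label cross-entropy. A pure-Python sketch of that classic formulation (the temperature and blend weight are illustrative defaults, not recommendations):

```python
import math

def softmax(logits, T=1.0):
    """Softmax over raw logits, optionally softened by temperature T."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label, T=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student at temperature T)
       + (1 - alpha) * cross-entropy against the hard label."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    ce = -math.log(softmax(student_logits)[hard_label])
    return alpha * (T ** 2) * kl + (1 - alpha) * ce
```

Note what this objective does and does not buy you: it pulls the student toward the teacher's behavior on whatever inputs you train on, which is why it compresses capability the teacher demonstrates there but cannot conjure capability on inputs neither model was trained to handle.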