Skip to main content

2 posts tagged with "knowledge-distillation"

View all tags

Fine-Tune Orphan: Recovering Domain Expertise When the Base Model Is Deprecated

· 9 min read
Tian Pan
Software Engineer

On January 4, 2024, OpenAI retired the /fine-tunes endpoint. Every fine-tuned Ada, Babbage, Curie, and Davinci model stopped responding. Teams that had spent months building production systems on these models — careful prompt design, annotated datasets, labeling pipelines — woke up to HTTP 404s. The fine-tunes didn't migrate. The learned behaviors didn't transfer. The domain expertise was gone.

This wasn't a fringe edge case. Google followed in August 2024 by completely decommissioning the PaLM API, with zero backwards-compatible grace period. Unlike OpenAI, which at least let existing GPT-3.5 fine-tunes keep running while blocking new training runs, Google's shutdown meant production inference stopped the same day. If your fine-tuned PaLM model was in the critical path, you had a service outage.

Knowledge Distillation Economics: When Compressing a Frontier Model Actually Pays Off

· 11 min read
Tian Pan
Software Engineer

Most teams that reach for knowledge distillation do it for the wrong reasons and at the wrong time. They see a 70B model blowing their inference budget, read that distillation can produce a 7B student that's "just as good," and start immediately. Six weeks later they have a distilled model that scores well on their validation set, ships to production, and begins producing confident nonsense at scale. The validation set was drawn from the same distribution as the teacher's synthetic training data. Real traffic was not.

Distillation is an optimization tool, not a capability upgrade. The economics only work under specific conditions — and the failure modes are subtle enough that teams often don't detect them until users do.