
2 posts tagged with "llms"


Knowledge Distillation for Production: Teaching Small Models to Do Big Model Tasks

· 9 min read
Tian Pan
Software Engineer

A healthcare company ran GPT-4 on 10,000 documents per day. Annual bill: $50,000. After fine-tuning a 27B open-source model on frontier outputs, the same workload cost $5,000, a 90% reduction. The smaller model also outperformed the frontier model by 60% on their specific task, because it had been shown thousands of examples of exactly the right behavior.

This is knowledge distillation in its modern form: you pay the frontier model API costs once to generate training data, then run a small specialized model forever. The math works because inference is cheap when you own the weights, and task-specific models beat general-purpose models on narrow tasks given enough examples.
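
To make the economics concrete, here is a back-of-the-envelope sketch of the cost math described above. The per-document prices are hypothetical placeholders chosen only so the totals land near the figures quoted in the excerpt; they are not the company's actual rates.

```python
# Illustrative distillation economics. All per-document costs are hypothetical
# placeholders, not real pricing; only the workload size matches the excerpt.

DOCS_PER_DAY = 10_000
DAYS_PER_YEAR = 365

frontier_cost_per_doc = 0.0137    # dollars per document via a frontier API (assumed)
distilled_cost_per_doc = 0.00137  # dollars per document on a self-hosted 27B model (assumed)

frontier_annual = frontier_cost_per_doc * DOCS_PER_DAY * DAYS_PER_YEAR
distilled_annual = distilled_cost_per_doc * DOCS_PER_DAY * DAYS_PER_YEAR

print(f"Frontier API:  ${frontier_annual:,.0f}/year")
print(f"Distilled 27B: ${distilled_annual:,.0f}/year")
print(f"Savings:       {1 - distilled_annual / frontier_annual:.0%}")
```

The one-time cost of generating training data with the frontier model sits outside this loop; it is amortized across every inference the distilled model serves afterward.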

But "collect outputs, fine-tune, ship" is not a complete recipe. Most teams that attempt distillation hit one of three invisible walls: bad synthetic data that teaches the student wrong behaviors, no reliable signal for when the student is actually ready, or silent quality collapse in production that doesn't surface until users complain. This post covers the pipeline decisions that determine whether distillation works.

Reasoning Models in Production: When They Help and When They Hurt

· 9 min read
Tian Pan
Software Engineer

A team building a support triage system switched their classification pipeline from GPT-4o to o3. Accuracy improved by 2%. Costs went up by 900%. The latency jumped from 400ms to 12 seconds. They switched back.

This is the most common story in production AI right now. Reasoning models represent a genuine capability leap — o3 solved 25% of problems on the Frontier Math benchmark when no previous model had exceeded 2%. But that capability comes with a cost and latency profile that makes them wrong for the majority of tasks in the average production system. Knowing the difference is one of the more valuable things an AI engineer can internalize right now.