Continuous Production Eval: Statistical Quality Monitoring for Live LLM Traffic

· 9 min read
Tian Pan
Software Engineer

Most teams treat LLM quality evaluation as a pre-deployment gate: run the eval suite, check the scores, ship. That approach catches roughly 40% of the failures your users will actually see. The rest slip through because production traffic looks nothing like your eval set: it has different query distributions, different session lengths, different upstream data, and different model behavior under concurrent load. By the time a user complaint surfaces, the problem has been happening for days.

The fix is not more evals before deployment. It is continuous evaluation against live traffic, designed around two constraints: you have no ground-truth labels at inference time, and you need actionable signal within minutes, not weeks.
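To make the idea concrete, here is a minimal sketch of one way label-free monitoring could work, using a classic statistical process control tool (a p-chart). It assumes each response can be cheaply marked as "flagged" by some proxy check that needs no ground truth (for example an empty output or a failed format check). That proxy, the function names, and all numbers below are hypothetical illustrations, not a prescribed implementation.

```python
import math


def p_chart_limits(baseline_flags, baseline_total, window_size, sigmas=3.0):
    """Compute (center, lower, upper) control limits for a flagged-response rate.

    baseline_flags / baseline_total: counts from a period assumed to be healthy.
    window_size: number of responses in each monitoring window.
    """
    p_bar = baseline_flags / baseline_total
    # Standard error of a proportion estimated from window_size samples.
    spread = sigmas * math.sqrt(p_bar * (1.0 - p_bar) / window_size)
    return p_bar, max(0.0, p_bar - spread), min(1.0, p_bar + spread)


def out_of_control(window_flags, window_size, limits):
    """Return True if the window's flagged rate falls outside the control limits."""
    _, lower, upper = limits
    rate = window_flags / window_size
    return rate < lower or rate > upper


# Hypothetical baseline: 200 flagged responses out of 10,000 (2%).
limits = p_chart_limits(baseline_flags=200, baseline_total=10_000, window_size=500)

# A window of 500 responses with 10 flagged sits at the baseline rate;
# a window with 25 flagged (5%) exceeds the upper limit and should alert.
normal_window = out_of_control(10, 500, limits)
drifted_window = out_of_control(25, 500, limits)
```

The window size sets the trade-off between detection speed and noise: smaller windows alert sooner but widen the control limits, so small shifts become harder to distinguish from normal variation.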