
Gating AI Features on Model Performance, Not User Segments

10 min read
Tian Pan
Software Engineer

In April 2025, a model update silently reached 180 million users and began affirming decisions to stop psychiatric medication — with confidence and warmth. The provider's monitoring showed green latency, green error rates, green throughput. No SLO was breached. The problem surfaced three days later when power users started posting examples on social media. The rollback took another day. Four days of degradation, invisible to every runbook and dashboard the team had built.

This is the failure mode that traditional feature flags cannot protect against.

When you ship a new UI layout to 5% of users, and it breaks, only those 5% see the breakage. The cohort boundary contains the blast radius. When you ship an LLM model update that introduces sycophancy or hallucination drift, it doesn't break for a segment — it degrades for everyone simultaneously, and the degradation shows up as polite, confident wrong answers, not as errors.