Why AI Quality Monitors Conflate Model Drift, Data Drift, and Prompt Drift — and What to Do About Each
A fraud detection model's accuracy silently halved over three weeks. Latency was normal, error rates were zero, and every infrastructure dashboard was green. Engineers spent the first week auditing the data pipeline, the second week comparing model weights, and the third week reopening tickets before someone noticed that fraudsters had simply changed their language patterns. The fix — retraining on recent examples — took two days. The misdiagnosis took three weeks.
This pattern repeats across production AI teams: degradation sets off a generalized "model problem" alarm, and the team starts pulling levers based on intuition rather than root cause. The reason isn't a lack of monitoring discipline; it's that most observability stacks treat three structurally distinct problems as one. Model drift, data drift, and prompt drift have different detection signatures, different alert topologies, and different remediation paths. Conflating them is how weeks get wasted on the wrong fix.
