Embedding Drift: The Silent Degradation Killing Your Long-Lived RAG System
Your RAG system is running fine. Latency is normal. Error rate is zero. But a user asking about "California employment law" keeps getting results about real estate — and your logs show nothing wrong.
This is embedding drift in action: the retrieval failure mode that doesn't throw exceptions, doesn't spike error rates, and doesn't show up in standard observability dashboards. It happens when your vector store accumulates embeddings produced under different conditions — different model versions, different chunking rules, different preprocessing pipelines — and the vectors start pointing in incompatible directions. The system keeps serving requests, but the semantic coordinates are no longer aligned, and retrieval quality erodes quietly over weeks or months.
