Poisoned at the Source: RAG Corpus Decay and Data Governance for Vector Stores
Your RAG system was working fine at launch. Three months later it's confidently wrong about a third of user queries, and your traces show nothing broken. The retriever is fetching documents. The model is generating responses. The pipeline looks healthy. The problem is invisible: every vector in your store still scores well against the queries it matches, but half of them point to facts that are no longer true.
This is corpus decay. It doesn't throw errors. It doesn't trigger alerts. It accumulates quietly in the background, and by the time you notice it through user complaints or quality degradation, your vector store has become a liability.
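To see why nothing alerts, consider a minimal sketch with toy data. Everything here is an assumption made for illustration: the chunk ids, the three-dimensional vectors, and the `indexed_at` and `source_modified` metadata fields are hypothetical, and a real vector store would do the ranking for you. The point is only that a similarity score is computed from the vectors alone, so a chunk embedded from an outdated document ranks exactly as it did on the day it was indexed.

```python
import math
from datetime import date


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical chunks. The metadata fields are assumptions for this sketch,
# not the schema of any particular vector store.
chunks = [
    {
        "id": "policy-old",
        "vector": [0.9, 0.1, 0.0],
        "indexed_at": date(2024, 1, 5),
        "source_modified": date(2024, 3, 20),  # source changed after indexing
    },
    {
        "id": "policy-new",
        "vector": [0.7, 0.3, 0.1],
        "indexed_at": date(2024, 4, 1),
        "source_modified": date(2024, 3, 30),
    },
]

query = [1.0, 0.0, 0.0]

# Retrieval ranks purely by similarity. No step here looks at dates,
# so nothing fails or warns when the top hit is out of date.
ranked = sorted(chunks, key=lambda c: cosine(query, c["vector"]), reverse=True)


def is_stale(chunk):
    """A chunk is stale if its source changed after it was embedded."""
    return chunk["source_modified"] > chunk["indexed_at"]


stale_ids = [c["id"] for c in ranked if is_stale(c)]

print("top hit:", ranked[0]["id"])
print("stale chunks:", stale_ids)
```

In this toy setup the top-ranked chunk is also the stale one, and only the explicit date comparison reveals it. The similarity ranking itself carries no signal that anything is wrong.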
