Your Vector Index Is a Cache With No Invalidation Strategy
A vector index feels like a database. You write documents into it, you query it, it returns results. But it is not a database — it is a derived, denormalized copy of data that lives somewhere else. Your source of truth is a wiki, a ticket system, a CRM, a folder of PDFs. The embeddings are a projection of that truth, frozen at the moment you ran the ingestion job.
That makes your vector index a cache. And like every cache, it goes stale. The difference is that most teams build a caching layer on purpose, with a TTL and an invalidation hook, while almost nobody builds a vector index on purpose as a cache. They build it as a "knowledge base" and then act surprised when it serves knowledge that stopped being true three weeks ago.
The symptom is familiar to anyone running RAG in production: a user updates a document, and the agent keeps citing the old version. An employee leaves the company, and the agent keeps surfacing the doc they wrote, sometimes to people who should never have seen it. A page gets deleted, and the agent confidently quotes a paragraph that no longer exists anywhere in the source system. None of these are model failures. The model did exactly what it was told: retrieve the nearest neighbors and answer from them. The neighbors were just lies.
This is not a new problem. It is the oldest hard problem in computer science wearing a new hat. Phil Karlton's line — "there are only two hard things in computer science: cache invalidation and naming things" — was supposed to be a joke. In RAG, it is a roadmap.
The Index Is a Cache, So Name It One
The reframe matters because it changes which playbook you reach for. If you think of your vector store as a knowledge base, the natural maintenance plan is "re-index periodically" — a nightly or weekly batch job that re-embeds everything. That is the cache-design equivalent of flushing your entire Redis instance every night and hoping nobody notices the gaps in between.
Once you accept that the index is a cache, the questions become specific and answerable. What is the consistency model — how stale is the index allowed to be? What triggers invalidation — a clock, a source mutation, an explicit purge? What is the eviction policy for entries whose source no longer exists? What happens on a cache miss — does retrieval fail loud, or silently return nothing?
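One way to make those questions concrete is to write the answers down as an explicit policy object and enforce it at retrieval time. The sketch below is illustrative only: `CachePolicy`, `IndexEntry`, `EmptyRetrieval`, and `fresh_entries` are hypothetical names, not part of any vector database's API, and it assumes each entry records when it was embedded.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CachePolicy:
    # Hypothetical policy object: one field per question a cache design must answer.
    max_staleness: timedelta      # consistency model: how old may an entry be?
    purge_on_source_delete: bool  # eviction: drop entries whose source is gone?
    fail_on_empty: bool           # miss behavior: raise instead of returning nothing?


@dataclass(frozen=True)
class IndexEntry:
    chunk_id: str
    embedded_at: datetime  # assumed to be stored alongside the vector


class EmptyRetrieval(Exception):
    """Raised when nothing fresh is available and the policy says to fail loud."""


def fresh_entries(entries, policy, now):
    """Keep only entries embedded within the policy's staleness bound."""
    fresh = [e for e in entries if now - e.embedded_at <= policy.max_staleness]
    if not fresh and policy.fail_on_empty:
        raise EmptyRetrieval("no entries within the staleness bound")
    return fresh


# Usage: a one-day staleness bound filters out a three-day-old entry.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
policy = CachePolicy(timedelta(days=1), purge_on_source_delete=True, fail_on_empty=True)
entries = [
    IndexEntry("old-chunk", now - timedelta(days=3)),
    IndexEntry("new-chunk", now - timedelta(hours=2)),
]
print([e.chunk_id for e in fresh_entries(entries, policy, now)])
```

Filtering by embed time is only a clock-based bound; it says nothing about whether the source actually changed, which is why the trigger question matters as much as the staleness budget.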
Cache theory has answers for all of these. The reason RAG teams keep rediscovering them the hard way is that the vector database vendors sold them a "database" and the consistency story got lost in the pitch. A transactional database gives you read-your-writes: once a write commits, the next read sees it. A vector index gives you eventual consistency at best, and usually "consistent as of whenever the data team last ran the pipeline."
Three failure modes follow directly from treating the cache as a database.
Failure Mode One: The Edit That Never Propagates
A document changes in the source system. Someone fixes a wrong number in a pricing doc, rewrites an onboarding guide, marks a policy as superseded. The raw data is now current. The embedding is not.
The naive fix is a full re-embed on a schedule. This works until it doesn't. Once you have hundreds of thousands of chunks, re-embedding everything costs real money, takes hours, and — the part people miss — leaves your index in a partially stale state for the entire duration of the run. Half the chunks reflect today, half reflect last night, and a query during the window can straddle both. You have not eliminated staleness; you have made it non-deterministic.
The production answer is the same one data engineering settled on decades ago: change data capture. Don't poll the whole corpus asking "what changed?" Listen to the source's mutation stream (a database write-ahead log, a webhook from the document system, a file-hash diff) and re-embed only the records that actually moved. In cache terms this behaves like a write-through cache, with one caveat: the update arrives asynchronously after the source write rather than as part of it, so a short propagation lag remains. The cost scales with the change rate, not the corpus size, which is the only scaling property that survives contact with a large knowledge base.
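A minimal sketch of that consumer loop, under stated assumptions: the `ChangeEvent` shape (an `op` of `"upsert"` or `"delete"`, a `doc_id`, and the new text) is hypothetical, the index is a plain dict standing in for a vector store, and `toy_embed` is a placeholder for a real embedding model call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    # Hypothetical event shape; real CDC streams and webhooks differ by source.
    op: str                      # "upsert" or "delete"
    doc_id: str
    text: Optional[str] = None   # new content, present for upserts


def apply_changes(events, index: dict, embed: Callable[[str], list]) -> int:
    """Apply source mutations to the index; return how many embed calls were made."""
    embed_calls = 0
    for ev in events:
        if ev.op == "upsert":
            index[ev.doc_id] = embed(ev.text)  # re-embed only the record that moved
            embed_calls += 1
        elif ev.op == "delete":
            index.pop(ev.doc_id, None)         # idempotent: ignore already-missing ids
        else:
            raise ValueError(f"unknown op: {ev.op}")
    return embed_calls


def toy_embed(text: str) -> list:
    # Placeholder for a real embedding model; deterministic so the sketch is testable.
    return [float(len(text)), float(sum(map(ord, text)) % 997)]


# Usage: three documents indexed, one edited, one deleted.
index = {d: toy_embed(t) for d, t in [("a", "alpha"), ("b", "beta"), ("c", "gamma")]}
events = [ChangeEvent("upsert", "a", "alpha v2"), ChangeEvent("delete", "c")]
print(apply_changes(events, index, toy_embed), sorted(index))
```

The number of embed calls tracks the number of upsert events, not the size of the index, which is the scaling property the paragraph above describes.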
CDC also forces a design decision teams otherwise dodge: chunk identity. If one sentence changes in a 40-page document, you do not want to re-embed all 38 of its chunks. You want stable chunk IDs and a content hash per chunk, so the pipeline re-embeds the one chunk that moved and leaves the rest untouched. That is the difference between an invalidation strategy and a sledgehammer.
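Chunk-level diffing can be sketched in a few lines. This version assumes chunk IDs of the form `doc_id#position` and a stored map of chunk ID to SHA-256 content hash from the previous run; both are illustrative choices, and `plan_reembed` is a hypothetical helper. Positional IDs are only stable if the chunker splits on fixed boundaries such as headings. With a sliding-window chunker, one inserted sentence can shift every later chunk, and a content-derived ID would be a better fit.

```python
import hashlib


def chunk_hash(text: str) -> str:
    """Content fingerprint for one chunk."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reembed(doc_id: str, chunks: list, stored_hashes: dict):
    """Compare current chunks against stored hashes.

    Returns (to_embed, to_delete, new_hashes):
      to_embed   - chunk IDs that are new or whose content changed
      to_delete  - chunk IDs that existed before but no longer do
      new_hashes - the hash map to persist for the next run
    """
    new_hashes = {f"{doc_id}#{i}": chunk_hash(c) for i, c in enumerate(chunks)}
    to_embed = [cid for cid, h in new_hashes.items() if stored_hashes.get(cid) != h]
    to_delete = [cid for cid in stored_hashes if cid not in new_hashes]
    return to_embed, to_delete, new_hashes


# Usage: a 38-chunk document where one chunk is corrected.
chunks = [f"section {i}" for i in range(38)]
_, _, stored = plan_reembed("pricing", chunks, {})   # first run: everything is new
chunks[7] = "section 7, corrected"
to_embed, to_delete, stored = plan_reembed("pricing", chunks, stored)
print(to_embed, to_delete)
```

On the second run only `pricing#7` needs a new embedding; the other 37 chunks hash to the same values and are left alone.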
Failure Mode Two: The Delete That Leaves a Ghost
Deletion is worse than editing, because an edit at least produces a new embedding that competes with the old one. A delete produces nothing. The source document is gone, and the embedding just... lingers. It still matches queries. It still ranks high. The agent retrieves it and cites a document that returns a 404 if anyone clicks through.
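A sketch of what propagating a delete might look like, assuming every chunk stores the ID of its source document in metadata. The dict-based index and the helper names `purge_document` and `sweep_orphans` are hypothetical stand-ins for a real vector store's delete-by-filter call. The sweep is a safety net for delete events that were missed, not a replacement for handling them as they arrive.

```python
def purge_document(index: dict, doc_id: str) -> int:
    """Remove every chunk whose metadata points at doc_id; return the count removed."""
    doomed = [cid for cid, entry in index.items() if entry["doc_id"] == doc_id]
    for cid in doomed:
        del index[cid]
    return len(doomed)


def sweep_orphans(index: dict, live_doc_ids: set) -> list:
    """Reconciliation pass: purge chunks whose source document no longer exists.

    live_doc_ids is assumed to come from listing the source system.
    Returns the sorted list of document IDs that were purged.
    """
    orphaned = sorted({entry["doc_id"] for entry in index.values()} - live_doc_ids)
    for doc_id in orphaned:
        purge_document(index, doc_id)
    return orphaned


# Usage: d1 is deleted via an event; d3 was deleted earlier and the event was missed.
index = {
    "d1#0": {"doc_id": "d1"},
    "d1#1": {"doc_id": "d1"},
    "d2#0": {"doc_id": "d2"},
    "d3#0": {"doc_id": "d3"},
}
print(purge_document(index, "d1"))            # handle the delete event
print(sweep_orphans(index, live_doc_ids={"d2"}))  # catch the ghost left by the missed one
```

Keying the purge on the source document ID rather than on individual chunk IDs means the pipeline does not need to remember how a deleted document was chunked.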
