Your Embeddings Don't Know the Contractor Was Off-Boarded
A contractor finished a six-month engagement last quarter. HR ran the off-boarding checklist: SSO disabled, laptop wiped, GitHub seat removed, Slack archived, Notion access revoked. Compliance signed off. Six weeks later, an internal RAG assistant answered a question by quoting a confidential strategy document the contractor had authored — and the chunk it cited was still tagged with the contractor's user ID in the vector store's allow-list. Nothing in the access logs of the source-of-truth ever recorded a read, because there was no read. The retrieval came from a copy of the data that nobody wired into the off-boarding flow.
This is the structural problem nobody puts on the architecture diagram. Your vector index is not just a similarity-search engine. It is a permission cache — a derived store of who-can-see-what, frozen at the moment you ran your embedding job — and almost nobody is invalidating it the way they invalidate everything else.
The application database has a permissions table that changes constantly. The document store fires deletion events when a file is moved to a deprecated folder. The identity provider broadcasts group-membership changes to a webhook nobody subscribed to. Meanwhile, the embeddings sit in a separate system, with metadata that was scraped from the source's ACL state at ingestion time, and the only time that metadata changes is when the same chunk is re-embedded — which happens when content changes, not when permissions change. The two facts the system needs to keep in sync are owned by two different teams running on two different cadences, and the bug only surfaces when a similarity search happens to come close to content that should have aged out.
The Permission Cache Nobody Calls a Cache
Engineers will happily admit that the vector store is a cache of content. Most teams have a story for what happens when a document is edited — usually some flavor of re-embed, sometimes a tombstone, sometimes a nightly rebuild. What they don't have a story for is the permission state attached to that content.
When you embed a document, you copy three things into the vector store: the chunk text (sort of — it becomes a vector), the chunk metadata, and crucially the permission-shaped fields in that metadata. The owner ID. The group list. The sensitivity label. The tenant scope. These fields were correct at ingestion time. From then on, they are a snapshot. The contractor was in the allow-list group when the embedding ran. The contractor is no longer in that group in the IdP. The vector store does not know.
The standard advice — enforce ACL filters at retrieval time, not in post-processing — assumes the ACL filter is being evaluated against current state. In practice, what gets evaluated is whatever metadata was indexed alongside the vector. Unless you do something deliberate to fix this, your "permission-aware retrieval" is permission-aware as of the last time you ran the embedding pipeline.
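A minimal sketch of that gap, with the similarity search left out so only the filter is visible. Every name here (`Chunk`, `live_acl`, `contractor-17`, the two filter functions) is hypothetical and stands in for whatever your vector store and IdP actually expose.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    text: str
    allowed_users: frozenset  # snapshot of the ACL, copied at ingestion time


# Hypothetical live ACL held by the source of truth (IdP / document store).
live_acl = {"strategy-2024": {"alice", "contractor-17"}}


def ingest(doc_id, text):
    # Permission fields are copied once, when the embedding job runs.
    return Chunk(doc_id, text, frozenset(live_acl[doc_id]))


def filter_by_indexed_metadata(chunks, user):
    # What a metadata filter in the vector store actually evaluates.
    return [c for c in chunks if user in c.allowed_users]


def filter_by_live_acl(chunks, user):
    # What the "enforce at retrieval time" advice assumes is being evaluated.
    return [c for c in chunks if user in live_acl.get(c.doc_id, set())]


index = [ingest("strategy-2024", "Confidential: enter the EU market in Q3.")]

# Off-boarding updates the source of truth; nothing touches the index.
live_acl["strategy-2024"].discard("contractor-17")

stale_hits = filter_by_indexed_metadata(index, "contractor-17")
live_hits = filter_by_live_acl(index, "contractor-17")
print(len(stale_hits), len(live_hits))
```

The indexed filter still admits the contractor because it never saw the revocation; only the check against live state rejects them.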
There is a tempting workaround: enforce permissions in the application layer, after retrieval, by re-checking each returned chunk's source document against the live ACL. This narrows the leak but does not close it. The chunk still passed through your retrieval system, which means it appeared in your logs, may have informed re-ranking, and, most importantly, was loaded into application memory on behalf of a user who should never have been able to materialize it. If any of that surface area logs the chunk text (many observability stacks do by default), you have just written the contractor's confidential document into a debug log under the requesting user's session.
Retrieval Is a Side Channel That Survives Deletion
Treat the vector store as a side channel. It reads from data that lives somewhere else, and it answers questions that aren't gated by the deletion semantics of the source. A row gets soft-deleted from Postgres; the embedding stays. A file is moved to a "deprecated" folder in the document management system; the embedding stays. A user is removed from a project group in the IdP; the embedding's allow-list metadata stays. A tenant offboards from your SaaS; the embeddings — buried in a shared multi-tenant index — stay.
Now imagine the audit timeline of the most expensive version of this bug. The right-to-be-forgotten request came in last quarter. Legal logged it, the data team ran the deletion script against the production database, the row was removed, a confirmation went back to the user. Compliance closed the ticket. The vector index, sitting on a separate service operated by a separate team, was not part of the script. Six months later, the assistant answers a question by quoting that deleted content, the user notices their name appearing, and you now have a regulatory incident that your deletion logs cannot defend you from — because your logs say the data is gone, and the data isn't.
The retrieval log shows the chunk was returned. The source-of-truth log shows the row was deleted. The two logs disagree. Under most modern data-protection regimes, the source-of-truth being clean is not enough — the derived stores have to follow. The vector index is a derived store. It needs its own deletion pipeline, with its own confirmation, and it needs to be on the same ticket as the database deletion, not in a separate queue that someone will get to.
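One way to put the derived stores on the same ticket is to make the ticket itself refuse to close until each required store has confirmed. The sketch below uses in-memory stand-ins; `InMemoryStore`, `DeletionTicket`, `run_deletion`, and the store names are assumptions for illustration, not any real service's API.

```python
from dataclasses import dataclass, field


class InMemoryStore:
    """Stand-in for a real store (database, vector index, search index)."""

    def __init__(self, name, records):
        self.name = name
        self.records = dict(records)  # record_id -> owning subject_id

    def delete_subject(self, subject_id):
        for key in [k for k, owner in self.records.items() if owner == subject_id]:
            del self.records[key]

    def contains_subject(self, subject_id):
        return subject_id in self.records.values()


@dataclass
class DeletionTicket:
    subject_id: str
    required_stores: tuple
    confirmations: dict = field(default_factory=dict)

    def closable(self):
        # A store that was never asked counts as unconfirmed.
        return all(self.confirmations.get(name, False) for name in self.required_stores)


def run_deletion(ticket, stores):
    for store in stores:
        store.delete_subject(ticket.subject_id)
        # Confirm by reading back, not by trusting that the delete call returned.
        ticket.confirmations[store.name] = not store.contains_subject(ticket.subject_id)
    return ticket


db = InMemoryStore("postgres", {"row-1": "user-42", "row-2": "user-7"})
vectors = InMemoryStore("vector_index", {"chunk-a": "user-42", "chunk-b": "user-7"})

ticket = DeletionTicket("user-42", required_stores=("postgres", "vector_index"))
run_deletion(ticket, [db])  # the script that forgot the vector index
closable_after_db_only = ticket.closable()
run_deletion(ticket, [vectors])
closable_after_all = ticket.closable()
print(closable_after_db_only, closable_after_all)
```

The design choice is that the list of required stores lives on the ticket, so a script that skips one leaves visible evidence instead of a clean-looking log.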
Why Per-Tenant Namespaces Solve Only the Easy Case
The cleanest answer to multi-tenant leakage is to give each tenant its own namespace, index, or collection — physical isolation, not logical filtering. When the tenant offboards, you drop the namespace, and there is nothing to leak. This is genuinely good design and it solves the tenant-offboarding case completely.
It does not solve the within-tenant case, which is where most permission bugs actually live. Inside a tenant, you have users, groups, roles, and documents with overlapping ACLs. The contractor was a user inside the tenant. Their off-boarding does not drop the namespace; it changes their group memberships and their per-document grants inside it. The namespace boundary is the wrong granularity.
A second pattern — physical isolation per sensitivity tier — has the same property. Splitting "public" and "confidential" into separate indices is helpful, but the confidential index still contains documents whose internal ACLs have drifted from the IdP.
The honest answer for intra-tenant ACL changes is that you need a propagation pipeline: an event stream from the IdP and the document store, with handlers that update or invalidate the affected embeddings within minutes, not nightly batches. This is the part teams skip because it is real infrastructure work and the failure mode is invisible until someone notices.
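A rough shape for such a handler, assuming the index caches an expanded user list per chunk and that the IdP delivers events carrying a timestamp. The event kinds, field names, and `VectorIndexAclConsumer` are invented for this sketch; real IdP and document-store events will look different.

```python
import time
from dataclasses import dataclass


@dataclass
class IndexedChunk:
    chunk_id: str
    doc_id: str
    groups: set          # groups granted access at the source
    allowed_users: set   # expanded user list cached in the index


@dataclass
class PermissionEvent:
    kind: str            # "group_member_removed" or "doc_deleted" (hypothetical kinds)
    occurred_at: float   # epoch seconds at the emitting system
    group: str = ""
    user: str = ""
    doc_id: str = ""


class VectorIndexAclConsumer:
    def __init__(self, chunks):
        self.chunks = {c.chunk_id: c for c in chunks}
        self.lags = []   # seconds between the event and its application here

    def handle(self, event, group_members, now=None):
        now = time.time() if now is None else now
        if event.kind == "group_member_removed":
            for chunk in self.chunks.values():
                if event.group in chunk.groups:
                    # Recompute from current membership instead of patching, so a
                    # user who still has access through another group keeps it.
                    chunk.allowed_users = set().union(
                        *(group_members.get(g, set()) for g in chunk.groups)
                    )
        elif event.kind == "doc_deleted":
            self.chunks = {k: c for k, c in self.chunks.items() if c.doc_id != event.doc_id}
        self.lags.append(now - event.occurred_at)


# Membership as the IdP reports it after the contractor was removed.
group_members = {"strategy": {"alice"}, "all-hands": {"alice", "bob"}}
consumer = VectorIndexAclConsumer([
    IndexedChunk("c1", "strategy-doc", {"strategy"}, {"alice", "contractor-17"}),
    IndexedChunk("c2", "handbook", {"all-hands"}, {"alice", "bob"}),
])
consumer.handle(
    PermissionEvent("group_member_removed", occurred_at=100.0,
                    group="strategy", user="contractor-17"),
    group_members,
    now=160.0,
)
print(consumer.chunks["c1"].allowed_users, consumer.lags)
```

Recording the lag on every event gives you the number to alert on when propagation falls behind.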
Tombstones, Hard Deletes, and the Semantics You Have to Pick
When a document is deleted at the source, you have three choices in the vector store. You can hard-delete the embedding: retrieval stays fast and there is no further surface area, but you lose the ability to audit what was there. You can tombstone it: leave a marker that filters the chunk out at query time so nothing returns, but keep the vector around so you can recover or audit. Or you can do nothing and rely on application-layer filtering, which, as established above, narrows the leak without closing it.
These are not interchangeable. Hard delete is the right default for GDPR-shaped requests, where the legal posture is that the data must not be retrievable by any means. Tombstones are the right default for soft-delete semantics in the source system, where the data still exists conceptually and might come back. Mixing them carelessly produces a vector index where some "deleted" content can still be queried because the tombstone-filter wasn't wired into every retrieval path.
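One way to keep the two semantics from blurring is to route every delete through a single policy table keyed by the reason for the delete. This is only an illustration: `DeleteReason`, `Action`, `DELETE_POLICY`, and the dict-based index are hypothetical.

```python
from enum import Enum


class DeleteReason(Enum):
    ERASURE_REQUEST = "erasure_request"        # GDPR-shaped: must not be retrievable
    SOURCE_SOFT_DELETE = "source_soft_delete"  # data still exists and may come back


class Action(Enum):
    HARD_DELETE = "hard_delete"
    TOMBSTONE = "tombstone"


# The posture, written down in exactly one place.
DELETE_POLICY = {
    DeleteReason.ERASURE_REQUEST: Action.HARD_DELETE,
    DeleteReason.SOURCE_SOFT_DELETE: Action.TOMBSTONE,
}


def apply_delete(vectors, tombstones, doc_id, reason):
    """vectors: dict chunk_id -> (doc_id, embedding); tombstones: set of doc_ids."""
    action = DELETE_POLICY[reason]
    if action is Action.HARD_DELETE:
        for chunk_id in [c for c, (d, _) in vectors.items() if d == doc_id]:
            del vectors[chunk_id]
        tombstones.discard(doc_id)  # nothing left to hide
    else:
        tombstones.add(doc_id)      # vector kept; every query path must filter on this
    return action


vectors = {
    "c1": ("profile-42", [0.1, 0.9]),
    "c2": ("old-spec", [0.7, 0.2]),
}
tombstones = set()
first = apply_delete(vectors, tombstones, "profile-42", DeleteReason.ERASURE_REQUEST)
second = apply_delete(vectors, tombstones, "old-spec", DeleteReason.SOURCE_SOFT_DELETE)
print(first, second, sorted(vectors), tombstones)
```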
The mistake nobody admits to making is tombstoning by default and never garbage-collecting. Tombstone tables grow without bound, and at some scale the filter cost dominates the query cost. Worse, the tombstone marker is metadata — and metadata can drift. A bug in the tombstone-filter clause silently exposes every soft-deleted chunk. The single most useful test in a vector-store security review is to delete a document and then query for content that is verbatim from that document; if it comes back, your tombstone story has a leak, regardless of how confidently the runbook describes it.
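The delete-then-query test can be sketched against a toy index. The word-overlap score below is a stand-in for real vector similarity, and `ToyIndex`, `leaks_deleted_doc`, and the two retrieval paths are invented for the example; the point is that the check runs against every path that can return chunks, not just the main one.

```python
class ToyIndex:
    def __init__(self):
        self.chunks = {}         # chunk_id -> (doc_id, text)
        self.tombstones = set()  # doc_ids soft-deleted at the source

    def add(self, chunk_id, doc_id, text):
        self.chunks[chunk_id] = (doc_id, text)

    def tombstone(self, doc_id):
        self.tombstones.add(doc_id)

    @staticmethod
    def _score(query, text):
        # Jaccard overlap of words; a placeholder for embedding similarity.
        q, t = set(query.lower().split()), set(text.lower().split())
        return len(q & t) / len(q | t) if q | t else 0.0

    def search(self, query, k=3, apply_tombstones=True):
        hits = []
        for chunk_id, (doc_id, text) in self.chunks.items():
            if apply_tombstones and doc_id in self.tombstones:
                continue
            hits.append((self._score(query, text), chunk_id, doc_id))
        hits.sort(reverse=True)
        return hits[:k]


def leaks_deleted_doc(search, verbatim_texts, doc_id):
    """Query with verbatim text saved before deletion; True if the doc still comes back."""
    return any(hit_doc == doc_id
               for query in verbatim_texts
               for _, _, hit_doc in search(query))


index = ToyIndex()
secret = "Project Falcon pricing will undercut the incumbent by twenty percent"
index.add("c1", "falcon", secret)
index.add("c2", "handbook", "Expense reports are due on the last Friday of the month")
index.tombstone("falcon")


def main_path(query):
    return index.search(query)


def forgotten_path(query):
    # A second retrieval path (say, a debug endpoint) that skipped the filter.
    return index.search(query, apply_tombstones=False)


main_leaks = leaks_deleted_doc(main_path, [secret], "falcon")
forgotten_leaks = leaks_deleted_doc(forgotten_path, [secret], "falcon")
print(main_leaks, forgotten_leaks)
```

Note that the verbatim text has to be captured before the delete, since a correct hard delete leaves nothing in the index to build the query from.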
What Closing the Gap Actually Looks Like
The cluster of practices that make this problem tractable is unromantic:
- Treat the embedding pipeline as a subscriber to permission events, not as a periodic job that scrapes the latest state. Group-membership changes, role revocations, document moves, and user deletions are events the IdP and the document store already emit. Subscribe to them. Build the consumer that propagates them into the vector index. Measure the lag.
- Make retrieval-time ACL enforcement go to the source-of-truth, not the indexed metadata. Yes, this costs a hop. Yes, it complicates query latency. The alternative is a permission-aware system that is permission-aware only as of the last embedding job. For most enterprise use cases the latency cost is fine; for high-throughput consumer cases, cache the ACL with a short TTL and accept the tradeoff explicitly.
- Treat deletion as a multi-store operation by default. The deletion ticket should not be closed when the source row is gone. It should be closed when every derived store (vector index, search index, analytics warehouse, retrieval cache) has confirmed the deletion. Bake this into the script. The DPO will thank you.
- Run periodic divergence audits. Pick a random sample of users whose access changed in the last 30 days. Issue queries that should have aged out. See if the vector store still returns the content. Treat any hit as a high-severity incident, not a curiosity.
- Decide the soft-versus-hard delete posture once, write it down, and apply it consistently across all indices. The worst configuration is the implicit one where different teams interpret "delete" differently.
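The retrieval-time check against the source of truth, with the short-TTL cache mentioned above, might look something like this. `CachedAclChecker`, `authorized_hits`, the `fetch_allowed` callable, and the 30-second TTL are all assumptions for the sketch; the injected clock exists only so the expiry can be demonstrated without sleeping.

```python
import time


class CachedAclChecker:
    """Asks the source of truth whether a user may read a document; caches answers briefly."""

    def __init__(self, fetch_allowed, ttl_seconds=30.0, clock=time.monotonic):
        self._fetch = fetch_allowed  # hypothetical callable: (user, doc_id) -> bool
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}             # (user, doc_id) -> (allowed, expires_at)

    def allowed(self, user, doc_id):
        key = (user, doc_id)
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and cached[1] > now:
            return cached[0]
        decision = bool(self._fetch(user, doc_id))
        self._cache[key] = (decision, now + self._ttl)
        return decision


def authorized_hits(hits, user, checker):
    # Drop unauthorized chunks before their text reaches prompts, re-rankers, or logs.
    return [h for h in hits if checker.allowed(user, h["doc_id"])]


live_acl = {("contractor-17", "strategy-doc"): True}
fake_now = [0.0]
checker = CachedAclChecker(
    lambda user, doc_id: live_acl.get((user, doc_id), False),
    ttl_seconds=30.0,
    clock=lambda: fake_now[0],
)
hits = [{"doc_id": "strategy-doc", "text": "..."}]

before = authorized_hits(hits, "contractor-17", checker)      # allowed, and now cached
live_acl[("contractor-17", "strategy-doc")] = False           # off-boarding happens
within_ttl = authorized_hits(hits, "contractor-17", checker)  # cached answer still served
fake_now[0] = 31.0
after_ttl = authorized_hits(hits, "contractor-17", checker)   # re-checked against live state
print(len(before), len(within_ttl), len(after_ttl))
```

The middle result is the tradeoff made explicit: a revoked user can still retrieve for at most one TTL, which is a bounded and measurable window rather than "until the next embedding job".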
The deletion request your compliance team thinks they processed last quarter is still answerable by your RAG pipeline today, unless someone has done the unglamorous work of wiring the events. The vector index does not announce its staleness; the bug surfaces as a one-line quote from a confidential document in an answer that should never have been generated. By then the question is not "how do we fix the leak" — it is "how long has this been wrong, and how do we know."
Permissions move. Embeddings don't, unless you make them.
