The Privacy Architecture of Embeddings: What Your Vector Store Knows About Your Users
Most engineers treat embeddings as safely abstract — a bag of floating-point numbers that can't be reverse-engineered. That assumption is wrong, and the gap between perception and reality is where user data gets exposed.
Recent research achieved over 92% accuracy reconstructing exact token sequences — including full names, health diagnoses, and email addresses — from text embeddings alone, without access to the original encoder model. These aren't theoretical attacks. Transferable inversion techniques work in black-box scenarios where an attacker builds a surrogate model that mimics your embedding API. The attack surface exists whether you're using a proprietary model or an open-source one.
This post covers the three layers of embedding privacy risk: what inversion attacks can actually do, where access control silently breaks down in retrieval pipelines, and the architectural patterns — per-user namespacing, retrieval-time permission filtering, audit logging, and deletion-safe design — that give your users appropriate control over what gets retrieved on their behalf.
Why Embeddings Are a Different Kind of Privacy Problem
A traditional encrypted database has a clean mental model: encrypt at rest, decrypt on authorized access, audit the decryption events. The data is opaque until it isn't.
Embeddings don't work this way. When you vectorize a document, you create a lossy transformation that encodes linguistic and behavioral patterns derived from the original content. Those patterns persist in the vector. A nearest-neighbor query doesn't decrypt anything — it just computes distance — but it can reveal that a specific person's medical history is in your index, or that an employee with a specific name exists in your HR documents.
The problem compounds in three ways:
- Encoding is not encryption. A 1536-dimensional float vector from a text embedding model retains semantic structure. Attackers can extract sensitive attributes — nationality, occupation, birthdate, medical diagnoses — via cosine similarity comparisons without labeled training data, achieving over 94% accuracy on some attribute categories.
- Permissions get stripped at ingestion. When you ingest documents from SharePoint, Confluence, or Google Drive into a vector store, the original ACL metadata is almost never preserved. The document becomes queryable by everyone.
- Deletion is structurally hard. With a relational database, deleting a record is a SQL DELETE. With embeddings, there's no clean mapping between a user's data and which vectors were influenced by it. GDPR's right to erasure (Article 17) requires deletion without undue delay, in practice about a month. Most teams have no tested deletion procedure at all.
The Attack Surface: Inversion, Membership Inference, and Probing
Understanding what attackers can actually do shapes what defenses are worth building.
Embedding inversion attacks are the most severe. The attack trains a model to reverse the embedding operation — reconstructing original text from the vector. A 2024 paper demonstrated that these attacks are transferable: an attacker who builds a surrogate model from publicly available embeddings can apply it to a target system they've never directly accessed. The practical implication is that any embedding you serve via API — even a truncated or noisy version — can be a target.
Nearest-neighbor probing exploits the geometry of embedding space. An attacker sends crafted queries and observes similarity scores. If querying "employee John Smith" returns a suspiciously high score, you've confirmed that name appears in the corpus, even if the actual content is never returned. This is a dictionary attack on your vector index, and it requires no special access — just API calls.
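A toy sketch of how similarity scores alone can confirm membership. The three-dimensional vectors and the 0.95 threshold are made up for illustration; real embeddings have hundreds or thousands of dimensions, but the mechanics are the same:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy 3-dimensional "embeddings"; in practice these come from an embedding model.
index = {
    "doc-hr-1":  [0.9, 0.1, 0.0],   # contains "employee John Smith"
    "doc-eng-1": [0.0, 0.2, 0.95],
}

def probe(query_vec, threshold=0.95):
    """Return True if any indexed vector is suspiciously close to the probe."""
    return any(cosine(query_vec, v) >= threshold for v in index.values())

# Attacker embeds the candidate string "employee John Smith" and probes:
assert probe([0.88, 0.12, 0.01]) is True    # name confirmed present in corpus
assert probe([0.1, 0.05, 0.1]) is False     # unrelated probe, no signal
```

No content is ever returned; the attacker learns presence purely from the score, which is why rate limiting and audit logging on query patterns matter.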
Membership inference asks a subtler question: was this specific document included in the index? Attackers use the statistical properties of retrieval outputs to infer presence or absence. In healthcare RAG systems, this can be enough to reconstruct patient identifiers and diagnoses even without ever seeing the source documents.
None of these attacks require compromising your infrastructure. They exploit the embedding interface itself.
Where Access Control Breaks Down in RAG Pipelines
The standard RAG pipeline has a structural access control gap: similarity search runs against all vectors in the index, regardless of what permissions apply to the underlying documents.
Consider a company that builds an internal knowledge assistant over Confluence. Marketing, Engineering, Finance, and HR all have documents in the same Confluence instance. The assistant ingests all of them. An employee in Marketing asks a question. The retrieval step computes cosine similarity against every document in the index — including confidential compensation data from HR, unreleased financial projections, and engineering security designs. If those documents happen to be semantically relevant to the query, they get returned.
The employee didn't navigate to a restricted page. They just asked a question. The system did the rest.
Three common failure modes cause this:
No retrieval-time filtering. The vector query runs, results come back, and only then does the application layer check whether the user can read them. But by that point, the retrieval has already happened. Post-hoc filtering also tends to degrade quality: if you retrieve the top 20 documents and then filter 15 out, the LLM is working with leftover context.
Namespace misuse. Some teams create isolated vector store instances per department — one for HR, one for Finance, one for General. This works but it's operationally expensive and doesn't handle cross-cutting documents. It also doesn't scale to per-user or per-document granularity.
Overly broad source permissions. Most organizations already have permission problems in their source systems — shared drives where too many people have read access because no one audited the ACLs. RAG amplifies this by making discovery automatic. Traditional file access requires someone to know a document exists. RAG makes everything discoverable through natural language.
Architectural Patterns That Actually Work
Per-User Namespacing and Shard Isolation
The cleanest model for multi-tenant RAG is strict physical isolation per tenant. Weaviate's multi-tenancy architecture gives each tenant a dedicated shard with independent storage, vectors, inverted indexes, and metadata. Operations on one tenant cannot touch another's shard. Deletion is straightforward: dropping a tenant deletes its shard. The system scales to 50,000+ active tenants per node.
Pinecone's namespace model provides logical partitioning — sufficient for many use cases, but without the physical isolation of shard-per-tenant. Qdrant's payload-based filtering applies access logic at query time using metadata fields, which offers flexibility at the cost of relying on query-time enforcement rather than architectural separation.
For applications where users own their own documents — a notes app, a document Q&A product — shard isolation is the right default. The cost is some index fragmentation at scale, but the access control guarantees are clean.
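To make the isolation property concrete, here is a minimal in-memory sketch of shard-per-tenant behavior. The `TenantIsolatedStore` class is hypothetical, not a real client API; real systems such as Weaviate enforce this at the storage layer:

```python
import math
from collections import defaultdict

class TenantIsolatedStore:
    """Toy shard-per-tenant store: each tenant gets a separate index,
    and every operation is scoped to exactly one tenant."""

    def __init__(self):
        self._shards = defaultdict(dict)  # tenant_id -> {doc_id: vector}

    def upsert(self, tenant_id, doc_id, vector):
        self._shards[tenant_id][doc_id] = vector

    def query(self, tenant_id, vector, top_k=3):
        # Similarity search never sees other tenants' shards.
        shard = self._shards.get(tenant_id, {})
        scored = sorted(shard.items(),
                        key=lambda kv: -self._cosine(vector, kv[1]))
        return [doc_id for doc_id, _ in scored[:top_k]]

    def drop_tenant(self, tenant_id):
        # Deletion is trivial: drop the whole shard.
        self._shards.pop(tenant_id, None)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

store = TenantIsolatedStore()
store.upsert("alice", "note-1", [1.0, 0.0])
store.upsert("bob", "secret-1", [1.0, 0.0])
assert store.query("alice", [1.0, 0.0]) == ["note-1"]  # bob's doc never surfaces
store.drop_tenant("bob")
assert store.query("bob", [1.0, 0.0]) == []
```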
Retrieval-Time Permission Filtering
For applications where documents have complex, overlapping permissions — enterprise knowledge bases, for example — per-user sharding becomes impractical. A document readable by members of multiple teams can't live in a single tenant shard.
The right pattern here is pre-filter retrieval using a relationship-based access control (ReBAC) system. Before running similarity search, query your authorization system to get the set of document IDs the requesting user can access. Pass that set as a filter to the vector query. The similarity search runs only against authorized documents.
This requires your authorization system to answer "which resources can this user read?" efficiently — a query that traditional RBAC systems often don't optimize for but dedicated ReBAC systems (SpiceDB, OpenFGA, Zanzibar-derived systems) are designed to handle.
The alternative is post-filter retrieval: retrieve candidates, then check permissions. This works when hit rates are high — when most retrieved documents will pass the permission check. When hit rates are low (a user can access a small fraction of the corpus), post-filtering is both slow and quality-degrading.
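A minimal sketch of the pre-filter pattern, with a hard-coded ACL standing in for the ReBAC "which resources can this user read?" call (in production this would be a SpiceDB or OpenFGA lookup):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy index mixing documents with different permission scopes.
index = {
    "hr-comp-2024": [0.9, 0.1],
    "eng-design-7": [0.8, 0.3],
    "wiki-onboard": [0.7, 0.5],
}

def authorized_ids(user):
    """Stand-in for a ReBAC 'list resources' query."""
    acl = {"marketing-user": {"wiki-onboard"}}
    return acl.get(user, set())

def prefiltered_search(user, query_vec, top_k=5):
    allowed = authorized_ids(user)
    # Similarity runs only over documents the user is authorized to read.
    candidates = [(doc_id, cosine(query_vec, v))
                  for doc_id, v in index.items() if doc_id in allowed]
    candidates.sort(key=lambda kv: -kv[1])
    return [doc_id for doc_id, _ in candidates[:top_k]]

assert prefiltered_search("marketing-user", [0.9, 0.1]) == ["wiki-onboard"]
assert prefiltered_search("unknown-user", [0.9, 0.1]) == []
```

Note that the HR document is semantically closest to the query but is never scored, because the filter runs before the similarity computation rather than after it.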
PostgreSQL with pgvector and row-level security is a practical middle path for simpler permission models. An RLS policy on the document sections table ensures that every similarity query is automatically scoped to rows the requesting user can read. No application-layer filtering required.
Audit Logging for Vector Retrievals
Traditional databases log reads at the row level. Vector stores rarely do this by default, which creates a compliance gap: you have no record of who retrieved what, and no way to show after the fact that a user ran queries that surfaced confidential documents.
What to log at minimum:
- User identifier
- Timestamp
- Query intent (not the raw query vector — hash or truncate it)
- IDs of documents returned
- Similarity scores
What not to log: raw vectors, raw query text if it contains PII, or full document content. The goal is auditability, not creating a second copy of the sensitive data.
Store audit logs in an immutable system separate from your application database — CloudWatch, Splunk, S3 with object lock. Database administrators should not be able to delete audit records. Forward logs at regular intervals rather than synchronously on each query to avoid adding latency to retrieval.
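A minimal sketch of an audit record that follows these rules; the field names are illustrative, not a standard schema:

```python
import hashlib
import json
import time

def audit_record(user_id, query_text, doc_ids, scores):
    """Build a retrieval audit record as a JSON line. The raw query is
    hashed so the log supports correlation without storing possible PII;
    ship the line to immutable storage (CloudWatch, S3 object lock, ...)."""
    return json.dumps({
        "user": user_id,
        "ts": time.time(),
        "query_sha256": hashlib.sha256(query_text.encode()).hexdigest(),
        "doc_ids": doc_ids,
        "scores": scores,
    })

rec = json.loads(audit_record("u-42", "salary bands for staff engineers",
                              ["hr-comp-2024"], [0.91]))
assert "salary" not in json.dumps(rec)   # raw query text never reaches the log
assert rec["doc_ids"] == ["hr-comp-2024"]
```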
Anomaly detection on access patterns adds meaningful signal: a user who suddenly queries for documents outside their normal domain, or who issues hundreds of high-similarity queries in a short window, warrants investigation.
Designing for Deletion
If your product stores user-generated content that gets embedded, you need a tested deletion procedure before you ship, not after you receive a GDPR request.
The practical architecture involves three components:
Metadata linking. Every vector must store a reference to its source document ID. When a document is deleted, your pipeline cascades that deletion to all derived vectors. This sounds obvious but requires deliberate design: if you chunk documents before embedding, each chunk needs to carry the source document ID as payload metadata.
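A sketch of the metadata link, assuming a simple chunk-then-embed pipeline; `fake_embed` is a placeholder for a real embedding call:

```python
# Toy pipeline where every vector carries its source document ID as payload,
# so deleting a document can cascade to all derived vectors.
vector_store = []  # list of {"id", "source_doc_id", "vector"} payloads

def ingest(doc_id, chunks, embed):
    for i, chunk in enumerate(chunks):
        vector_store.append({
            "id": f"{doc_id}#chunk-{i}",
            "source_doc_id": doc_id,      # the metadata link
            "vector": embed(chunk),
        })

def delete_document(doc_id):
    # Cascade: a delete-by-filter on the source_doc_id payload field.
    vector_store[:] = [v for v in vector_store
                       if v["source_doc_id"] != doc_id]

fake_embed = lambda text: [float(len(text))]  # placeholder embedder
ingest("doc-1", ["para one", "para two"], fake_embed)
ingest("doc-2", ["other"], fake_embed)
delete_document("doc-1")
assert all(v["source_doc_id"] == "doc-2" for v in vector_store)
```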
Synchronization pipelines. Deletions from your source system (S3, database, CMS) must propagate to your vector store. AWS Bedrock Knowledge Bases handles this automatically when you sync a knowledge base — deleted source files result in their vectors being removed on the next sync. If you're managing your own pipeline, you need equivalent logic: a deletion event in the source triggers a delete-by-filter query in the vector store.
Encryption with key deletion. For the highest-stakes data, encrypt vectors at rest with per-user or per-document keys. When a deletion request arrives, delete the key. The vectors become unrecoverable without physical deletion, which satisfies the spirit of the right to erasure while simplifying the operational process. This is the cleanest approach for regulated industries.
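A sketch of the key-deletion pattern. The XOR keystream below is a deliberately toy cipher to keep the example self-contained; a real implementation would use an AEAD such as AES-GCM from a proper crypto library:

```python
import hashlib
import os
import struct

keys = {}  # user_id -> per-user key bytes

def _keystream(key, n):
    """Toy SHA-256 counter keystream. Illustration only, not real crypto."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def encrypt_vector(user_id, vector):
    key = keys.setdefault(user_id, os.urandom(32))
    raw = struct.pack(f"{len(vector)}f", *vector)
    return bytes(a ^ b for a, b in zip(raw, _keystream(key, len(raw))))

def decrypt_vector(user_id, blob):
    key = keys.get(user_id)
    if key is None:
        raise KeyError("key shredded: vector is unrecoverable")
    raw = bytes(a ^ b for a, b in zip(blob, _keystream(key, len(blob))))
    return list(struct.unpack(f"{len(blob) // 4}f", raw))

blob = encrypt_vector("u-1", [0.5, -1.0])
assert decrypt_vector("u-1", blob) == [0.5, -1.0]
del keys["u-1"]  # right-to-erasure request: shred the key
try:
    decrypt_vector("u-1", blob)
except KeyError:
    pass  # ciphertext remains, but is now unrecoverable
```

The operational win is that erasure becomes a key-management event rather than a scan-and-delete job across every index and backup.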
One important caveat: backups. If you snapshot your vector store, those snapshots contain vectors derived from data you may later be asked to delete. Either exclude personal data from snapshots, implement key-based encryption so backups are useless without keys, or build a procedure for auditing and cleaning backups on deletion.
The Minimum Viable Privacy Architecture
For most teams, the immediate priority is closing the three most common gaps:
- Enforce retrieval-time filtering. Before shipping any RAG feature over multi-user data, ensure that similarity search is scoped to documents the requesting user can access. Pre-filter with your authorization system if the permission graph is complex; use RLS or namespace isolation if it's simpler.
- Log who retrieved what. Enable audit logging on your vector store. Route logs to immutable storage. If your vector store doesn't support retrieval-level logging, add it at the application layer.
- Test your deletion procedure. Write a test that creates a document, embeds it, deletes the source, and verifies the vector is gone. Run it before you ship. Run it in CI if you can.
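Such a test might look like this (pytest style; `VectorPipeline` is a hypothetical wrapper around your real ingest and delete code paths):

```python
class VectorPipeline:
    """Stand-in for your real embed-and-store pipeline."""
    def __init__(self):
        self.vectors = {}
    def ingest(self, doc_id, text):
        self.vectors[doc_id] = [float(len(text))]  # placeholder embedding
    def delete_source(self, doc_id):
        self.vectors.pop(doc_id, None)
    def vector_ids(self):
        return set(self.vectors)

def test_deletion_cascades_to_vectors():
    pipeline = VectorPipeline()
    pipeline.ingest("doc-1", "user generated content")
    assert "doc-1" in pipeline.vector_ids()
    pipeline.delete_source("doc-1")
    assert "doc-1" not in pipeline.vector_ids()  # vector gone after delete

test_deletion_cascades_to_vectors()
```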
Embedding inversion defenses — differential privacy, homomorphic encryption — are appropriate for healthcare, legal, and financial applications where embeddings may be exposed via API or handled by untrusted infrastructure. For most applications, architectural isolation and access control deliver more practical protection per unit of engineering effort.
The fundamental shift is treating your vector store not as a search index but as a database with users and permissions. That framing makes the right patterns obvious.
Sources
- https://arxiv.org/html/2404.16587v1
- https://arxiv.org/html/2411.05034v1
- https://aclanthology.org/2024.acl-long.230/
- https://ironcorelabs.com/ai-encryption/
- https://ironcorelabs.com/blog/2024/text-embedding-privacy-risks/
- https://www.securityium.com/a-guide-to-mitigating-llm082025-vector-and-embedding-weaknesses/
- https://www.pinecone.io/learn/rag-access-control/
- https://supabase.com/docs/guides/ai/rag-with-permissions
- https://www.lasso.security/blog/riding-the-rag-trail-access-permissions-and-context
- https://weaviate.io/blog/weaviate-multi-tenancy-architecture-explained
- https://qdrant.tech/articles/data-privacy/
- https://milvus.io/ai-quick-reference/can-surveillance-vector-databases-comply-with-gdpr-or-ccpa
- https://aws.amazon.com/blogs/machine-learning/implementing-knowledge-bases-for-amazon-bedrock-in-support-of-gdpr-right-to-be-forgotten-requests/
- https://www.shshell.com/blog/vector-db-module-16-lesson-5-audit-logging
- https://blogs.oracle.com/mysql/protecting-ai-vector-embeddings-in-mysql-security-risks-database-protection-and-best-practices/
