
Cross-Tenant Data Leakage in Shared LLM Infrastructure: The Isolation Failures Nobody Tests For

· 11 min read
Tian Pan
Software Engineer

Most multi-tenant LLM products have a security gap that their engineers haven't tested for. Not a theoretical gap — a practical one, with documented attack vectors and confirmed incidents. The gap is this: each layer of the modern AI stack introduces its own isolation primitive, and each one can fail silently in ways that let one customer's data reach another customer's context.

This isn't about prompt injection or jailbreaking. It's about the infrastructure itself — prompt caches, vector indexes, memory stores, and fine-tuning pipelines — and the organizational fiction of "isolation" that most teams ship without validating.

In April 2024, Wiz researchers demonstrated complete cross-tenant breaches on Hugging Face's AI-as-a-service platform. The attack chain ran through misconfigured Kubernetes environments and pickle deserialization in model files, ultimately giving the researchers access to private models and datasets across the entire customer base. They could reach other customers' data without bypassing authentication — just by exploiting the gaps between isolation layers that looked solid individually but were not composed correctly.

That incident is the visible tip. The subtler failures don't make headlines because nobody is looking for them.

The KV-Cache Timing Channel

When you deploy an LLM serving system like vLLM with automatic prefix caching enabled, the system stores key-value tensors from repeated prompt prefixes in GPU memory and reuses them on cache hits. This is a meaningful efficiency win — cache hits skip the expensive prefill computation and respond noticeably faster.

That measurable latency difference is also an attack vector.

Research presented at NDSS 2025 documented "PROMPTPEEK"-class attacks in which an adversary reconstructs other users' prompts by analyzing cache hit/miss timing patterns on shared serving infrastructure. The methodology requires no special access — only the ability to send requests and observe response latency. When the attacker's probe prompt matches another tenant's cached prefix, the hit is statistically distinguishable from a miss at p < 10⁻⁸. From a sequence of probes, an attacker can deduce what other tenants are querying.

The fix exists in vLLM as the cache_salt parameter, which creates separate cache namespaces per tenant by incorporating the salt value into the block hash. Only requests with matching salts can reuse cached blocks. But this protection is opt-in and application-enforced. The default configuration — the one most teams deploy — provides no cross-tenant cache isolation whatsoever.
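One way to apply the salt consistently is to derive it server-side from the tenant identity, so no request path can forget it or spoof another tenant's namespace. A minimal sketch — `SERVER_SECRET`, `build_chat_request`, and the model name are illustrative, not part of vLLM's API; only the `cache_salt` field in the request body is:

```python
import hashlib
import hmac

# Hypothetical deployment secret; deriving the salt with HMAC keeps one
# tenant from guessing (and probing) another tenant's cache namespace.
SERVER_SECRET = b"rotate-me-out-of-band"

def cache_salt_for(tenant_id: str) -> str:
    """Derive a stable, per-tenant salt for vLLM's prefix cache."""
    return hmac.new(SERVER_SECRET, tenant_id.encode(), hashlib.sha256).hexdigest()[:32]

def build_chat_request(tenant_id: str, messages: list[dict]) -> dict:
    """Build an OpenAI-compatible request body with a tenant-scoped cache_salt."""
    return {
        "model": "my-model",          # placeholder model name
        "messages": messages,
        "cache_salt": cache_salt_for(tenant_id),
    }
```

Because the salt is a pure function of the tenant ID, the same tenant keeps hitting its own warm cache while cross-tenant probes always miss.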

Anthropic's managed infrastructure switched from organization-level to workspace-level cache isolation in early 2026, recognizing that even internal teams within the same organization shouldn't share KV-cache blocks. If you're running your own serving stack, equivalent isolation requires explicit instrumentation. Most teams haven't added it.

The Namespace Illusion in Vector Databases

Vector databases are the layer most directly implicated in RAG-based leakage, and also the layer where the gap between "isolated" and "actually isolated" is widest.

Pinecone namespaces, Weaviate collections with multi-tenancy mode off, pgvector queries filtered by application-side WHERE clauses — all of these are organizational boundaries, not cryptographic ones. They work by convention: queries include a filter, and the database restricts the search space. What makes them fail is the same thing that makes SQL injection work — the boundary is enforced by application code, not by the storage system itself.

The specific failure modes differ by database:

Pinecone namespaces are correctly enforced at the index level when specified. The failure mode is omission: a developer writing a retrieval call forgets to pass the namespace parameter, and the query scans across all tenants. In a code review, this looks like a minor oversight. In production, it means every query without a namespace returns vectors from any tenant's data.
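The omission failure mode can be closed structurally by making the namespace impossible to leave out. A sketch of a wrapper around any Pinecone-style index object — `TenantScopedIndex` is a hypothetical helper, not part of the Pinecone SDK:

```python
class TenantScopedIndex:
    """Wrapper that makes the tenant namespace impossible to omit or override.

    `index` is any object exposing Pinecone-style query/upsert keyword APIs.
    """

    def __init__(self, index, tenant_id: str):
        if not tenant_id:
            raise ValueError("tenant_id is required")
        self._index = index
        self._namespace = f"tenant-{tenant_id}"

    def query(self, **kwargs):
        if "namespace" in kwargs:
            raise ValueError("namespace is set by the wrapper, not the caller")
        return self._index.query(namespace=self._namespace, **kwargs)

    def upsert(self, vectors, **kwargs):
        if "namespace" in kwargs:
            raise ValueError("namespace is set by the wrapper, not the caller")
        return self._index.upsert(vectors=vectors, namespace=self._namespace, **kwargs)
```

Handing application code only the wrapped object, never the raw index, converts the "forgot the namespace parameter" bug from a silent cross-tenant scan into an impossible state.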

pgvector with row-level security is more robust because the database engine enforces the policy even when application code contains bugs — a forgotten WHERE clause is blocked, not silently permitted. But PostgreSQL has shipped RLS fixes of its own: CVE-2024-10976, for example, covered incomplete tracking of tables with row security that let a reused query see or change rows the policy should have blocked. Even database-layer isolation can fail at unexpected boundary points. RLS is not a guarantee; it's a strong default that reduces the blast radius of application bugs.
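The RLS pattern itself is compact. A sketch, with the DDL as strings and a helper that binds the session to one tenant — the table name `documents` and the GUC name `app.current_tenant` are illustrative conventions, not fixed ones:

```python
# Policy keyed on a per-session setting; FORCE makes it apply even to
# the table owner, which closes a common RLS misconfiguration.
RLS_SETUP = """
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
ALTER TABLE documents FORCE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON documents
    USING (tenant_id = current_setting('app.current_tenant'))
    WITH CHECK (tenant_id = current_setting('app.current_tenant'));
"""

def set_tenant(cursor, tenant_id: str) -> None:
    """Bind the session to one tenant before any query runs.

    `cursor` is a DB-API cursor (e.g. psycopg2); set_config with a bound
    parameter avoids interpolating the tenant ID into SQL text.
    """
    cursor.execute(
        "SELECT set_config('app.current_tenant', %s, false)", (tenant_id,)
    )
```

Once the policy is in place, every query on `documents` is filtered to the bound tenant regardless of what the application's WHERE clause says.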

Weaviate offers the strongest native multi-tenancy model among mainstream vector databases, with logically isolated data per tenant at the collection level. But it requires explicit multi-tenancy mode configuration — not the default — and the isolation guarantee depends on the tenant key being correctly set on every write and every query.

The testing gap is the common thread. Most teams verify that a tenant can retrieve their own data. Almost none verify that a tenant cannot retrieve another tenant's data. These are different tests.

A minimal cross-tenant isolation test looks like this: inject a distinctive document into Tenant A's index, then attempt retrieval using Tenant B's credentials — not just filtering but actually authenticating as Tenant B. If the document surfaces, the isolation is broken. Run this test in CI before every deployment that touches retrieval configuration.

Fine-Tuning as a Cross-Tenant Amplifier

Shared fine-tuning infrastructure introduces a contamination risk that most platform teams haven't considered: one tenant's training data can affect the base model that serves all tenants.

Research on training data poisoning has established that contaminating fewer than 0.01% of training samples is sufficient to implant behavioral backdoors that survive subsequent safety fine-tuning. Poisoning 1% of instruction-tuning data achieves 80% performance degradation on targeted task categories. The number of required poisoned samples remains roughly constant as training data scales — meaning larger training sets don't dilute the attack.

The multi-tenant threat model follows directly. If a platform runs fine-tuning jobs for multiple customers on shared infrastructure and produces a base model that all customers draw from, a single tenant uploading a poisoned dataset contaminates the shared base. Other tenants' models inherit the backdoor without knowing it exists. The poisoned behavior activates only when specific trigger patterns appear in prompts — patterns the attacker controls.

The practical defense is simple in principle and difficult in practice: never produce a shared base model from customer-specific fine-tuning jobs. Each fine-tuning run should either start from a stable, audited base model and produce a tenant-specific adapter, or run in completely isolated training infrastructure. The contamination risk disappears when there is no path from one tenant's training data to another tenant's serving weights.

For datasets that aren't directly uploaded by tenants but are assembled from mixed sources, use training data provenance tracking — record which data segments contributed to which model versions. When a contamination incident occurs, provenance logs tell you which model versions to revoke and which tenants are affected.
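A provenance log can be as small as two mappings: segments to model versions, and model versions to tenants. A minimal in-memory sketch — class and method names are illustrative; a production system would persist this in an append-only store:

```python
from collections import defaultdict

class ProvenanceLog:
    """Records which data segments fed which model versions, and which
    tenants run each version, so a contamination report maps directly
    to the checkpoints to revoke and the tenants to notify."""

    def __init__(self):
        self._segments_to_models = defaultdict(set)
        self._models_to_tenants = defaultdict(set)

    def record_training_run(self, model_version: str, segment_ids: list[str]):
        for seg in segment_ids:
            self._segments_to_models[seg].add(model_version)

    def record_deployment(self, model_version: str, tenant_id: str):
        self._models_to_tenants[model_version].add(tenant_id)

    def blast_radius(self, contaminated_segment: str):
        """Return (model versions to revoke, tenants affected)."""
        models = self._segments_to_models.get(contaminated_segment, set())
        tenants = set()
        for m in models:
            tenants |= self._models_to_tenants[m]
        return models, tenants
```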

Agent Memory Store Leakage

Long-running agents maintain state across sessions. That state lives somewhere — Redis for ephemeral session context, Postgres for durable long-term memory, vector databases for semantic retrieval. Each storage layer requires its own isolation implementation, and they fail in different ways.

Redis is the most common failure point because it's frequently deployed as a single shared instance with key-prefix conventions for tenant separation:

tenant:{tenant-id}:session:{session-id}

This is organizational isolation. It works exactly until application code fails to include the tenant prefix — a missing variable, a copy-paste error, a library call that constructs keys internally. The database has no notion of tenant boundaries; it will happily return any key to any client that asks for it. ACL rules on key patterns help but require near-perfect discipline to maintain. One overly permissive ACL entry exposes the entire keyspace.
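The discipline problem shrinks if key construction lives in exactly one function that validates its inputs, so a missing or malformed tenant ID becomes a loud error rather than a silent cross-tenant read. A sketch — the key schema matches the convention above; the validation rules are illustrative:

```python
import re

# Illustrative tenant-ID policy: lowercase slug, bounded length.
TENANT_ID = re.compile(r"^[a-z0-9-]{1,64}$")

def session_key(tenant_id: str, session_id: str) -> str:
    """Build the tenant-prefixed Redis key; refuse anything malformed.

    Centralizing this means no call site can forget the prefix, and no
    crafted session ID can smuggle a ':' to escape its key subspace.
    """
    if not TENANT_ID.match(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    if ":" in session_id:
        raise ValueError("session id must not contain ':'")
    return f"tenant:{tenant_id}:session:{session_id}"
```

The same function is the natural place to derive the matching ACL key pattern (`~tenant:<id>:*`) so the convention and the enforcement can't drift apart.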

PostgreSQL with row-level security provides stronger guarantees for durable memory because the isolation is database-enforced, not application-enforced. Even a buggy ORM that omits a WHERE clause will be blocked by the RLS policy. The tradeoff is that cross-tenant queries (which you never want in production) become impossible for the application role, even during debugging; inspecting data across tenants requires a separate role with BYPASSRLS.

The audit test for memory store leakage mirrors the vector database test: write a distinctive piece of data as Tenant A, then read from Tenant B's context and verify the data is not accessible. For agents specifically, this test should run across multiple conversational turns, because memory leakage often occurs in context compilation — when the agent assembles its working context from stored state — rather than in a single read operation.

What Actually Enforces a Boundary

The practical lesson from auditing multi-tenant AI infrastructure is that there are two categories of isolation primitives: those that require application discipline to enforce, and those that enforce themselves.

Namespace-level isolation — key prefixes in Redis, schema isolation in databases, namespace parameters in vector databases — requires every developer, every library, and every code path to consistently apply the tenant context. One omission creates a gap. These primitives are operationally cheap but organizationally fragile.

Policy-level isolation — PostgreSQL RLS, per-tenant ACLs with keyspace restrictions, Firecracker microVMs for code execution — enforces boundaries regardless of what application code does. PostgreSQL RLS blocks cross-tenant reads even when the application omits a WHERE clause. A Firecracker microVM cannot access another tenant's filesystem even if the agent tries. These primitives cost more to operate but convert a class of software bugs into non-incidents.

Hyperscalers resolved this question empirically: AWS built Firecracker to run untrusted customer code in Lambda, and Google built gVisor to sandbox multi-tenant workloads. Neither uses standard containers for workloads with serious cross-tenant isolation requirements. The industry's largest operators concluded that containers alone are insufficient for untrusted workloads — a conclusion that most SaaS AI platforms haven't yet absorbed.

For teams that can't justify the operational overhead of hardware-level isolation everywhere, the practical approach is to tier isolation by data sensitivity. Use policy-enforced boundaries (RLS, dedicated Redis databases) for any layer that processes customer-specific data. Reserve namespace-level isolation for layers where the data is less sensitive and the blast radius of a misconfiguration is bounded. Never use namespace-level isolation as the sole boundary for authentication state, encryption keys, or personally identifiable information.

The Audit Methodology

Finding cross-tenant contamination before a customer finds it requires intentional testing across every layer of the stack.

Prompt cache probing: Inject a distinctive, low-entropy prompt prefix as Tenant A. Measure response latency for an identical prompt from Tenant B's context. A statistically significant latency reduction indicates the cache is shared. Repeat after enabling cache salting per tenant and verify that the timing signal disappears.
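Deciding whether a latency difference is "statistically significant" doesn't need heavy machinery for an initial audit pass. A crude detector — thresholds and function name are illustrative, and a real audit would use a proper significance test over many trials:

```python
import statistics

def looks_cached(probe_latencies: list[float],
                 baseline_latencies: list[float],
                 min_gap_ratio: float = 0.3) -> bool:
    """Flag a shared prefix cache from latency samples (illustrative).

    probe_latencies: Tenant B's latencies for the prefix Tenant A warmed.
    baseline_latencies: latencies for a cold, never-cached prefix.
    Flags sharing when the probe's median is at least min_gap_ratio
    faster than the cold baseline and the two samples do not overlap.
    """
    probe_med = statistics.median(probe_latencies)
    base_med = statistics.median(baseline_latencies)
    if base_med <= 0:
        return False
    gap = (base_med - probe_med) / base_med
    overlap = max(probe_latencies) >= min(baseline_latencies)
    return gap >= min_gap_ratio and not overlap
```

Run the same probe after enabling per-tenant cache salting: the function should flip from True to False, which is the verification step the audit calls for.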

Vector database cross-namespace retrieval: Write a synthetic document with a unique identifier into Tenant A's index. Execute a semantic query from Tenant B that should match the document. Verify the document does not appear in results. Test both with correct namespace filtering and with deliberately omitted namespace parameters to confirm the database rejects the second case.

Memory store leakage detection: For each memory backend (Redis, Postgres, other), inject a distinguishable value as Tenant A and attempt retrieval using Tenant B's session credentials. For agent systems specifically, run this test with multi-turn conversations where the agent has had multiple opportunities to compile context from the store.

Fine-tuning contamination: If your platform runs tenant-specific fine-tuning jobs, maintain a strict inventory of which training jobs contributed to which model checkpoints. Audit whether any shared base model was produced from a training run that included customer data. If so, that base model's provenance is unclean and should be retrained from audited sources.

Cross-tenant behavioral testing: Create two tenants with intentionally different system prompts and few-shot examples. After several interactions for each, probe whether either tenant's behavioral conditioning surfaces in the other's session. This tests the full context pipeline rather than individual storage layers.

None of these tests are difficult to implement. They are uniformly absent from standard CI pipelines because the industry hasn't treated cross-tenant isolation as a testing requirement — only as an architectural claim.

The Practical Posture

Multi-tenant LLM infrastructure has a security testing gap that traditional multi-tenant systems closed long ago, and the only reason is that this infrastructure is newer. The attack surfaces — prompt caches, vector indexes, fine-tuning pipelines, agent memory — were designed for efficiency first and isolation second.

The teams that will avoid incidents are the ones that treat cross-tenant isolation the same way they treat SQL injection: as something that must be verified automatically, on every deployment, not trusted based on architectural intent. Architecture diagrams show what is supposed to happen. Automated tests show what actually does.

Start with the test that most directly maps to a customer incident: inject sensitive data as one tenant, retrieve as another, verify nothing leaks. Add it to CI. Then work through the layers — prompt cache, vector database, memory store — until the test suite covers the full data path. Each test you write converts a silent failure mode into a detectable one.

The Wiz findings on Hugging Face and the NDSS timing-channel research both point to the same conclusion: the isolation that the architecture diagram promises is not the isolation that production delivers. Closing that gap requires testing, not assumptions.
