Skip to main content

Tenancy Leaks Through Few-Shot Examples: When Your Prompt Library Becomes a Cross-Customer Data Store

· 11 min read
Tian Pan
Software Engineer

Open the production system prompt of a maturing AI product, scroll past the role description, and you will almost always find a section labeled # Examples or ## Few-shot demonstrations. The examples are excellent — they are concrete, they are domain-specific, they pattern-match exactly the failure modes the eval set was struggling with last quarter. They are also, on closer inspection, real customer data. A real ticket ID from a real account. A phrasing pattern lifted verbatim from a support thread. An internal product code that one tenant uses and the rest of the customer base has never heard of.

The team that put them there is not careless. The examples got into the prompt the way good examples always get into prompts: someone mined production traces for cases the model handled poorly, picked the cleanest worked example, pasted it into the system message, watched the eval scores climb, and shipped. That pipeline — production trace to system prompt — is the most reliable prompt-improvement loop in modern LLM engineering. It is also a structural cross-tenant data leak that the team built without noticing, and the system prompt has quietly become a multi-tenant data store the data-processing agreement never priced.

The mining pipeline that leaks

The leak does not start with a careless engineer. It starts with a feedback loop that everyone shipping LLM features eventually adopts because the alternative is worse. Synthetic examples written from scratch are bland; the model pattern-matches them and produces equally bland outputs. Real production traces carry the texture of how users actually phrase requests, what edge cases actually arrive, what failure modes actually need teaching. So the prompt-improvement cycle settles into a familiar shape: look at the failed evals from last week, find three or four cases where a small example would have nudged the model right, copy them into the prompt, A/B test, ship.

What changes between "copy the example" and "ship" is supposed to be a scrubbing step. In practice that step often does not exist, and where it exists, it is a one-pass review by the engineer who wrote the prompt — the same engineer who is not on the security or legal team, who does not know that "Acme Logistics" is the name of a specific real customer, who does not know that the ticket ID format is reversible to a specific account, who does not know that the phrasing pattern in the example is distinctive enough that the original customer would recognize it at a glance.

The system prompt then ships to every tenant. Every session, regardless of which customer is logged in, reads those few-shot examples as part of its context. The model is now exposed to tenant A's data on every request from tenant B, tenant C, and every other tenant. The exposure is not theoretical — it is encoded into the architecture of how the prompt is loaded. The only question is whether the model surfaces what it has been shown, and when.

Why the model recites

In-context learning is, mechanically, the model being shown a small dataset of input/output pairs and asked to reproduce the pattern. The pattern includes not just the structural shape (a question follows the prefix User:, an answer follows Assistant:) but the lexical content of the examples themselves. When a new query arrives that is structurally similar to one of the examples, the model has been trained — across pretraining, instruction tuning, and the conditioning of the prompt itself — to draw on the example's tokens as the closest available reference.

The result is that few-shot examples leak through several channels at once, with varying degrees of subtlety:

  • Verbatim recital under similar queries. If the new tenant's question is phrased close enough to the example's input, the model often emits an answer that lifts named entities directly from the example. A customer name, a ticket number, a product SKU that has no business appearing in the new tenant's session ends up in the response.
  • Laundered paraphrase that preserves identifying signal. Even when the model rewrites the example into new words, the underlying facts — the unusual industry vocabulary, the specific workflow shape, the named entities one synonym removed — are recognizable to anyone who originally produced them. Paraphrase is not anonymization; it is plausible deniability for the legal team and forensic evidence for the customer who notices.
  • Distribution shift in tone and structure. Tenant A's communication style — the formal register, the bullet-list habit, the love of em-dashes — bleeds into every response delivered to tenant B. Cross-session leak research has shown that this is one of the most detectable cross-tenant signals, because consistency of style is a recognizable fingerprint.
  • Extraction via crafted prompts. A motivated attacker, including a benign curious customer, can probe the prompt by asking questions designed to elicit the few-shot content directly. KV-cache sharing research has demonstrated that even side-channel attacks can reconstruct prompt content; in-context recital is the much easier vector that does not need cache timing analysis.

The architectural realization is uncomfortable: the system prompt is functionally part of the model's working memory for every request, and anything pasted into it has been added to a shared substrate that the model is actively trained to reproduce when prompted to.

Loading…
References:Let's stay in touch and Follow me for more thoughts and updates