Multi-User AI Sessions: The Context Ownership Problem Nobody Designs For

· 9 min read
Tian Pan
Software Engineer

In August 2024, security researchers discovered that Slack AI would pull both public and private channel content into the same context window when answering a query. An attacker in a public channel could craft a message that, when ingested by Slack AI, would inject instructions into a victim's session — and since Slack AI doesn't cite its sources, the resulting data exfiltration was nearly untraceable. The attack could leak API keys embedded in private DMs. Slack patched it after responsible disclosure.

This wasn't a bug in the traditional sense. It was a consequence of treating context as a shared mutable resource with no per-user access control. And it's a mistake that most teams building shared AI assistants are making right now, just more quietly.

When you build an AI feature for a single user, you mostly get away with not thinking about context ownership. The session belongs to one person; whatever ends up in the context window is theirs. But the moment you deploy a team Slack bot, a shared workspace assistant, or a live-collaboration AI layer, you've introduced a problem that authentication alone cannot solve: multiple users, multiple intents, and one context window that doesn't know who it belongs to.

Why Authorization at the App Layer Isn't Enough

Engineers tend to think about multi-user security in terms of authentication and authorization: check the JWT, verify permissions, then proceed. For traditional APIs, that mental model holds. For AI systems, it breaks down at the context layer.

Here's why: in most shared AI implementations, the context window is assembled once at request time and handed to the model. That assembly step pulls from conversation history, memory stores, retrieved documents, and current session state. If the assembly logic doesn't enforce per-user boundaries at each step, you get cross-contamination — and the model has no idea. It just reasons over whatever's in the window.

This is what Giskard's research calls a cross-session leak: the model returns valid data to the wrong user because the runtime failed to enforce boundaries before inference, not because the model itself misbehaved. Fixing it with output filters after the fact is like trying to un-ring a bell.

By the first half of 2025, Microsoft Copilot alone had exposed approximately 3 million sensitive records per organization through this class of failure — not because of broken authentication, but because the tool accessed shared organizational data stores without per-user scoping in the context assembly step.

Three Failure Modes That Show Up in Production

Context leakage between users is the most visible failure. User A's conversation history or memory leaks into User B's context. This happens when session state is stored by team or workspace ID instead of user ID, when conversation summaries get written to a shared pool, or when retrieval systems use org-level embeddings without user-scope filtering. The result is that User B's AI responses are subtly (or not so subtly) shaped by User A's private data.

Competing intents in shared history is subtler. When a team bot maintains a shared conversation thread — as most Slack bots do by default — the model reads all prior turns as a single coherent history. But different users in that thread have different goals, different domain knowledge, and different expectations. The model conflates them. A question from User A late in a thread will be interpreted through the lens of what User B said three turns earlier. The compounding effect means that shared-thread bots tend to degrade in usefulness as team adoption grows, and nobody can articulate exactly why.

Personalization bleeding across sessions is the longest-lived failure. Memory systems that save user preferences, learned behaviors, and conversation context are among the most valuable AI features — and among the most dangerous in multi-user environments. When memory is scoped too broadly, a user's preferences contaminate the org-level context. Worse, adversarial memory poisoning — injecting instructions into a shared memory store that persist across sessions — can shape every future user's experience. Microsoft Security documented exactly this attack pattern in 2026: instructions injected into an AI's memory survived session termination and redirected subsequent users' interactions.

Isolation Patterns That Work

The fundamental design principle is: context is a projection, not storage. Persistent state lives in stores keyed by userId. Each inference call assembles a context window by projecting from that user's substrate plus the current session. The context window itself is ephemeral and never written back to shared state. Separating sessionId (the current conversation) from memoryId (user identity) is the first step most teams skip.
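The projection principle can be made concrete in a few lines. The sketch below assumes two hypothetical persistent stores, one keyed by user identity and one by session; the names (`MEMORY_STORE`, `SESSION_STORE`, `assemble_context`) are illustrative, not a real API:

```python
from dataclasses import dataclass, field

@dataclass
class ContextWindow:
    """Ephemeral projection: built per inference call, never persisted."""
    messages: list = field(default_factory=list)

# Hypothetical persistent stores. Memory is keyed by user identity,
# the conversation by session identity -- the separation most teams skip.
MEMORY_STORE: dict[str, list[str]] = {}   # user_id -> long-lived memories
SESSION_STORE: dict[str, list[str]] = {}  # session_id -> current conversation turns

def assemble_context(user_id: str, session_id: str) -> ContextWindow:
    """Project a context window from this user's substrate plus the session.

    The window is discarded after the inference call; nothing here is
    ever written back to shared state.
    """
    ctx = ContextWindow()
    ctx.messages.extend(MEMORY_STORE.get(user_id, []))      # scoped by user
    ctx.messages.extend(SESSION_STORE.get(session_id, []))  # scoped by conversation
    return ctx
```

Because the window is rebuilt from scoped stores on every call, there is no shared mutable context object for another user's data to leak into.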

Dual-tier memory architecture formalizes this. Private memory isolates sensitive, personal, and session-specific data per user. Shared memory enables controlled knowledge transfer — team conventions, codebase context, project history — with explicit access policies governing what can be retrieved and by whom. Research on collaborative memory systems formalizes this with dynamic access control: reads and writes to shared memory check a policy layer before proceeding, rather than operating freely on a global store.
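A minimal sketch of the dual-tier idea, assuming a caller-supplied policy function; the class and method names are hypothetical stand-ins for whatever memory layer you actually use:

```python
class DualTierMemory:
    """Private per-user memory plus a shared tier gated by an access policy.

    `policy` is a callable (user_id, scope) -> bool consulted on every
    shared read and write -- the policy layer from the research pattern.
    """

    def __init__(self, policy):
        self._private: dict[str, list[str]] = {}
        self._shared: list[tuple[str, str]] = []  # (scope, item)
        self._policy = policy

    def write_private(self, user_id: str, item: str) -> None:
        self._private.setdefault(user_id, []).append(item)

    def write_shared(self, user_id: str, scope: str, item: str) -> None:
        if not self._policy(user_id, scope):           # writes are policy-checked too
            raise PermissionError(f"{user_id} may not write scope {scope!r}")
        self._shared.append((scope, item))

    def read(self, user_id: str) -> list[str]:
        # Shared items are filtered through the policy at read time,
        # never handed out from a global store unchecked.
        allowed = [item for scope, item in self._shared
                   if self._policy(user_id, scope)]
        return self._private.get(user_id, []) + allowed
```

The important property is that the shared tier is never read or written without a policy check, so adding a new consumer cannot silently widen access.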

Shared-context contracts define, explicitly, what is world-state (safe to share across all users: public docs, codebase, team norms), what is user-private (never shared: DMs, personal preferences, individual history), and what is role-scoped (shared within a permission group: project context for team members on a given project). Treat this like an API contract, not an implementation detail. Write it down. Review it when adding new context sources.
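Writing the contract down can be as literal as a table in code that the assembly step consults. The classification below (source names, `may_include` helper) is illustrative:

```python
from enum import Enum

class Scope(Enum):
    WORLD = "world"          # safe for every user: public docs, team norms
    USER_PRIVATE = "user"    # never crosses a user boundary
    ROLE_SCOPED = "role"     # shared within a permission group

# Hypothetical contract: every context source gets an explicit classification.
# Adding a source without an entry should fail loudly, not default to shared.
CONTEXT_CONTRACT = {
    "public_docs": Scope.WORLD,
    "dm_history": Scope.USER_PRIVATE,
    "project_notes": Scope.ROLE_SCOPED,
}

def may_include(source: str, requester: str, owner: str, shared_roles: set) -> bool:
    """Decide at assembly time whether `requester` may see `owner`'s data
    from `source`, per the contract."""
    scope = CONTEXT_CONTRACT[source]  # KeyError = unclassified source, by design
    if scope is Scope.WORLD:
        return True
    if scope is Scope.USER_PRIVATE:
        return requester == owner
    return bool(shared_roles)  # role-scoped: requester and owner share a group
```

Reviewing a new context source then means adding one line to the table, which is exactly the API-contract discipline the pattern calls for.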

Per-user context assembly with retrieval-time filtering is the most operationally important pattern. Every retrieval call — vector search, document lookup, memory fetch — must include user identity as a filter, not just as a logging field. The retrieval system is the most common place where cross-user contamination enters the context window, because it's the step furthest from the user-facing authentication layer.
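The shape of retrieval-time filtering, sketched against a toy in-memory index standing in for a real vector store's metadata filter (the tuple layout and function names are assumptions):

```python
def retrieve(index, query_vec, user_id, k=3):
    """Nearest-neighbour lookup where user identity is a hard filter,
    not a logging field.

    `index` is a list of (vector, owner_user_id, text) tuples -- a toy
    stand-in for a vector store with metadata filtering.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Filter BEFORE ranking: other users' documents never enter the
    # candidate set, no matter how close they are in embedding space.
    visible = [(vec, owner, text) for vec, owner, text in index
               if owner == user_id]
    visible.sort(key=lambda row: sq_dist(row[0], query_vec))
    return [text for _, _, text in visible[:k]]
```

Filtering before ranking matters: a post-hoc filter on the top-k results can silently return fewer documents than requested, which tempts engineers to widen the query and reintroduce the leak.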

Sub-agent functional isolation applies to more complex systems. Planning agents carry task state. Retrieval agents handle lookup. Execution agents receive only the subtask context they require. No agent accumulates cross-user context by design. When agents are specialized and scoped, an accidental cross-user read in one agent doesn't propagate through the entire pipeline.

Race Conditions at Team Scale

Shared AI infrastructure introduces a class of concurrency bugs that most teams discover only in production.

The most common is a TOCTOU (time-of-check-to-time-of-use) race on token budgets: per-user or per-team quota is checked before the inference call, but by the time the call dispatches, another concurrent request from the same tenant has already consumed the available budget. The model call proceeds anyway. At low traffic volumes this is invisible; at team scale it turns into runaway cost and unfair resource allocation. The fix is synchronous enforcement — check and reserve the budget atomically before dispatching, not after.
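A minimal sketch of atomic check-and-reserve, assuming a single-process deployment where a `threading.Lock` suffices (a distributed deployment would need the same pattern in Redis or a database transaction):

```python
import threading

class TokenBudget:
    """Check and reserve under one lock, closing the TOCTOU window
    between the quota check and the model call."""

    def __init__(self, limit: int):
        self._limit = limit
        self._used = 0
        self._lock = threading.Lock()

    def try_reserve(self, tokens: int) -> bool:
        with self._lock:
            if self._used + tokens > self._limit:
                return False        # reject before dispatch, not after
            self._used += tokens    # reservation is atomic with the check
            return True

    def release(self, tokens: int) -> None:
        with self._lock:
            self._used -= tokens    # refund an unused or over-sized reservation
```

The caller reserves a worst-case token count before dispatching the model call, then releases the unused remainder afterward; two concurrent requests can no longer both pass the check against the same remaining budget.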

Write conflicts in shared memory are more subtle. If two users' sessions both trigger memory writes at the same time, last-write-wins means one user's memory update silently overwrites the other's. For most memory systems, this means that under concurrent load, memory is less reliable precisely when the system is most active. Optimistic concurrency — version the memory records and reject writes that conflict — handles this without requiring a global lock.
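Optimistic concurrency in miniature, with hypothetical store and method names; a production system would put the version column in the database and do the compare-and-set there:

```python
class VersionedMemory:
    """Each record carries a version; a write is rejected if the version
    the writer read is stale, instead of silently overwriting."""

    def __init__(self):
        self._records: dict[str, tuple[int, object]] = {}  # key -> (version, value)

    def read(self, key: str) -> tuple[int, object]:
        return self._records.get(key, (0, None))

    def write(self, key: str, expected_version: int, value: object) -> bool:
        current_version, _ = self._records.get(key, (0, None))
        if current_version != expected_version:
            return False  # conflict: caller must re-read and merge
        self._records[key] = (current_version + 1, value)
        return True
```

The losing writer gets an explicit conflict signal and can re-read, merge, and retry, rather than one user's memory update vanishing under last-write-wins.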

Context assembly ordering is a third race that appears specifically in live-collaboration scenarios. When multiple users contribute to a shared session simultaneously (live doc editing with an AI assistant, pair programming with a bot), the order in which contributions are assembled into the context window determines the model's interpretation of the collaborative intent. Non-deterministic assembly order produces non-deterministic outputs even from a fully deterministic model. Explicit sequencing of context assembly — timestamped events merged in arrival order, not random order — prevents this.
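Explicit sequencing can be as simple as a deterministic merge over timestamped per-user event streams. The sketch below uses `(timestamp, user_id)` as the sort key so that even same-timestamp ties break identically on every assembly; the tuple layout is an assumption:

```python
import heapq

def merge_contributions(*user_streams):
    """Merge per-user event streams into one deterministic sequence.

    Each stream is a list of (timestamp, user_id, text) tuples, already
    sorted by timestamp. Sorting by (timestamp, user_id) makes tie-breaks
    deterministic, so the same events always assemble in the same order.
    """
    merged = heapq.merge(*user_streams, key=lambda event: (event[0], event[1]))
    return [text for _, _, text in merged]
```

With arrival order pinned down, a deterministic model sees the same context for the same collaborative state, and output differences can be attributed to real input changes rather than assembly races.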

Measuring Whether You Have This Problem

Most teams don't know if they have context leakage until it causes a visible incident. By then, a lot of sensitive data has likely already moved in the wrong direction.

The Microsoft Research PrivacyChecker system reduced information leakage from 33% to 8% on GPT-4o by adding contextual integrity checks — but you have to be measuring leakage rate to know there's a problem to fix. A simple audit: take 20 recent AI responses your system generated for User A. Would any of them have changed meaningfully if User B's data had been in the context? If you can't confidently answer "no," your context assembly has a scoping problem.

For teams using shared vector indexes: audit which user's data surfaces when you retrieve against queries that are generic enough to match multiple users' histories. A query like "summarize recent decisions" in an org-wide index will surface whatever happens to be nearest in embedding space, not what's appropriate for the requesting user. Per-user index partitioning or mandatory metadata filtering is the fix.

The Organizational Blind Spot

The reason most teams don't design for context ownership up front is that shared AI features are usually built by teams that think of themselves as building a productivity tool, not a multi-tenant system. Multi-tenancy requires upfront investment in access models, isolation primitives, and data governance — work that feels like plumbing when you're trying to ship a Slack bot.

But the cost of retrofitting context isolation into a shared AI feature after it's deployed is high. Memory stores have to be migrated and re-scoped. Retrieval indexes have to be re-partitioned. Conversation histories have to be audited for leaked data. The work compounds with the size of the deployment.

The mental model shift that prevents this: the moment more than one person's data can influence an AI response, you're building a multi-tenant system. The isolation requirements follow from that, not from whether the product team thought to label it as such.

Teams that build shared AI features as if they're building a multi-user API — with explicit ownership models, access policies, and data boundaries — ship safer systems and spend less time on incident response. The fact that the AI component is probabilistic doesn't change the requirements. It just makes the violations harder to detect when they do occur.
