Multi-User Shared AI Sessions: The Concurrency Problem Nobody Has Solved

12 min read
Tian Pan
Software Engineer

Most AI products are built for a single user with a single intent, a single conversation thread, and a single identity. This works well enough when the product is a personal productivity tool—a writing assistant, a code completion engine, a summarizer. But something happens when teams start using AI collaboratively: the product silently breaks in ways that are hard to diagnose and harder to fix. Two users prompt the AI simultaneously, and one of their inputs disappears. A context window shared across five engineers fills up with duplicated history. The AI responds to user A's question using user B's permissions. Nobody designed for any of this, because shipping multi-user shared context means confronting one of the hardest distributed systems problems in modern AI infrastructure.

This post is about what actually makes simultaneous multi-user AI sessions hard, what production teams have tried, and what the emerging architectural patterns are. If you are building a collaborative AI feature and wondering why it feels impossibly complex, this is why.

Why Multi-User AI Is Architecturally Different, Not Just Harder

The single-user AI model has a clean architecture: one user, one session, one conversation history, one context window, one identity. The model processes the context and returns a response. The state machine is simple.

When you add a second user to the same session, you don't just double the load. You break the fundamental assumptions that the architecture was built on.

Context is holistic, not compositional. In a Google Doc, two users editing simultaneously produce changes that are local and bounded: user A edits paragraph 3, user B edits paragraph 7, and Operational Transform or a CRDT can merge the results deterministically. In a shared AI session, the context is everything: the system prompt, the accumulated conversation history, and the implicit mental model the AI has built about the task. When user A and user B send messages simultaneously, neither edit is local. Both messages potentially reshape the AI's understanding of the entire situation. There is no clean merge function for this.

The LLM is stateful in ways a spreadsheet cell is not. A spreadsheet cell contains a value. Two users writing to the same cell at the same time have a simple conflict that resolves to one writer winning. An LLM's inference is sequential and stateful: each token generated depends on all previous tokens in the context. If two users submit inputs at the same instant, you cannot run two inferences simultaneously over a shared context and then combine the outputs into something coherent. You must choose: serialize one user's input before the other, or fork the context and lose shared state.

Errors amplify across all users. In single-user mode, if the AI misreads a question and responds unhelpfully, the damage is contained to one user who can correct the misunderstanding in the next turn. In a shared session, if the AI misreads the team's intent, every participant is now working from the same bad output. The error does not stay local—it becomes organizational.

These are not engineering challenges that better tooling will eliminate. They are structural constraints that every multi-user AI system must make deliberate choices about.

What Real-Time Collaboration Primitives Actually Teach Us

Engineers reaching for multi-user AI naturally look to the established real-time collaboration playbook: Operational Transformation (OT), as used in Google Docs, or Conflict-free Replicated Data Types (CRDTs), as used in Figma and offline-capable editors like Notion. These tools are powerful and well-understood. But they solve a different problem.

OT works by treating every edit as an operation—insert character at position 5, delete characters 10–12—and transforming operations against each other when concurrent edits conflict. The key insight is that edit operations on documents are local and composable. User A's insert at position 5 can be mathematically adjusted when user B deletes characters 2–4, because positions are arithmetic.
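A minimal sketch makes the "positions are arithmetic" point concrete. The function below transforms one operation pair only, an insert against a concurrent delete; a real OT system handles every pair of operation types, but the arithmetic flavor is the same.

```python
# Minimal OT sketch: adjust user A's insert position against user B's
# concurrent delete. Positions are character indices; del_end is exclusive.
# Real OT engines transform every pair of operation types, not just this one.

def transform_insert_against_delete(ins_pos: int, del_start: int, del_end: int) -> int:
    if ins_pos <= del_start:
        # The insert lands before the deleted span; nothing to adjust.
        return ins_pos
    # Shift left by however many deleted characters preceded the insert.
    deleted_before = min(ins_pos, del_end) - del_start
    return ins_pos - deleted_before

# User A inserts at position 5 while user B deletes characters 2-4
# (indices 2 and 3): A's insert lands at position 3 in the merged document.
print(transform_insert_against_delete(5, 2, 4))  # 3
```

This is the whole trick: because document positions are integers, concurrent edits can be reconciled with arithmetic. Nothing analogous exists for "the meaning of turn 7 given turn 4."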

CRDTs embed conflict resolution directly into the data structure. Any replica can diverge from any other and still converge to the same state through a merge function that is mathematically guaranteed to be commutative, associative, and idempotent. They're ideal for distributed, offline-capable systems where you cannot guarantee a central coordinator.
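The merge laws are easy to see in a toy state-based CRDT. The last-writer-wins register below is illustrative only, with the timestamps and replica IDs chosen for the example:

```python
# Sketch of a state-based CRDT: a last-writer-wins register. Merge is
# commutative, associative, and idempotent, so diverged replicas converge
# to the same state no matter the order in which they exchange updates.

from dataclasses import dataclass

@dataclass(frozen=True)
class LWWRegister:
    value: str
    timestamp: float   # logical or wall-clock time of the last write
    replica_id: str    # deterministic tie-breaker for equal timestamps

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Higher timestamp wins; replica_id breaks exact ties.
        if (self.timestamp, self.replica_id) >= (other.timestamp, other.replica_id):
            return self
        return other

a = LWWRegister("draft v1", 10.0, "alice")
b = LWWRegister("draft v2", 12.0, "bob")

assert a.merge(b) == b.merge(a)   # commutative
assert a.merge(a) == a            # idempotent
print(a.merge(b).value)           # draft v2
```

The guarantee comes from the data structure, not from a coordinator, which is exactly why CRDTs work offline and exactly why they cannot arbitrate between two semantically conflicting AI instructions.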

Neither maps cleanly onto shared AI context because AI context isn't text in the OT/CRDT sense. A system prompt is not a sequence of characters with local positions. Conversation history is a totally ordered log with semantic dependencies across entries. When user A's message in turn 7 references a commitment user B made in turn 4, there is no "position" to transform around. The meaning is entangled across the entire history in ways that defy algebraic decomposition.

This does not mean OT and CRDT techniques are useless for AI collaboration. They may be applicable to sub-problems: managing shared state for non-AI components of a collaborative UI, for example, or managing concurrent edits to the system prompt configuration. But they do not solve the core problem of simultaneous multi-user AI inference over a shared context.

The Three Decisions Every Shared Session Must Make

When you strip away the options, every multi-user AI architecture must answer three questions.

Who controls the AI at this moment? In single-user mode, the AI is always talking to one person. In multi-user mode, you need a turn-taking or priority model. The simplest choice is to queue messages and process them serially: user A's message is processed, a response is generated, then user B's message is processed. This preserves coherence but destroys the feel of simultaneity—users see each other's messages arriving in real time but see AI responses coming in a strict sequence, potentially in an unexpected order. The alternative is to let any user send at any time and generate responses to all inputs in parallel, but this requires forking the shared context into independent copies, destroying the shared state that makes a shared session valuable in the first place.
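The queue-and-serialize model is simple enough to sketch end to end. Here `fake_llm` is a stub standing in for a real inference call; the point is that concurrent submissions are forced into one total order so every response is generated over a single coherent shared history.

```python
# Sketch of the queue-and-serialize turn model: concurrent user messages
# enter one queue, and the (stubbed) model processes them strictly in
# arrival order over the shared history. fake_llm is a placeholder, not
# a real inference call.

import queue

def fake_llm(history: list, message: str) -> str:
    # A real system would run inference over the full shared context here.
    return f"ai: answered {message!r} using {len(history)} turns of context"

session_queue: queue.Queue = queue.Queue()
history: list = []

# Two users submit "simultaneously"; the queue fixes a total order.
session_queue.put(("user_a", "What does the error in module X mean?"))
session_queue.put(("user_b", "Can you refactor the parser instead?"))

while not session_queue.empty():
    user, message = session_queue.get()
    history.append(f"{user}: {message}")
    history.append(fake_llm(history, message))

for line in history:
    print(line)
```

Note what the serialization buys and costs: user B's question is answered with user A's exchange already in context (coherence), but B waits for A's full turn to complete (no simultaneity).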

Whose identity governs what the AI can do? When user A (with admin permissions) and user B (with viewer permissions) are in the same session, a shared service account identity collapses both users into a single permission level—usually the union of all their permissions, which is the most permissive and most dangerous option. A proper per-user identity model requires the AI to operate under delegated, time-scoped credentials that reflect each user's actual entitlements. This is tractable but requires significant infrastructure: every tool call the AI makes must be attributed to a specific user identity, and the permissions must reflect that user's actual access level at that specific moment. Just-in-time credential issuance—where the AI requests a fresh credential for each action rather than holding a long-lived service account—is the right design, but most frameworks do not support it.
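A sketch of the just-in-time pattern, with `issue_token` and `run_tool` as hypothetical stand-ins for an identity provider and a tool runtime (the scope names and entitlement table are invented for illustration):

```python
# Sketch of per-user, just-in-time credential issuance for AI tool calls.
# Every call carries the acting user's identity and a short-lived token
# scoped to the intersection of what was requested and what that user
# actually holds -- never a shared long-lived service account.

import time
from dataclasses import dataclass

@dataclass
class ScopedToken:
    user_id: str
    scopes: frozenset
    expires_at: float

# Hypothetical entitlement table: alice is an admin, bob a viewer.
USER_ENTITLEMENTS = {
    "alice": frozenset({"repo:read", "repo:write"}),
    "bob": frozenset({"repo:read"}),
}

def issue_token(user_id: str, requested_scopes: set, ttl_s: float = 60.0) -> ScopedToken:
    # Grant only scopes the user actually holds, for a short lifetime.
    granted = USER_ENTITLEMENTS[user_id] & frozenset(requested_scopes)
    return ScopedToken(user_id, granted, time.time() + ttl_s)

def run_tool(action_scope: str, token: ScopedToken) -> str:
    if time.time() > token.expires_at:
        raise PermissionError("credential expired")
    if action_scope not in token.scopes:
        raise PermissionError(f"{token.user_id} lacks {action_scope}")
    return f"{action_scope} executed as {token.user_id}"

# The AI acts as whoever triggered the action, not as a shared account.
print(run_tool("repo:write", issue_token("alice", {"repo:write"})))
try:
    run_tool("repo:write", issue_token("bob", {"repo:write"}))
except PermissionError as e:
    print("denied:", e)
```

The key property is that the permission check happens per action, per user, at call time, so a viewer in the session can never ride on an admin's entitlements.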

How does the context window get managed across users? A shared context fills faster than a private one. Five users contributing to the same session means roughly five times the message volume before the window reaches capacity. Once the window fills, something must be summarized or evicted. But whose history gets compressed? If user A's early context is summarized away, user A loses a thread that user B cannot reconstruct. Context compaction in shared sessions requires attributing history to contributors and making deliberate choices about what each participant needs to retain. Most implementations ignore this problem until users start noticing that the AI has forgotten something important.
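One way to make the attribution requirement concrete is a compaction policy that protects each participant's most recent turns from eviction. The sketch below is deliberately simplified: `summarize` is a stub for a model-generated summary, and the eviction policy is a minimal illustration rather than a production algorithm.

```python
# Sketch of attribution-aware context compaction: every turn is tagged
# with its author, and compaction summarizes the oldest turns while
# guaranteeing each participant keeps their most recent contributions
# verbatim. summarize is a stub for a real model-generated summary.

from collections import defaultdict

def summarize(turns: list) -> dict:
    authors = sorted({t["user"] for t in turns})
    return {"user": "system", "text": f"[summary of {len(turns)} turns by {', '.join(authors)}]"}

def compact(history: list, max_turns: int, keep_per_user: int = 2) -> list:
    if len(history) <= max_turns:
        return history
    # Walk backwards, protecting each user's most recent turns.
    protected, seen = set(), defaultdict(int)
    for i in range(len(history) - 1, -1, -1):
        user = history[i]["user"]
        if seen[user] < keep_per_user:
            protected.add(i)
            seen[user] += 1
    # Evict the oldest unprotected turns until the window fits
    # (one slot is reserved for the summary turn itself).
    evicted, kept = [], []
    for i, turn in enumerate(history):
        if i not in protected and len(history) - len(evicted) + 1 > max_turns:
            evicted.append(turn)
        else:
            kept.append(turn)
    return [summarize(evicted)] + kept if evicted else history

history = [{"user": "alice", "text": "a1"}, {"user": "bob", "text": "b1"},
           {"user": "alice", "text": "a2"}, {"user": "bob", "text": "b2"},
           {"user": "alice", "text": "a3"}, {"user": "bob", "text": "b3"}]
print([t["text"] for t in compact(history, max_turns=5)])
```

Even this toy version forces the design decision the post describes: the policy explicitly encodes whose history is expendable and whose must survive, instead of leaving it to whatever the truncation code happens to do.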

What Production Teams Have Actually Shipped

A few commercial products have tackled multi-user AI sessions in production, and their designs reveal where the real constraints sit.

Microsoft Copilot Cowork, in experimental testing, allows multiple team members to engage with a shared Copilot agent in a collaborative workspace. The AI participates directly in team conversations, maintains shared context across participants, and contributes to shared documents. The "experimental" label is honest: the product represents Microsoft's acknowledgment that the problem is hard and their current solution is a starting point, not a finished answer. The architecture is primarily broadcast-based: the AI produces one response visible to all participants, rather than handling simultaneous multi-user writes to a shared context.

xAI Grok Build supports up to eight agents working simultaneously on a shared codebase, with conflict resolution for overlapping file edits. The cap at eight agents is not arbitrary—it reflects the coordination overhead thresholds above which the system's reliability degrades measurably. Multi-user editing works when the users' writes are mostly non-overlapping and the conflict surface is bounded (individual files), not when every user is contributing to the same unbounded context.

The pattern across production deployments is consistent: multi-user AI collaboration works best when users are primarily reading shared context and occasionally writing, not when multiple users are simultaneously generating conflicting writes to the same semantic state. Most teams solve the hard problem by avoiding it: the AI produces shared output that all users can read, but input from each user is processed serially rather than merged.

Durable Sessions: The Emerging Architecture

The most promising recent pattern reframes the problem. Instead of asking "how do multiple users share a single AI inference," it asks "how do multiple users share a single persistent session."

Durable Sessions treat the session itself—not the LLM's internal state—as the synchronization primitive. The session is an ordered, persistent event stream. The AI produces token-streamed outputs as events. Users submit messages as events. All participants subscribe to the same event stream and receive updates in order, with automatic catch-up for users who reconnect after a gap.

This sidesteps the serialization problem at the inference layer. The LLM processes one input at a time (serialized), but users experience a shared, live view of the session because all outputs are broadcast as an ordered event stream. A user who closes their laptop and reopens it receives the full catch-up from the event log. A user joining a session in progress receives the history. Presence indicators show who is currently connected to the session.
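The mechanics above can be sketched as a tiny event log. The event kinds and field names here are illustrative, not a real API; the essential properties are the monotonically increasing sequence number and the replay-after-cursor catch-up.

```python
# Sketch of a durable session as an ordered, persistent event log. User
# messages and AI output chunks are appended as events with monotonically
# increasing sequence numbers; a reconnecting or late-joining client
# catches up by replaying everything after the last sequence it saw.

from dataclasses import dataclass, field
from itertools import count

@dataclass
class SessionLog:
    events: list = field(default_factory=list)
    _seq: count = field(default_factory=lambda: count(1))

    def append(self, kind: str, user: str, payload: str) -> int:
        seq = next(self._seq)
        self.events.append({"seq": seq, "kind": kind, "user": user, "payload": payload})
        return seq

    def catch_up(self, after_seq: int) -> list:
        # Replay for a client that disconnected or joined late.
        return [e for e in self.events if e["seq"] > after_seq]

log = SessionLog()
log.append("user_message", "alice", "Summarize the incident timeline")
log.append("ai_token_chunk", "assistant", "The incident began at 09:14...")
last_seen = 2  # bob's laptop closes here
log.append("user_message", "bob", "Add the rollback step")
log.append("ai_token_chunk", "assistant", "Rollback noted...")

# On reconnect, bob receives only the events he missed, in order.
missed = log.catch_up(last_seen)
print([e["seq"] for e in missed])  # [3, 4]
```

Because the log, not the model, is the source of truth, the same replay path serves reconnection, late joining, and audit, which is why this pattern moves the remaining hard problems up to session policy.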

The analogy is a phone call that never drops: you can hand your phone to a colleague mid-call, the call continues, and the other party hears whoever is speaking at any given moment. The infrastructure maintains continuity; the participants determine who speaks when.

This architecture shifts the hard problems to the product layer rather than the infrastructure layer. Turn-taking, priority handling, and permission attribution still need to be implemented—but they can be implemented as session policies rather than as inference-layer constraints. The event stream becomes auditable by default, which addresses compliance and debugging requirements for shared sessions.

The Divergent Intent Problem Nobody Has Solved

Durable Sessions handle the concurrency mechanics reasonably well. They do not address the deeper challenge: what happens when multiple users in the same session want the AI to do fundamentally different things?

In a real-world collaborative session, this is not an edge case. A product manager and a senior engineer often have conflicting intent: the product manager wants the AI to brainstorm features, the engineer wants it to identify technical constraints. In a single-user session, the user's intent is unambiguous. In a multi-user session, the AI receives competing instructions and must somehow reconcile them.

Current systems handle this badly or not at all. Most rely on implicit turn-taking: whoever sent the most recent message sets the current direction. This means that whoever types fastest controls the AI's output, which is neither fair nor useful for collaborative work. More sophisticated approaches could implement explicit consensus mechanisms—requiring both users to confirm a direction before the AI proceeds—but this adds friction that undermines the value of real-time collaboration.

The divergent intent problem is not solvable by making the AI smarter about inferring consensus. It requires product-level decisions about how the session is structured: whether the session has a designated facilitator, whether there are distinct roles with distinct input channels, whether the AI mediates between conflicting inputs or simply surfaces the conflict to the participants. These are product design choices, not engineering choices, which is precisely why most AI products avoid them. Shipping a multi-user AI feature means making explicit choices about collaboration dynamics that single-user products never have to confront.

What to Build First (and What to Defer)

If you are building multi-user AI collaboration, the right sequence is roughly this.

Start with read-sharing, not write-sharing. Let multiple users observe the same AI session and see each other's messages and the AI's responses, but process each user's input serially, with clear visual indication of whose message the AI is currently responding to. This is far simpler than simultaneous write collaboration and still delivers substantial value.

Build per-user identity into the infrastructure before you need it. Retrofitting identity attribution onto a shared-service-account design is much harder than building it in from the start. Every tool call the AI makes should carry a user identifier. Every output should be attributable. This pays off immediately for debugging and audit requirements, and it makes it possible to implement proper permission scoping later.

Treat context window management as a first-class feature. Implement context attribution from day one: track which turns belong to which users, and build the compaction logic with that attribution in mind. Users who lose history that feels important to them will lose trust in the feature faster than almost any other failure mode.

Defer simultaneous write semantics until you have a clear product need that justifies the complexity. The divergent intent problem and the context-forking problem are genuinely hard, and solving them for a general-purpose session is likely not worth the engineering investment until you understand exactly what your users are trying to do when they collaborate.

The right question is not "how do we make our AI collaborative?" It is "what specific collaborative workflow do our users need, and what is the minimum infrastructure required to support it?" Multi-user AI sessions are not a feature to add. They are an architecture to design from the start—or a commitment to a significant retrofit later.
