
When Two Agents Share a Tool: Concurrency Bugs in Multi-Agent Systems

· 9 min read
Tian Pan
Software Engineer

The moment you typed "spin up another agent to handle that in parallel," you became a distributed systems engineer. You probably didn't notice. The framework made it a one-line change, the demo worked, and the latency dropped. But under the hood you just introduced two processes that read and write shared state with no coordination — and every race condition, lost update, and dirty read that has haunted databases for fifty years is now sitting in your agent stack, waiting.

The reason this bites so hard is that the failure doesn't look like a concurrency bug. It looks like one agent being wrong. The output is syntactically valid, the pipeline is green, no exception is thrown — and yet a customer got charged twice, or a file is missing half its expected content, or an agent confidently acted on a number that another agent had already overwritten. You go debug "the dumb agent" and find nothing wrong with its prompt, because the prompt was never the problem.

Multi-agent systems are sold on the premise that agents are independent. They run their own reasoning loop, they have their own context window, they make their own decisions. That independence is exactly the illusion. The instant two agents touch the same filesystem, the same database row, the same scratchpad, or the same API token, they are no longer independent — they are concurrent processes contending for a shared resource, and nobody designed the contention out.

The Independence Illusion

Walk through what "shared" actually means in a typical agent deployment. Two research agents both write findings into a shared memory store. A planner and an executor both update a task object that tracks progress. Three coding agents all run in the same repository checkout. A fleet of worker agents all authenticate with the same API key that carries a single rate limit. Each of those is a contention point, and none of them announced itself as one.

The designer's mental model is a clean org chart: a manager agent delegates, sub-agents go off and do isolated work, results come back and get merged. The runtime reality is closer to four interns editing the same Google Doc with change-tracking turned off. The org chart says they're independent. The document says otherwise.

What makes this worse than classic distributed systems is timing variance. A traditional service has somewhat predictable latency. An LLM agent does not — one call returns in 200 milliseconds, the next takes nine seconds because the model decided to think harder, or a tool timed out and retried. The interleavings you get in production are wildly more varied than anything your local test run produced. A race that has a one-in-ten-thousand chance per request becomes a daily incident once you're doing millions of requests across unpredictable agent durations.

The Lost Update, Reincarnated

The single most common corruption in multi-agent systems is the lost update, and it is worth being precise about the mechanism because the fix depends on it.

Agent A reads a shared state object — say a task record showing three of five subtasks complete. Agent B reads the same record at nearly the same time, also seeing three of five. Agent A finishes its subtask, sets the count to four, and writes. Agent B finishes its subtask, sets the count to four (it never saw A's write), and writes. The final state says four of five complete. One subtask's completion has silently evaporated. No error. No log line. The record is perfectly well-formed; it's just wrong.

This is the textbook read-modify-write race, and the textbook is sixty years old. The reason it keeps reappearing is that agent frameworks expose state as a plain mutable object — a dictionary, a JSON blob, a row you SELECT and then UPDATE. The read and the write are two separate operations with a gap in between, and any other agent can slip into that gap. The framework gives you the convenience of shared memory without the obligation of synchronizing access to it.
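
To make the gap concrete, here is a minimal sketch of the interleaving described above, forced to happen deterministically. The `task` dictionary and the `read_state` / `write_state` helpers are hypothetical stand-ins for whatever state object a framework exposes, not any particular library's API.

```python
# Hypothetical shared task record: a plain dict stands in for the
# framework's mutable state object.
task = {"done": 3, "total": 5}

def read_state(store):
    # Each agent takes a private snapshot, as it would after a SELECT
    # or after loading a JSON blob.
    return dict(store)

def write_state(store, snapshot):
    # Blind write: overwrites the record without checking whether
    # anyone else wrote since the snapshot was taken.
    store.update(snapshot)

# Both agents read before either one writes.
snap_a = read_state(task)
snap_b = read_state(task)

snap_a["done"] += 1            # Agent A finishes its subtask
write_state(task, snap_a)

snap_b["done"] += 1            # Agent B finishes too, holding a stale snapshot
write_state(task, snap_b)

print(task["done"])            # 4, although two subtasks were completed
```

Nothing here raises, and the record is still well-formed. The only evidence is that the count is one lower than it should be.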

The same pattern produces the dirty read — Agent B reads state that Agent A is halfway through mutating, acts on a value that will never be final, and produces a confidently wrong answer downstream. And it produces the double-execution: a worker agent's tool call succeeds, the acknowledgment is lost, the orchestrator retries, and a non-idempotent operation runs twice. If that operation was "charge the customer" or "send the email" or "create the ticket," you now have a duplicate in the real world that no rollback can fully undo.

Four Shared Resources That Look Innocent

It helps to name the specific resources that quietly become contention points, because each one has a different fix.

  • The shared filesystem. Coding agents are the obvious case. Two agents editing files in one checkout will overwrite each other's work or fight over git's index lock. The mature fix is git worktrees: each agent gets its own working directory and index while sharing the object store, so file-level collisions become merge-time conflicts that standard tooling can detect, instead of silent overwrites during active work. But worktrees only isolate the files inside each checkout. If both agents start a dev server on port 3000 or write to the same scratch directory outside it, the host still has one of each, and you're back to contention.

  • The shared database row or state object. This is the lost-update machine described above. The fix is optimistic concurrency: attach a version number to the record, and make every write a compare-and-swap — UPDATE ... SET value=?, version=version+1 WHERE id=? AND version=?. If another agent wrote in between, the WHERE matches zero rows, the update fails loudly, and the agent retries against fresh state instead of clobbering it.

  • The shared API token with one rate limit. Ten agents, one key, one quota. Agent independence is a lie the moment they all draw from the same bucket: one agent's burst starves the other nine, and the failure shows up as unexplained timeouts in agents that did nothing wrong. The rate limit is shared state too, even though it never appears in your code as a variable. Where the budget allows, the fix is a separate token per agent; where it doesn't, treat the quota as shared state and allocate it deliberately.

  • The shared scratchpad or memory store. Agents that "collaborate" through a common memory blob are doing concurrent writes to a data structure with no schema and no locking. Append-only logs survive this; in-place updates to a shared blob do not.
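
The compare-and-swap write from the database-row bullet can be sketched with Python's built-in `sqlite3` module. The `tasks` table, its `done` and `version` columns, and the helper names are assumptions made for illustration. A production version would also add backoff between attempts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, done INTEGER, version INTEGER)")
conn.execute("INSERT INTO tasks VALUES (1, 3, 0)")
conn.commit()

def read_task(conn, task_id):
    # Returns (done, version) so the caller can send the version back on write.
    return conn.execute(
        "SELECT done, version FROM tasks WHERE id=?", (task_id,)
    ).fetchone()

def cas_update(conn, task_id, new_done, seen_version):
    # The write only lands if the version is still the one the agent read.
    cur = conn.execute(
        "UPDATE tasks SET done=?, version=version+1 WHERE id=? AND version=?",
        (new_done, task_id, seen_version),
    )
    conn.commit()
    return cur.rowcount == 1   # zero rows matched means someone else wrote first

def complete_subtask(conn, task_id, max_attempts=5):
    # Bounded retry: re-read fresh state after every rejected write.
    for _ in range(max_attempts):
        done, version = read_task(conn, task_id)
        if cas_update(conn, task_id, done + 1, version):
            return True
    return False

# Replay the lost-update interleaving: both agents read version 0.
done_a, ver_a = read_task(conn, 1)
done_b, ver_b = read_task(conn, 1)
a_ok = cas_update(conn, 1, done_a + 1, ver_a)   # lands, version becomes 1
b_ok = cas_update(conn, 1, done_b + 1, ver_b)   # rejected, version has moved
b_retry = complete_subtask(conn, 1)             # re-reads and lands
final = read_task(conn, 1)
```

The stale write is rejected instead of silently winning, and the retry increments from the value Agent A actually wrote.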

Make the Tool Layer Transactional

The instinct, when you finally see the concurrency, is to push the fix into the agents — write a better prompt telling the agent to "check if another agent is working on this first." Do not do this. An LLM cannot reason its way out of a race condition, because the race is in the milliseconds between its read and its write, not in its reasoning. Coordination has to live in the tool layer, below the model, where it can be enforced atomically.

That means the tools agents call should behave like a small database, not like a pile of convenience functions:

  • Idempotency keys on every mutating tool. Every "charge," "send," "create," or "write" tool should accept an idempotency key derived from the intent of the action. The tool layer records which keys it has already executed and short-circuits duplicates. A retry then becomes a safe no-op instead of a second charge. This is the cheapest insurance in the entire system and the most commonly skipped.

  • Compare-and-swap for state updates. No blind writes. Every update carries the version the agent read, and the tool rejects the write if the version moved. The agent handles the rejection by re-reading and retrying — bounded, with backoff.

  • Agent-scoped resources by default, shared only when justified. Give each agent its own worktree, its own scratch namespace, its own token where the budget allows. Make sharing an explicit, deliberate decision rather than the silent default. Most "multi-agent concurrency bugs" are really "we shared something that didn't need to be shared."

  • Compensation handlers for partial failure. When a multi-step agent action fails halfway, you need a defined way to undo or reconcile the steps that did land: this is the saga pattern, applied to agent workflows. Production multi-agent planning systems are converging on this design and treat reliability as a systems property, built from versioned logs, idempotency keys, and explicit retry and timeout policies rather than prompt instructions.
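
As one illustration of the first item in this list, here is a minimal sketch of an idempotency-key wrapper around a mutating tool. `IdempotentTool`, `intent_key`, and `charge_customer` are hypothetical names, and the in-memory dictionary is an assumption made to keep the example self-contained. A real deployment would need a durable store shared by every orchestrator that can retry.

```python
import hashlib
import json
import threading

class IdempotentTool:
    """Wraps a mutating function and remembers its result per idempotency key."""

    def __init__(self, func):
        self._func = func
        self._results = {}              # in-memory only, for the sketch
        self._lock = threading.Lock()   # makes check-and-record atomic in one process

    def call(self, idempotency_key, **kwargs):
        with self._lock:
            if idempotency_key in self._results:
                # Duplicate: return the recorded result without re-executing.
                return self._results[idempotency_key]
            result = self._func(**kwargs)
            self._results[idempotency_key] = result
            return result

def intent_key(action, **params):
    # Derive the key from the intent of the action, not from the attempt,
    # so that a retry of the same intent maps to the same key.
    payload = json.dumps({"action": action, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

charges = []   # stands in for the real-world side effect

def charge_customer(customer_id, amount_cents):
    charges.append((customer_id, amount_cents))
    return {"charge_number": len(charges)}

charge = IdempotentTool(charge_customer)
key = intent_key("charge", order_id="order-42", customer_id="c-1", amount_cents=1999)

first = charge.call(key, customer_id="c-1", amount_cents=1999)
retry = charge.call(key, customer_id="c-1", amount_cents=1999)   # lost ack, orchestrator retries
```

The retry returns the recorded result and `charges` still holds a single entry. Note that the key includes an order identifier: two genuinely separate charges of the same amount to the same customer need different keys.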

Detecting Corruption You Can't See

The hardest part is that these bugs are invisible by construction. The output is well-formed. So you have to instrument for them deliberately.

Log every read and write to shared state with the agent ID, a timestamp, and the version observed. When a lost update happens, the trace shows two agents reading the same version and both writing — that pattern is the signature, and you can alert on it. Add invariant checks that run after agent workflows: if subtask completions should sum to the total, assert it; if a balance should never go negative, assert it. An invariant violation is a concurrency bug caught in the act, even when no exception fired.
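
A rough sketch of both checks follows. The trace record layout (agent ID, timestamp, operation, key, version observed) and the function names are assumptions for illustration, not the format of any particular tracing tool.

```python
from collections import defaultdict

# Hypothetical trace records: (agent_id, timestamp, operation, key, version_observed)
trace = [
    ("agent-a", 10.00, "read",  "task-1", 7),
    ("agent-b", 10.02, "read",  "task-1", 7),
    ("agent-a", 12.40, "write", "task-1", 7),
    ("agent-b", 19.10, "write", "task-1", 7),
    ("agent-c", 20.00, "read",  "task-2", 3),
    ("agent-c", 21.50, "write", "task-2", 3),
]

def find_lost_update_candidates(events):
    # Signature of a lost update: more than one agent wrote to the same key
    # after observing the same version.
    writers = defaultdict(set)
    for agent_id, _ts, op, key, version in events:
        if op == "write":
            writers[(key, version)].add(agent_id)
    return {kv: sorted(agents) for kv, agents in writers.items() if len(agents) > 1}

def completion_invariant_holds(subtask_done_flags, recorded_done):
    # Post-workflow check: the recorded count must equal the number of
    # subtasks that actually finished.
    return sum(subtask_done_flags) == recorded_done

suspects = find_lost_update_candidates(trace)
invariant_ok = completion_invariant_holds([True] * 5, recorded_done=4)
```

With this trace, `suspects` flags `task-1` at version 7 with both agents as writers, and the invariant check fails because five subtasks finished while the record says four.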

And accept that you cannot reproduce these reliably in a normal test. They are timing-dependent and only surface under load or in specific interleavings. Stress testing with artificial agents that deliberately race each other, plus fault injection that delays one agent's write to widen the race window, will surface in an afternoon what production would otherwise surface at 2 a.m. a month from now.
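
One way to sketch that kind of test uses threads as artificial agents and a `threading.Barrier` as the injected delay: every agent's write is held until all agents have finished reading, which turns a rare interleaving into a guaranteed one. `BlindCounter` and `run_race` are hypothetical names for this illustration.

```python
import threading

class BlindCounter:
    """Shared state with a separate read and write and no synchronization."""

    def __init__(self):
        self.value = 0

    def read(self):
        return self.value

    def write(self, new_value):
        self.value = new_value

def run_race(num_agents):
    counter = BlindCounter()
    barrier = threading.Barrier(num_agents)

    def agent():
        seen = counter.read()
        # Fault injection: hold this agent's write until every agent has
        # completed its read, so all of them act on the same stale value.
        barrier.wait()
        counter.write(seen + 1)

    threads = [threading.Thread(target=agent) for _ in range(num_agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value

result = run_race(8)
print(result)   # 1: eight increments ran, seven were lost
```

Pointing the same harness at a tool that enforces compare-and-swap should yield a final count equal to the number of agents, which gives a regression test for the fix.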

The Takeaway

"Spin up another agent" is not a scaling primitive. It is a concurrency decision, and it inherits every hard-won lesson from databases and distributed systems. The frameworks make parallelism a one-liner; they do not make correctness a one-liner. That gap is yours to close.

Before you add the second agent, ask one question: what do these two agents share? A filesystem, a row, a token, a memory blob — find it, and either isolate it so it isn't shared, or put a transaction around it so the sharing is safe. Treat your tool layer as the database it secretly is. The alternative is shipping a system that is almost always right, fails silently when it isn't, and sends you to debug an agent that was never the problem.
