Structured Concurrency for Parallel Tool Fanout: Who Owns Partial Failure?
The moment your agent fans out five parallel tool calls — search across three indexes, query two databases, hit one external API — you have crossed an invisible line. You are no longer writing prompt-and-response code. You are writing a concurrent program. Most agent frameworks pretend you are not, and the bill arrives at 2 AM.
The pretense is comfortable. The planner emits a list of tool calls, the runtime fires them off, the runtime collects whatever comes back, the planner consumes the aggregate. From a thousand feet up it looks like a fan-out / fan-in pipeline, and most teams treat it that way until production teaches them otherwise. The problem is that twenty years of concurrent-programming research — partial-failure semantics, structured cancellation, backpressure, deterministic error attribution — already solved the failure modes you are about to rediscover. Your agent framework, by default, did not import any of it.
The goroutine-soup default
Open your agent framework and find the function that runs parallel tool calls. With high probability it does some variant of asyncio.gather(*calls) or Promise.all(calls) or errgroup.Wait(), with light retry logic and a top-level timeout. Each child call runs detached in the sense that matters: no parent scope owns its lifetime, a sibling's failure cannot reliably interrupt its work, and nothing guarantees that every child has exited before the planner consumes the result.
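A minimal sketch of that default, in asyncio terms (run_tool here is a hypothetical stand-in for whatever actually executes one tool call, not any framework's API):

```python
import asyncio
import random

async def run_tool(call: str) -> str:
    # Hypothetical stand-in for a single tool invocation.
    await asyncio.sleep(random.random())
    return f"result of {call}"

async def fanout_soup(calls: list[str]) -> list[str]:
    # The default join: gather() raises the first child's exception to the
    # caller, but the remaining awaitables are not cancelled -- they keep
    # running, unowned, after the planner has already moved on.
    return await asyncio.gather(*(run_tool(c) for c in calls))

# asyncio.run(fanout_soup(["search:index-a", "search:index-b", "db:users"]))
```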
This is the goroutine-soup model — a name borrowed from Nathaniel Smith's argument that the unstructured go statement, like the unstructured goto it descends from, is a control-flow primitive too primitive for the programs we actually write. It scales until it doesn't.
The failure modes it produces in agent systems are predictable once you list them. One call hangs forever and the others finish; the planner waits for the timeout to fire, which is set to the worst-case latency of the slowest expected call, which is much longer than the actual outage. Two calls return malformed responses and the planner replans, but the other three are still in flight, still running, still emitting side effects against systems that no longer matter to the new plan. The replan spawns a fresh fan-out before the old one drains, and now you have two cohorts of tool calls overlapping, racing each other to write the same downstream state. A vendor returns a 500, the retry layer fires, the original call's response arrives a second later, and the agent has both — without a way to know which to trust. None of these are exotic. They are the routine output of a tool layer that does not own the lifecycle of its children.
The cost dimension is worse. Industry tracking through 2026 puts runaway agent costs as the number-one operational pain point — an unbounded fanout that loops because two of three retries failed will spend more in an afternoon than a team budgeted for the quarter. The tool calls do not need to be expensive individually; they need to be unowned, so that nothing stops them when the work they were doing has become irrelevant.
What structured concurrency actually imports
Structured concurrency is the rule that every concurrent task lives inside a parent scope, and the scope does not exit until every child has terminated. Trio's nurseries, Python 3.11's asyncio.TaskGroup, Java's StructuredTaskScope (still in preview through JEP 505 as of late 2025), Kotlin's coroutine scopes — they are different ergonomics over the same invariant. If you opened the scope, you own everything in it. Nothing escapes. Nothing outlives you. Errors that happen inside the scope do not silently get logged and discarded; they propagate to the parent the way exceptions in single-threaded code do.
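As a sketch of that invariant, here is the same fanout under asyncio.TaskGroup (Python 3.11+), reusing the hypothetical run_tool from the earlier snippet; the point is the ownership rule, not the particular API:

```python
import asyncio

async def fanout_scoped(calls: list[str]) -> list[str]:
    async with asyncio.TaskGroup() as tg:
        tasks = [tg.create_task(run_tool(c)) for c in calls]
        # If any child raises, the TaskGroup cancels the others and waits
        # for them before re-raising; nothing escapes the `async with`.
    # Reaching this line means every child finished, or was cancelled and
    # cleaned up; failures surface here as an ExceptionGroup.
    return [t.result() for t in tasks]
```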
Three properties fall out of that invariant, and all three are properties your tool layer needs.
The first is deterministic cleanup. When any child fails, the scope cancels every other child and waits for them to finish their cleanup before propagating the exception. There is no question of which calls were still in flight when the planner moved on: they had all stopped by the time you saw the error.
The second is exception aggregation. If two children fail simultaneously, you do not lose one of them in a race. The scope collects them — modern runtimes wrap them in an ExceptionGroup — so that the parent sees the full set of failures and can decide how to react. In the goroutine-soup model the second failure typically races into a silent log and disappears.
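In Python terms, the aggregate is something the parent can actually branch on. A sketch, again against the hypothetical run_tool and requiring Python 3.11+ for except*:

```python
import asyncio

async def fanout_with_aggregation(calls: list[str]) -> None:
    try:
        async with asyncio.TaskGroup() as tg:
            for c in calls:
                tg.create_task(run_tool(c))
    except* Exception as eg:
        # Every failure from the fanout lands here, not just whichever
        # exception happened to fire first.
        for exc in eg.exceptions:
            print(f"tool failure: {exc!r}")
        # Decide here: degrade, replan, or re-raise the group.
```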
The third is cancellation that is honored, not requested politely. When the planner replans because two of five calls returned fatal errors, the other three need to actually stop. Most agent frameworks send a cancellation signal that the underlying HTTP client may or may not respect, and a tool that has already issued a side-effecting POST will cheerfully run to completion regardless. Structured concurrency demands that the cancellation primitive be honored at every layer below the scope, which means it forces a question your tool layer probably hasn't answered: which of my tools are interruptible, and at which boundary?
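What "honored at every layer" looks like at the bottom of the stack is a tool that treats cancellation as a cleanup path rather than an annoyance. A sketch, with hypothetical connection helpers (open_connection, post, abort, close) standing in for a real client:

```python
import asyncio

async def write_record_tool(payload: dict) -> dict:
    conn = await open_connection()              # hypothetical helper
    try:
        return await conn.post("/records", payload)
    except asyncio.CancelledError:
        # Cleanup itself must be bounded, or cancellation can hang here.
        await asyncio.wait_for(conn.abort(), timeout=5)
        raise                                   # never swallow cancellation
    finally:
        await conn.close()
```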
Partial failure has to be a policy, not an emergent behavior
The hardest part of a parallel fanout is not "what do I do when everything succeeds." It is "what do I do when two of five calls succeeded and three didn't." Most agent frameworks have no opinion on this question, which means they have an emergent answer that varies by which exception fired first and how the retry middleware was configured.
The policy you actually need is per-call, not per-fanout. Consider a planner that issues three searches across different indexes plus two metadata lookups. The two metadata calls are advisory — they enrich the response but the agent can produce a coherent answer without them. One of the three searches is the primary path; the others are corroboration. Your fanout has to know this. The runtime that gets all five calls and tells the planner "two of them failed" has handed the failure-recovery decision to the wrong layer.
A workable taxonomy distinguishes essential calls from advisory ones, and within essential, distinguishes quorum-based fanouts from first-success fanouts. An essential call's failure aborts the step. An advisory call's failure is logged and ignored. A quorum fanout returns success when N of M calls return; the remaining M-N are cancelled. A first-success fanout returns the first non-error response and cancels the rest. Java's StructuredTaskScope ships these as named policies (ShutdownOnSuccess and ShutdownOnFailure in the earlier previews, Joiner-based policies in JEP 505) precisely because these are the patterns that recur. Most agent frameworks expose nothing equivalent; you write them inline against gather() and they end up subtly different at every call site.
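None of these are exotic to write, but the structured version has to own the losers. A sketch of a first-success fanout under plain asyncio, reusing the hypothetical run_tool, with the cleanup a scope would otherwise provide written out by hand:

```python
import asyncio

async def first_success(calls: list[str]) -> str:
    tasks = [asyncio.create_task(run_tool(c)) for c in calls]
    try:
        for fut in asyncio.as_completed(tasks):
            try:
                return await fut        # first non-error response wins
            except Exception:
                continue                # a sibling may still succeed
        raise RuntimeError("all calls failed")
    finally:
        for t in tasks:
            t.cancel()                  # losers and stragglers are cancelled...
        # ...and waited for, so nothing outlives this function.
        await asyncio.gather(*tasks, return_exceptions=True)
```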
The policy decision belongs at the planner-tool-layer interface, not in the tool implementations. The tool that issues a search does not know whether the planner thinks it's essential. The planner does know. So the fanout API has to let the planner declare the partial-failure policy when it issues the call, and the runtime has to honor that policy without further negotiation.
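What that interface can look like, sketched with hypothetical names (ToolCall, Policy, and the call sites are illustrations, not any framework's API):

```python
from dataclasses import dataclass
from enum import Enum, auto

class Policy(Enum):
    ESSENTIAL = auto()   # failure aborts the step
    ADVISORY = auto()    # failure is logged and ignored
    # Quorum and first-success are group-level variants and would be
    # declared on the fanout as a whole rather than per call.

@dataclass
class ToolCall:
    name: str
    args: dict
    policy: Policy = Policy.ESSENTIAL

# The planner declares intent at issue time; the runtime honors it
# without further negotiation.
calls = [
    ToolCall("search_primary",   {"q": "billing outage"}, Policy.ESSENTIAL),
    ToolCall("search_secondary", {"q": "billing outage"}, Policy.ESSENTIAL),
    ToolCall("metadata_lookup",  {"doc_id": 42},          Policy.ADVISORY),
]
```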
Cancellation discipline: the half-finished side effects nobody owns
Cancellation in a tool layer has to mean more than "the future is no longer awaited." It has to mean "the side effects this tool would have produced will not happen, or have been compensated for." Otherwise you have built a system in which the agent's plan can replan but the world the agent acts on cannot.
Consider the LangGraph issue that surfaced this concretely in 2025: when ToolNode ran tools in parallel using asyncio.gather, the first GraphInterrupt to propagate would cancel the sibling coroutines, but their interrupts — and any in-flight work they were doing — were lost. The framework had structured-concurrency cancellation semantics in the asyncio layer and goroutine-soup semantics at the agent layer, and the seam between them was where the failure lived. Multiple human-in-the-loop interrupts from parallel tools collapsed into one because the interrupt IDs were generated from the checkpoint namespace alone. This is not unique to LangGraph; it is what every agent framework that bolts parallel execution on top of a fundamentally sequential planner discovers eventually.
The cancellation discipline that works operates at three layers. At the runtime layer, cancellation must propagate down through the call tree and wait for cleanup before the parent exits. At the tool layer, every tool that emits side effects has to declare its cancellation contract: is it interruptible mid-execution, is it idempotent so retries are safe, does it have a compensating action that undoes the partial effect? At the planner layer, replans have to wait for the prior fanout to drain before issuing a new one — otherwise the cancellation has no meaning, because two cohorts of tool calls are racing.
The third one is the one teams skip. They cancel the prior fanout and immediately issue the new one, and the cancellation token is a polite suggestion that the underlying HTTP client may or may not respect, so the old POST lands a second after the new POST and overwrites it. Cancellation is a contract, not an event.
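What writing that contract down can look like at the tool layer, sketched with hypothetical field names that mirror the three questions above:

```python
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

@dataclass(frozen=True)
class CancellationContract:
    interruptible: bool      # safe to cancel mid-execution?
    idempotent: bool         # safe to retry after an ambiguous failure?
    compensate: Optional[Callable[[dict], Awaitable[None]]] = None  # undoes a partial effect

CONTRACTS = {
    "search_index":  CancellationContract(interruptible=True,  idempotent=True),
    "create_ticket": CancellationContract(interruptible=False, idempotent=False),
    # A tool with no contract entry should not be allowed into a parallel fanout.
}
```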
The eval that constructs adversarial fanouts
Most agent evals score the happy path. The adversarial fanout eval is what catches the failure modes that this post is about, and it does not exist in most teams' suites because the failure modes look like flakiness rather than bugs.
The fixture you need is small but specific. Run a fanout in which one tool returns a malformed response, one hangs forever (or until a hard timeout), one returns a slow but valid response, one returns a fast valid response, and one returns a transient error that succeeds on retry. The test asks: does the agent reach a coherent state in bounded time? Does it correctly identify which calls succeeded, which failed, and which were cancelled? Does it produce a final answer or a clean error, and never an incoherent partial result?
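A sketch of that fixture in pytest terms (pytest-asyncio assumed; run_fanout and the outcome object it returns are hypothetical stand-ins for the runtime under test):

```python
import asyncio
import pytest

async def malformed(_):
    return {"unexpected": "shape"}

async def hangs(_):
    await asyncio.sleep(3600)

async def slow_valid(_):
    await asyncio.sleep(2)
    return {"ok": True}

async def fast_valid(_):
    return {"ok": True}

_attempts = 0
async def transient(_):
    global _attempts
    _attempts += 1
    if _attempts == 1:
        raise ConnectionError("flake")
    return {"ok": True}

@pytest.mark.asyncio
async def test_adversarial_fanout_reaches_coherent_state():
    tools = [malformed, hangs, slow_valid, fast_valid, transient]
    # Bounded time: the hung call must not push us to the worst-case timeout.
    outcome = await asyncio.wait_for(run_fanout(tools), timeout=10)
    assert outcome.succeeded >= {"fast_valid", "slow_valid", "transient"}
    assert "hangs" in outcome.cancelled
    assert outcome.is_coherent()   # final answer or clean error, never a partial mess
```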
A second fixture checks cancellation honestly. Issue a fanout, have the planner replan based on partial results, observe the still-in-flight calls. Assert that they did not emit their side effects after cancellation. Most teams discover at this point that they have no instrumentation that would prove this either way.
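A sketch of the second fixture; the instrumented `effects` list is exactly the proof most teams are missing (run_fanout is again the hypothetical runtime under test):

```python
import asyncio
import pytest

effects: list[str] = []

async def side_effecting_tool():
    await asyncio.sleep(1.0)          # the window in which cancellation should land
    effects.append("POST /records")   # the side effect we assert never happens

@pytest.mark.asyncio
async def test_replan_cancels_in_flight_side_effects():
    old_cohort = asyncio.create_task(run_fanout([side_effecting_tool]))
    await asyncio.sleep(0.1)          # partial results arrive; the planner replans
    old_cohort.cancel()
    with pytest.raises(asyncio.CancelledError):
        await old_cohort
    await asyncio.sleep(1.5)          # give any zombie coroutine time to land anyway
    assert effects == []              # the old cohort emitted nothing after cancellation
```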
A third fixture checks cost. Run a fanout in a deliberately broken environment where every tool fails or hangs. Assert that the agent does not enter a retry-replan loop that consumes more than a budget the test sets ahead of time. The number of teams that have shipped agents without this test, then encountered the loop in production, is depressingly close to all of them.
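And a sketch of the third, with the budget expressed as a call count rather than dollars (run_agent_step and its max_retries knob are hypothetical):

```python
import pytest

calls_made = 0

async def broken_tool(_):
    global calls_made
    calls_made += 1
    raise ConnectionError("everything is down")

@pytest.mark.asyncio
async def test_broken_environment_stays_within_budget():
    with pytest.raises(Exception):   # a clean, bounded failure is the acceptable outcome
        await run_agent_step(tools=[broken_tool] * 5, max_retries=2)
    # 5 tools x (1 attempt + 2 retries) is the ceiling; a retry-replan loop blows past it.
    assert calls_made <= 15
```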
Your tool layer is a concurrent-programming framework
The architectural realization that closes this loop is uncomfortable: an agent's tool layer is a concurrent-programming framework, whether the team that built it knows that or not. Every parallel fanout, every replan, every retry, every cancellation is a problem that has a body of literature behind it, and the alternative to importing that literature is rediscovering it incident by incident.
Three takeaways for teams looking at their own tool layer this week. Audit how parallel tool calls are run today; if the answer involves an unbounded gather() with no parent scope, you have goroutine soup, and the failure modes in this post are latent in your system. Make partial-failure policy a first-class argument to the fanout API; do not let it emerge from whichever exception fires first. Specify the cancellation contract for every side-effecting tool, and assert it in adversarial fanout evals; cancellation that is not tested is cancellation that does not work.
The agents that scale past prototype are the ones whose tool layers were designed by someone who has read the structured-concurrency literature. The agents that crater in production are the ones whose tool layers were designed by someone who thought "fan out five tools" was a one-line problem. The line is one line. The semantics are not.
- https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/
- https://en.wikipedia.org/wiki/Structured_concurrency
- https://trio.readthedocs.io/en/stable/reference-core.html
- https://openjdk.org/jeps/505
- https://google.github.io/adk-docs/agents/workflow-agents/parallel-agents/
- https://www.codeant.ai/blogs/parallel-tool-calling
- https://github.com/langchain-ai/langgraph/issues/6624
- https://changelog.langchain.com/announcements/langgraph-v0-4-working-with-interrupts
- https://www.gocodeo.com/post/error-recovery-and-fallback-strategies-in-ai-agent-development
- https://relayplane.com/blog/agent-runaway-costs-2026
- https://galileo.ai/blog/multi-agent-llm-systems-fail
- https://medium.com/@komalbaparmar007/llm-tool-calling-in-production-rate-limits-retries-and-the-infinite-loop-failure-mode-you-must-2a1e2a1e84c8
