Structured Concurrency for Parallel Tool Fanout: Who Owns Partial Failure?
The moment your agent fans out five parallel tool calls — search across three indexes, query two databases, hit one external API — you have crossed an invisible line. You are no longer writing prompt-and-response code. You are writing a concurrent program. Most agent frameworks pretend you are not, and the bill arrives at 2 AM.
The pretense is comfortable. The planner emits a list of tool calls, the runtime fires them off, the runtime collects whatever comes back, the planner consumes the aggregate. From a thousand feet up it looks like a fan-out / fan-in pipeline, and most teams treat it that way until production teaches them otherwise. The problem is that twenty years of concurrent-programming research — partial-failure semantics, structured cancellation, backpressure, deterministic error attribution — already solved the failure modes you are about to rediscover. Your agent framework, by default, did not import any of it.
The goroutine-soup default
Open your agent framework and find the function that runs parallel tool calls. With high probability it does some variant of asyncio.gather(*calls), Promise.all(calls), or Wait() on a zero-value errgroup.Group (one created without errgroup.WithContext, so it does not cancel on error), with light retry logic and a top-level timeout. Each child call runs detached in the sense that matters: there is no parent scope that owns its lifetime, no path by which one child's failure can interrupt another's work, no guarantee that all children have exited before the planner consumes the result.
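A minimal sketch of that default, assuming Python's asyncio. The tool names (failing_tool, slow_tool, db_a, db_b) are hypothetical stand-ins, not part of any framework. It shows that when one awaitable passed to asyncio.gather raises, the caller sees the error immediately while the siblings keep running.

```python
import asyncio

finished: list[str] = []  # records which hypothetical tools ran to completion

async def failing_tool() -> str:
    await asyncio.sleep(0.01)
    raise RuntimeError("index unavailable")

async def slow_tool(name: str) -> str:
    await asyncio.sleep(0.05)
    finished.append(name)  # stands in for a side effect on a downstream system
    return name

async def unstructured_fanout() -> str:
    outcome = "no failure"
    try:
        await asyncio.gather(failing_tool(), slow_tool("db_a"), slow_tool("db_b"))
    except RuntimeError as exc:
        # gather re-raises the first failure but does not cancel the siblings.
        outcome = f"planner saw: {exc}"
    # The planner has "moved on"; give the orphaned siblings time to finish.
    await asyncio.sleep(0.1)
    return outcome

outcome = asyncio.run(unstructured_fanout())
```

After the run, finished contains both db_a and db_b even though the planner was told the fan-out had failed: the side effects happened after the error was handled.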
This is the goroutine-soup model — a name borrowed from Nathaniel Smith's argument that the unstructured go statement, like the unstructured goto it descends from, is a control-flow primitive too primitive for the programs we actually write. It scales until it doesn't.
The failure modes it produces in agent systems are predictable once you list them. One call hangs forever and the others finish; the planner waits for the timeout to fire, which is set to the worst-case latency of the slowest expected call, which is much longer than the actual outage. Two calls return malformed responses and the planner replans, but the other three are still in flight, still running, still emitting side effects against systems that no longer matter to the new plan. The replan spawns a fresh fan-out before the old one drains, and now you have two cohorts of tool calls overlapping, racing each other to write the same downstream state. A vendor returns a 500, the retry layer fires, the original call's response arrives a second later, and the agent has both — without a way to know which to trust. None of these are exotic. They are the routine output of a tool layer that does not own the lifecycle of its children.
The cost dimension is worse. Industry tracking through 2026 ranks runaway agent costs as the number-one operational pain point: an unbounded fanout that keeps looping because two of three retries failed can spend more in an afternoon than a team budgeted for the quarter. The tool calls do not need to be expensive individually; they need to be unowned, so that nothing stops them when the work they were doing has become irrelevant.
What structured concurrency actually imports
Structured concurrency is the rule that every concurrent task lives inside a parent scope, and the scope does not exit until every child has terminated. Trio's nurseries, Python 3.11's asyncio.TaskGroup, Java's StructuredTaskScope (still in preview through JEP 505 as of late 2025), Kotlin's coroutine scopes — they are different ergonomics over the same invariant. If you opened the scope, you own everything in it. Nothing escapes. Nothing outlives you. Errors that happen inside the scope do not silently get logged and discarded; they propagate to the parent the way exceptions in single-threaded code do.
Three properties fall out of that invariant, and all three are properties your tool layer needs.
The first is deterministic cleanup. When any child fails, the scope cancels every other child and waits for them to finish their cleanup before propagating the exception. There is no question of which calls were in flight when the planner moved on: they had all stopped by the time you saw the error.
The second is exception aggregation. If two children fail simultaneously, you do not lose one of them in a race. The scope collects them (Python 3.11 and later, like Trio, wrap them in an ExceptionGroup) so that the parent sees the full set of failures and can decide how to react. In the goroutine-soup model the second failure typically races into a silent log and disappears.
The third is cancellation that is honored, not requested politely. When the planner replans because two of five calls returned fatal errors, the other three need to actually stop. Most agent frameworks send a cancellation signal that the underlying HTTP client may or may not respect, and a tool that has already issued a side-effecting POST will cheerfully run to completion regardless. Structured concurrency demands that the cancellation primitive be honored at every layer below the scope, which means it forces a question your tool layer probably hasn't answered: which of my tools are interruptible, and at which boundary?
Partial failure has to be a policy, not an emergent behavior
The hardest part of a parallel fanout is not "what do I do when everything succeeds." It is "what do I do when two of five calls succeeded and three didn't." Most agent frameworks have no opinion on this question, which means they have an emergent answer that varies by which exception fired first and how the retry middleware was configured.
- https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/
- https://en.wikipedia.org/wiki/Structured_concurrency
- https://trio.readthedocs.io/en/stable/reference-core.html
- https://openjdk.org/jeps/505
- https://google.github.io/adk-docs/agents/workflow-agents/parallel-agents/
- https://www.codeant.ai/blogs/parallel-tool-calling
- https://github.com/langchain-ai/langgraph/issues/6624
- https://changelog.langchain.com/announcements/langgraph-v0-4-working-with-interrupts
- https://www.gocodeo.com/post/error-recovery-and-fallback-strategies-in-ai-agent-development
- https://relayplane.com/blog/agent-runaway-costs-2026
- https://galileo.ai/blog/multi-agent-llm-systems-fail
- https://medium.com/@komalbaparmar007/llm-tool-calling-in-production-rate-limits-retries-and-the-infinite-loop-failure-mode-you-must-2a1e2a1e84c8
