Sequential Tool Call Waterfalls: The Hidden Latency Tax in Agent Loops
If you've profiled an AI agent that felt inexplicably slow, chances are you found a waterfall. The agent called tool A, waited, then called tool B, waited, then called tool C — even though B and C had no dependency on A's result. You just paid 3× the latency for 1× the work.
This pattern is not an edge case. It's the default behavior of virtually every agent framework. The model returns multiple tool calls in a single response, and the execution loop runs them one at a time, in order. Fixing it isn't complicated, but first you need a reliable way to identify which calls are actually independent.
The problem is structurally identical to the N+1 query problem in ORMs. In traditional backends, an inexperienced developer fetches a list of 100 orders, then makes 100 individual database queries to get each order's customer — when a single JOIN would have done the job. The code looks correct; the performance is a disaster. In agent loops, the equivalent is an LLM that generates 5 independent search() calls to answer a comparison question, and your orchestrator dutifully executes them in a chain instead of a fan-out.
Why Agent Frameworks Default to Sequential Execution
The mechanics are straightforward. When a model returns a tool use response containing multiple tool calls, the simplest correct implementation is a for loop: iterate over the list, execute each one, collect results, continue. This is what most frameworks ship by default, and it works — in the sense that it produces correct results.
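To see why, here is a minimal sketch of that loop. The `tool_calls` list and `execute_tool` helper are hypothetical stand-ins for whatever your framework actually provides:

```python
# A minimal sketch of the default sequential loop. `tool_calls` and
# `execute_tool` are stand-ins for your framework's equivalents.
def run_tools_sequentially(tool_calls, execute_tool):
    results = []
    for call in tool_calls:
        # Each call blocks until its I/O completes before the next
        # call starts: this is the waterfall.
        results.append(execute_tool(call.name, call.arguments))
    return results
```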
The problem is that "correct" and "fast" are not the same thing when tool calls are I/O-bound. Each tool call that hits an external API, a database, or a vector store spends most of its time waiting on the network. A sequential loop serializes that wait time:
- Tool A: 300ms of network I/O
- Tool B: 200ms of network I/O
- Tool C: 400ms of network I/O
- Sequential total: 900ms
- Parallel total: 400ms (time of the slowest call)
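You can verify the arithmetic with nothing more than asyncio.sleep standing in for network I/O. This is a self-contained simulation, not framework code:

```python
import asyncio
import time

# Simulated I/O-bound tools with the latencies listed above.
async def tool(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stands in for network I/O
    return f"{name} done"

async def main() -> None:
    start = time.perf_counter()
    for name, delay in [("A", 0.3), ("B", 0.2), ("C", 0.4)]:
        await tool(name, delay)  # each await blocks the next call
    print(f"sequential: {time.perf_counter() - start:.2f}s")  # ~0.90s

    start = time.perf_counter()
    # All three waits overlap; total time is the slowest call.
    await asyncio.gather(tool("A", 0.3), tool("B", 0.2), tool("C", 0.4))
    print(f"parallel:   {time.perf_counter() - start:.2f}s")  # ~0.40s

asyncio.run(main())
```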
The real-world impact compounds in multi-turn agent loops where the model makes tool calls across several reasoning steps. Production traces from agents doing research-style tasks routinely show 60–70% of total latency attributable to serialized I/O that could have been parallelized.
Anthropic's internal research on multi-agent systems found reductions of up to 90% in task completion time on complex research queries when parallel subagents with concurrent tool execution replaced a single sequential agent. LLMCompiler, a well-benchmarked academic architecture for this problem, shows 1.4x to 3.7x latency improvements across standard benchmarks and, more notably, a 4.65x reduction in cost, because batching more work into each LLM round-trip means fewer round-trips overall.
The Dependency Graph: Your Diagnostic Tool
Before you can parallelize tool calls, you need to know which ones are safe to run concurrently. The answer comes from a simple question about each pair of tool calls in a batch: does the input to call B depend on the output of call A?
If yes, they must run sequentially. If no, they can run in parallel. Represent this as a directed acyclic graph (DAG), where each node is a tool call and each edge represents a dependency. Nodes with no incoming edges (no dependencies) can all execute at the same time. After that batch completes, you execute the next layer of the graph — nodes whose dependencies are now satisfied — and so on.
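Here is a minimal sketch of that layered execution, assuming each task is an async callable that can read its dependencies' outputs from a shared results dict. Names and signatures are illustrative, not any particular framework's API:

```python
import asyncio

async def run_dag(tasks: dict, deps: dict) -> dict:
    """tasks maps name -> async callable(results); deps maps name -> set of names."""
    results: dict = {}
    remaining = set(tasks)
    while remaining:
        # A node is ready when every one of its dependencies has a result.
        ready = [t for t in remaining if deps.get(t, set()).issubset(results)]
        if not ready:
            raise ValueError("dependency cycle or missing task")
        # Execute the whole ready layer concurrently, then record results.
        outputs = await asyncio.gather(*(tasks[t](results) for t in ready))
        results.update(zip(ready, outputs))
        remaining -= set(ready)
    return results
```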
In practice, constructing this graph doesn't require heavy machinery. Most agentic tasks fall into one of three shapes:
Pure fan-out: Multiple independent calls with no shared dependencies. "Compare weather in five cities" generates five get_weather calls, none of which depend on each other. Run them all at once.
Linear chain: Each call depends on the previous result. "Get the user's account ID, then fetch their order history, then recommend products" is inherently sequential — you can't look up orders without an account ID. No parallelism is available here, and that's correct.
Mixed DAG: A combination of both. "Get company data, then in parallel fetch the CEO profile, revenue figures, and patent filings, then combine for a competitive analysis." The first call is sequential because everything else depends on it. The next three are parallelizable. The final synthesis step is sequential because it depends on all three.
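Expressed as dependency maps for a scheduler like the run_dag sketch above, the three shapes look like this (task names are illustrative):

```python
# Pure fan-out: no edges, everything runs in one layer.
fan_out = {f"weather_{city}": set()
           for city in ["nyc", "london", "tokyo", "paris", "sydney"]}

# Linear chain: each node depends on the previous one; no parallelism exists.
chain = {
    "account_id": set(),
    "order_history": {"account_id"},
    "recommendations": {"order_history"},
}

# Mixed DAG: one root, a parallel middle layer, one synthesis node.
mixed = {
    "company_data": set(),
    "ceo_profile": {"company_data"},
    "revenue": {"company_data"},
    "patents": {"company_data"},
    "analysis": {"ceo_profile", "revenue", "patents"},
}
```

Fed to the scheduler, `mixed` executes in three layers: the root alone, then the three middle fetches concurrently, then the synthesis step.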
The most important diagnostic habit is learning to distinguish the mixed DAG case from the false linear chain. Agents frequently generate what looks like a linear chain but is actually a mixed DAG in disguise: the LLM just happened to list the calls in dependency order without signaling which ones could overlap. If a model answering "compare the pricing of three vendors" emits three fetch_pricing calls in sequence, yet nothing in any call's arguments references another call's output, the apparent chain is really a fan-out.
Identifying Dependencies in Practice
For simple tool calls, dependency analysis reduces to inspecting parameter values. If tool call B uses a literal parameter that was hardcoded in the prompt or context, it's independent. If tool call B uses $result_of_A as a parameter — a placeholder referencing a prior call's output — it's dependent.
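A minimal extractor for that convention might look like this. The `$result_of_<id>` pattern follows the placeholder style described above; the function name and argument shape are hypothetical:

```python
import re

# Matches placeholders like "$result_of_get_company" in string arguments.
PLACEHOLDER = re.compile(r"\$result_of_(\w+)")

def extract_deps(arguments: dict) -> set:
    """Return the ids of prior calls that this call's arguments reference."""
    deps = set()
    for value in arguments.values():
        if isinstance(value, str):
            deps.update(PLACEHOLDER.findall(value))
    return deps

# extract_deps({"query": "revenue of $result_of_get_company"})
# -> {"get_company"}
```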
The LLMCompiler architecture formalizes this with a three-phase design:
- A Function Calling Planner: an LLM that emits the entire task list up front as a DAG, marking each task's inputs with placeholder variables that reference earlier tasks' outputs.
- A Task Fetching Unit: a dispatcher that releases a task for execution the moment all of its placeholders can be substituted with completed results.
- An Executor: a concurrent runtime that runs every dispatched task in parallel.
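Concretely, a planner-emitted plan for the competitive-analysis example might look roughly like this, with `$N` placeholders marking dependencies (a paraphrase of the idea, not the paper's exact syntax):

```
1. get_company_data(name="Acme Corp")
2. get_ceo_profile(company=$1)
3. get_revenue(company=$1)
4. get_patents(company=$1)
5. synthesize(profile=$2, revenue=$3, patents=$4)
```

Tasks 2 through 4 become dispatchable the moment task 1 completes, which is exactly the mixed-DAG layering described above.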
Sources
- https://www.codeant.ai/blogs/parallel-tool-calling
- https://google.github.io/adk-docs/agents/workflow-agents/parallel-agents/
- https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/ai-agent-design-patterns
- https://www.anthropic.com/engineering/multi-agent-research-system
- https://arxiv.org/pdf/2312.04511
- https://agent-patterns.readthedocs.io/en/stable/patterns/llm-compiler.html
- https://airbyte.com/agentic-data/parallel-tool-calls-llm
- https://continue.ghost.io/parallel-tool-calling/
- https://www.anthropic.com/engineering/advanced-tool-use
- https://cookbook.openai.com/examples/agents_sdk/parallel_agents
- https://towardsdatascience.com/why-your-multi-agent-system-is-failing-escaping-the-17x-error-trap-of-the-bag-of-agents/
