The Parallelism Trap in Agentic Pipelines: When Fan-Out Makes Latency Worse
Your agent pipeline is slow, so you split the work across five parallel sub-agents. The p50 drops. You ship it. Three days later, an on-call page fires: a batch of user requests is timing out. You dig in and find that p99 has climbed from 4 seconds to 22 seconds. Nothing in the individual agents changed. The timeout was caused by the orchestration layer waiting for the slowest of the five, which ran into a retrieval hiccup that only happens 1% of the time — but now it happens to any request that touches all five paths.
This is the parallelism trap: a pattern that looks like an obvious speedup but restructures your latency distribution in ways that hurt real users more than the p50 improvement helps them. Across production benchmarks, single agents match or outperform multi-agent pipelines on 64% of evaluated tasks. When parallel fan-out wins, it wins cleanly — but only for a specific class of problems. The mistake is treating fan-out as the default.
