Structured Concurrency for AI Pipelines: Why asyncio.gather() Isn't Enough
When an LLM returns three tool calls in a single response, the obvious thing is to run them in parallel. You reach for asyncio.gather(), fan the calls out, collect the results, return them to the model. The code works in testing. It works in staging. Six weeks into production, you start noticing your application holding open HTTP connections it should have released. Token quota is draining faster than usage metrics suggest. Occasionally, a tool that sends an email fires twice.
The underlying issue is not the LLM or the tool — it's the concurrency primitive. asyncio.gather() was not designed for the failure modes that multi-step agent pipelines produce, and using it as the backbone of parallel tool execution creates problems that are invisible until they compound.
