The Deadlock Your Agent Can't See: Circular Tool Dependencies in Generated Plans

· 11 min read
Tian Pan
Software Engineer

A planner agent emits seven steps. Each looks reasonable. The orchestrator dispatches them, the first three return values, the fourth waits on the fifth, the fifth waits on the seventh, and the seventh — buried three lines deep in the planner's prose — quietly waits on the fourth. Nothing is locked. No EDEADLK ever fires. The agent burns 40,000 tokens reasoning about why the fourth step "is taking longer than expected" and ultimately gives up with a soft, plausible apology to the user.

This is the deadlock your agent can't see. It is not the textbook deadlock from operating systems class — there are no mutexes, no resource graphs the kernel can introspect, no holders or waiters anyone in your stack would recognize. The dependencies live inside English sentences that the planner produced, the cycles form in latent semantics rather than in any data structure, and the failure mode looks indistinguishable from "the model is thinking hard." Classic deadlock detection is useless here, but the cost is identical: the workflow halts, tokens evaporate, and your trace tells you nothing.

The good news is that this category of failure is mechanically detectable — once you stop expecting the operating system to find it for you and start treating the plan itself as a graph that needs validation before any tool ever runs.

Why LLM-Authored Plans Produce Cycles

Plan-and-execute architectures, including LLMCompiler-style planners that stream a DAG of tasks with explicit dependency fields, share a quiet assumption: that the planner can correctly enumerate, in advance, what each step actually needs from the others. In practice, planners hallucinate dependencies the same way they hallucinate citations.

A few common patterns produce cycles:

  • Bidirectional fact dependency. "Geocode the address to a country, then look up the address format for that country, then validate the address." The planner notes that step 3 depends on step 2, and step 2 on step 1 — but the prompt to step 1 says "use the canonical format for that country," which makes step 1 depend on step 2 in flat contradiction with what the dependency field says. Each step is locally coherent. The graph is not.
  • Self-referential aggregation. "Summarize the findings from steps 4–9, then use that summary to choose which sub-questions to investigate in step 5." A human reader would catch it. The planner does not, because it generated the plan in linear order and never re-checked the back-edges.
  • Implicit shared state. Two steps both write to a "scratchpad" memory and both read from it. Step A's tool description says it reads the scratchpad first; step B's says it must run after A but writes prerequisite values to the scratchpad before A runs. The dependency field declares A → B, but the actual data flow is B → A → B.
  • Repair loops. A validator step is added "to retry if the previous step failed." When the previous step's failure mode is producing input the validator rejects, you have an unbounded cycle dressed as defensive engineering.
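To make the first pattern concrete, here is a hypothetical planner output for the address example. The field names and the `{step_N}` template syntax are assumptions for illustration, not any particular framework's schema. The declared edges are acyclic, but scanning the prompt templates surfaces the back-edge:

```python
import re

# Hypothetical planner output for the address-validation example above.
plan = [
    {"id": 1, "tool": "geocode",
     "prompt": "Geocode the address using the canonical format for {step_2.country}.",
     "deps": []},
    {"id": 2, "tool": "lookup_format",
     "prompt": "Look up the address format for the country in {step_1.result}.",
     "deps": [1]},
    {"id": 3, "tool": "validate",
     "prompt": "Validate the address against {step_2.format}.",
     "deps": [2]},
]

# Declared edges are acyclic: 1 -> 2 -> 3.
declared = {(d, step["id"]) for step in plan for d in step["deps"]}

# But the prompt templates also reference other steps' outputs.
implicit = {
    (int(m), step["id"])
    for step in plan
    for m in re.findall(r"\{step_(\d+)", step["prompt"])
}

# The union contains both (1, 2) and (2, 1): a two-node cycle.
print(sorted(declared | implicit))
```

Each step is still locally coherent; only the union of declared and implicit edges reveals the contradiction.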

Recent research on tabular data generation makes this concrete: when LLMs are asked to extract feature dependencies and emit a graph for downstream topological traversal, the resulting graph routinely contains cycles. The example given is mundane — latitude and longitude determine country, but country constrains the valid ranges of latitude and longitude. Both directions are true, and the planner faithfully encoded both.

The pattern is the same in agent planning. Whenever the planner emits a graph, you should expect cycles, not be surprised by them.

Why Classic Deadlock Detection Doesn't Apply

If you've worked on databases or kernels, your reflex is to reach for cycle detection in a wait-for graph. That mental model breaks here for three reasons.

First, no resource is ever held. A real deadlock requires mutual exclusion plus hold-and-wait. The fourth tool call hasn't acquired anything; it's just sitting in the orchestrator's queue waiting for an upstream value that will never arrive. Your runtime sees a pending future, not a held lock.

Second, the dependency edges are not in the graph the runtime sees. The orchestrator's DAG says step 4 depends on step 5. It does not say step 7 also depends on step 4, because that edge exists only in the natural-language argument template inside step 7. Until step 7 actually runs and the LLM resolves its template, the runtime cannot see the back-edge. By the time the back-edge becomes visible, the agent has already been spinning for minutes.

Third, the symptom looks like slow progress, not a halt. A real deadlocked process is suspended; you can ps it and see it blocked on a futex. A "deadlocked" agent is busy. It is reasoning, calling its planner again, emitting tentative tool calls, generating thoughtful prose about how to proceed. The token meter spins. The trace fills with content. From a monitoring perspective, this is the worst possible failure shape: a high-cost, high-activity, zero-progress state that defeats every alarm tuned to "stuck" or "errored."
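If you want an alarm for this shape of failure at runtime, one crude but serviceable heuristic is to fingerprint each tool call and trip when the agent keeps issuing near-identical calls. This is a sketch of that idea, not a real deadlock detector; the class name and threshold are made up for illustration:

```python
import hashlib
import json
from collections import Counter

class ProgressWatchdog:
    """Flags runs where the agent repeats near-identical tool calls.

    Fingerprints each call by tool name plus canonicalized arguments
    and trips once the same fingerprint has been seen `limit` times.
    """

    def __init__(self, limit: int = 3):
        self.limit = limit
        self.seen = Counter()

    def record(self, tool: str, args: dict) -> bool:
        """Record one tool call; return True if it has now repeated `limit` times."""
        key = hashlib.sha256(
            (tool + json.dumps(args, sort_keys=True)).encode()
        ).hexdigest()
        self.seen[key] += 1
        return self.seen[key] >= self.limit
```

Wired into the orchestrator's dispatch path, this catches the "busy but going nowhere" state that latency- and error-based alarms miss.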

This is why the Multi-Agent System Failure Taxonomy lists "step repetition" as a distinct failure class. It is not an error. It is the absence of progress dressed in the costume of work.

A Static Plan-Graph Pass That Catches It Before the First Tool Call

The most effective defense is the cheapest one: extract the plan into a graph and check it for cycles before dispatching anything. The planner has already done the hard work of writing the steps down. You just have to read them more rigorously than the planner did.

A practical pass has four steps: collect the edges the planner declared, scan each step's argument template for references to other steps' outputs and add those as edges too, run ordinary cycle detection on the union, and reject the plan (or hand the offending cycle back to the planner for repair) before any tool is dispatched.
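Assuming the planner emits steps with an `id`, a `prompt` template, and a declared `deps` list (field names and the `{step_N}` reference syntax are illustrative; adapt the extraction to your schema), the whole pass fits in a few lines using the standard library's `graphlib`:

```python
import re
from graphlib import CycleError, TopologicalSorter

STEP_REF = re.compile(r"\{step_(\d+)")  # assumed template syntax

def validate_plan(plan: list) -> list:
    """Return a safe execution order, or raise before any tool ever runs."""
    graph = {}
    for step in plan:
        # Start from the edges the planner declared...
        preds = set(step["deps"])
        # ...then add every step whose output the prompt template actually
        # references. These are the back-edges the runtime DAG cannot see.
        preds |= {int(m) for m in STEP_REF.findall(step["prompt"])}
        graph[step["id"]] = preds

    try:
        # static_order() runs cycle detection up front and raises CycleError
        # if the combined graph contains any cycle.
        return list(TopologicalSorter(graph).static_order())
    except CycleError as e:
        # e.args[1] holds the offending cycle as a list of step ids,
        # which you can hand back to the planner for repair.
        raise ValueError(f"plan contains a dependency cycle: {e.args[1]}")
```

The check is linear in the size of the plan and costs no tokens. Compared with minutes of the agent "thinking hard" about a step that can never start, running it before the first dispatch is essentially free.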