
Routines and Handoffs: The Two Primitives Behind Every Reliable Multi-Agent System

· 8 min read
Tian Pan
Software Engineer

Most multi-agent systems fail not because the models are wrong, but because the plumbing is leaky. Agents drop context mid-task, hand off to the wrong specialist, or loop indefinitely when they don't know how to exit. The underlying cause is almost always the same: the system was designed around what each agent can do, without clearly defining how work moves between them.

Two primitives fix most of this: routines and handoffs. They're deceptively simple, but getting them right is the difference between a demo that works and a system you can ship.

What a Routine Actually Is

A routine is not a workflow template. It's not a directed acyclic graph of prompts. It's something more minimal: a natural language instruction set paired with the tools needed to execute it.

Think of it as an agent's operating procedure. Given a customer complaint, a support routine might say:

Check the order status. If the order is delayed, apologize and offer a $10 credit. If the item was never shipped, escalate to fulfillment. If the customer is asking about a return, transfer to the returns agent.

The agent follows these instructions using tool calls — get_order_status, apply_credit, escalate_to_fulfillment — until the task is complete or it hits a handoff condition.

The key insight is that LLMs are remarkably good at following sequential natural language procedures, especially when each step has a corresponding tool. The instruction doesn't need to be a rigid state machine. The model handles ambiguity and fills in gaps, which is exactly what you want for the messy real-world cases your workflow didn't anticipate.

What makes a good routine:

  • Steps are actions, not descriptions ("Check the order status" vs. "The agent should be aware of order status")
  • Each conditional branch ends in a clear outcome: resolution, escalation, or handoff
  • Tools map one-to-one with discrete actions the agent can take
  • The routine is scoped — it handles one domain, not everything

A routine that tries to handle billing, returns, technical support, and account changes will fragment into incoherence. Keep routines narrow and composable.
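In code, a routine really is this small. A minimal sketch, assuming the runtime accepts a plain instruction string paired with a tool list — the tool bodies here are stubs, not a real order system:

```python
# Illustrative sketch: a routine is just an instruction string plus the
# tools it references. These tool bodies are stubs; a real system would
# query the order database and billing service.

def get_order_status(order_id: str) -> str:
    return "delayed"  # stub

def apply_credit(order_id: str, amount: int) -> str:
    return f"${amount} credit applied to {order_id}"  # stub

support_routine = {
    "instructions": (
        "Check the order status. If the order is delayed, apologize and "
        "offer a $10 credit. If the item was never shipped, escalate to "
        "fulfillment. If the customer is asking about a return, transfer "
        "to the returns agent."
    ),
    "tools": [get_order_status, apply_credit],
}
```

Everything the agent is allowed to do is visible in one place, which is what makes routines composable: swapping the instruction string and tool list gives you a different specialist with no other changes.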

Handoffs: Explicit Is Better Than Implicit

A handoff happens when one agent transfers an active conversation — along with its full context — to another agent better suited to continue.

The naive implementation is to route based on intent classification up front: detect the user's intent, then dispatch to the right agent. This breaks quickly. Intents change mid-conversation. Users start a return inquiry and discover they have a billing question. A triage agent can't always know which specialist is needed until the conversation has progressed.

The better model is to let agents initiate handoffs themselves. Each agent gets a set of transfer_to_X tools — one per agent it can hand off to. When the model determines it's hit a boundary, it calls the tool. The runtime swaps the active agent and continues the loop with the new agent's instructions.

```python
def transfer_to_billing_agent():
    """Transfer the conversation to the billing specialist."""
    return billing_agent

def transfer_to_returns_agent():
    """Transfer the conversation to the returns and refunds specialist."""
    return returns_agent
```

When a tool returns an agent object rather than a string, the runtime interprets that as a handoff and updates the active context. The conversation history travels with it — the receiving agent knows everything that happened before it took over.

This is powerful for a few reasons:

  • The model decides when to hand off, not a rule engine. It reads the conversation and makes a judgment call.
  • Handoffs can be chained. A triage agent can hand off to a specialist, who can hand off to an escalation agent if needed.
  • The user never needs to repeat themselves. Context is preserved across the entire chain.

The Execution Loop

The runtime that ties routines and handoffs together is simple enough to fit on a screen:

  1. Call the current agent's model with its instructions and tools
  2. Parse any tool calls from the response
  3. Execute the functions — if a function returns an agent, swap to that agent
  4. Append results to the conversation history
  5. Repeat until the model returns a plain text response with no tool calls

That's it. The loop terminates when the agent has nothing left to do. Every agent in the system is just an instruction set and a list of tools — the same schema, different contents.
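A minimal sketch of that loop, with the model call stubbed out. `SimpleAgent`, `call_model`, and the message format are illustrative assumptions, not a specific SDK:

```python
# Sketch of the execution loop described above. `call_model` stands in
# for your LLM client and returns either tool calls or plain text.

class SimpleAgent:
    def __init__(self, name, instructions, tools):
        self.name, self.instructions, self.tools = name, instructions, tools

def run(agent, messages, call_model, max_turns=10):
    current = agent
    for _ in range(max_turns):
        response = call_model(current, messages)          # 1. call the model
        tool_calls = response.get("tool_calls", [])       # 2. parse tool calls
        if not tool_calls:
            return current, response["content"]           # 5. plain text: done
        for name, args in tool_calls:                     # 3. execute functions
            tool = next(t for t in current.tools if t.__name__ == name)
            result = tool(**args)
            if isinstance(result, SimpleAgent):           # agent returned: handoff
                current = result
                result = f"transferred to {result.name}"
            messages.append({"role": "tool", "name": name,
                             "content": str(result)})     # 4. append results
    raise RuntimeError("loop did not terminate")
```

The handoff check is the only special case: a tool that returns an agent swaps the active agent, while every other result is appended to history as an ordinary tool message.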

This uniformity matters for maintainability. When you need to add a new specialist, you don't touch the routing logic. You write a new routine, add a transfer_to_new_specialist function to whatever agents should be able to reach it, and you're done.

Where Handoff Graphs Break Down

Simple chains work well. Deep graphs with many possible paths get expensive to debug.

Context loss at handoff boundaries is the most common failure mode. Even when conversation history is preserved, each new agent starts with its own instructions. If those instructions assume specific context that isn't in the history — an internal case ID, a confirmed identity check, a cached lookup — the receiving agent may make decisions on incomplete information.

The fix is to be explicit in your handoff tools. Don't just transfer; transfer with a summary:

```python
def transfer_to_escalation_agent(reason: str, case_summary: str):
    """
    Transfer to escalation when the issue cannot be resolved at tier 1.

    Args:
        reason: Why escalation is needed
        case_summary: Brief summary of the issue and what was attempted
    """
    escalation_agent.context = {"reason": reason, "summary": case_summary}
    return escalation_agent
```

Forcing the transferring agent to articulate why it's handing off and what it tried dramatically improves continuity on the receiving end.

Coordination latency compounds. Each handoff adds 100–500ms of overhead before the next agent even starts processing, so a workflow with ten hops can accumulate up to five seconds of pure routing overhead. Design your agent topology to minimize unnecessary hops. If two agents frequently hand off to each other, they might need to be merged into a single broader routine.

Debugging implicit handoffs is painful. When a user ends up in the wrong agent, you need to reconstruct the decision path. Build tracing in from the start: log which agent was active, what tool was called, and what the model's reasoning was at each handoff point. Without this, production incidents become archaeological digs.
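One lightweight way to get that tracing is to wrap the transfer tools themselves. A sketch, assuming transfer tools return agent objects with a `name` attribute:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("handoffs")

def traced_transfer(tool, source_name):
    """Wrap a transfer tool so every handoff is logged with its source,
    target, and arguments before the runtime swaps agents."""
    def wrapper(*args, **kwargs):
        target = tool(*args, **kwargs)
        log.info("handoff: %s -> %s (args=%r)", source_name, target.name, kwargs)
        return target
    wrapper.__name__ = tool.__name__  # keep the name the runtime matches on
    return wrapper
```

Because the wrapper preserves the tool's name, it can be applied to every transfer function at registration time without touching the routing logic.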

Designing Agent Topologies

There are three common shapes for multi-agent systems, and each fits different use cases.

Linear chains work for sequential pipelines where each step has a clear output that feeds the next. Intake → classify → enrich → respond. Low coordination overhead, easy to trace.

Hub-and-spoke puts a triage or orchestrator agent at the center, routing to specialists and collecting results. This is the right shape for customer-facing systems where you don't know up front which domain a user will land in.

Peer networks let any agent hand off to any other. These are flexible but hard to reason about at scale. Reserve peer networks for small, well-understood agent sets — three to five agents where all the interaction patterns are known.

A practical rule: start with hub-and-spoke. The triage agent becomes a natural seam for adding observability, rate limiting, and guardrails. Flatten to linear chains where you have deterministic workflows. Avoid peer networks until you've exhausted simpler options.

What Good Looks Like in Production

A well-designed routine-and-handoff system has a few properties that distinguish it from a fragile demo:

Agents are narrow specialists, not generalists. Each agent does one job well. A billing agent doesn't also handle returns. The complexity of any single agent stays bounded.

Handoffs are declared, not discovered. Every agent has an explicit list of agents it can transfer to. Routing is never ad hoc. This makes the system auditable — you can draw the handoff graph and verify it matches your intent.
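Because handoffs follow a naming convention, that graph can be derived mechanically rather than maintained by hand. A sketch, assuming tools named `transfer_to_<agent>` and agent objects with `name` and `tools` attributes:

```python
# Derive the handoff graph from the declared transfer tools, so it can
# be audited against the intended topology. Agent and tool names here
# are illustrative.

def handoff_graph(agents):
    """Map each agent name to the agents its transfer tools reach."""
    graph = {}
    for agent in agents:
        targets = [t.__name__.removeprefix("transfer_to_")
                   for t in agent.tools
                   if t.__name__.startswith("transfer_to_")]
        graph[agent.name] = sorted(targets)
    return graph
```

Running this in CI and diffing the result against a checked-in expected graph catches accidental routing changes before they reach production.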

Context is explicit at boundaries. Handoff functions carry structured summaries, not just raw history. The receiving agent gets what it needs to act immediately, not a transcript to parse.

The loop terminates. Every routine has clear exit conditions: resolution, escalation, or handoff. An agent that can't complete its task and can't hand off anywhere should fail with a clear message, not loop indefinitely.

The underlying model — one instruction set, one tool list, one execution loop — scales from a two-agent system to dozens of specialists. The primitive stays constant; you're just adding more nodes.

Agent orchestration is ultimately a software design problem dressed in probabilistic clothing. The models handle the ambiguity. Your job is to make sure the structure around them is as explicit and debuggable as any other distributed system you'd run in production.
