
The Distributed Trace That Goes Dark at the Agent Handoff

Tian Pan
Software Engineer

You open the trace for a failed run. The span tree is beautiful: the user request, the planner agent's reasoning, three tool calls, token counts, latencies, all of it nested cleanly. Then the planner hands off to a specialist agent — and the trace ends. Not with an error span. It just stops. The next thing you have is a separate, rootless trace from the specialist agent that begins mid-thought, with no parent, no inputs you can see, and no connection to the request that caused it.

The bug lives in that gap. It always does. The handoff is where one agent's assumptions meet another agent's interpretation, and it is the single place your trace cannot follow.

This is not a logging problem. Your agents are probably emitting spans correctly on both sides. The problem is that the trace context, the trace ID and parent span ID that stitch spans into one story, did not survive the jump from caller to callee. Every instrumented HTTP client and gRPC stub in your stack propagates that context for free. Your agent handoff does not, because nobody told it to.

Why agent handoffs break what HTTP never does

Distributed tracing works because of a quiet contract. When a service makes an outbound call, it injects the current trace context into the request — for HTTP, that is the W3C traceparent header carrying the trace ID, the parent span ID, and sampling flags. When the receiving service handles the request, it extracts that context and makes its own spans children of the parent. Extract, inject, repeat. The trace ID stays constant across every hop, and the span tree assembles itself.
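To make the carrier concrete, here is a minimal Python sketch of the inject and extract halves of that contract for the traceparent header. It is a simplification written for illustration, not the OpenTelemetry propagator: it accepts only version 00, ignores tracestate, and SpanContext, inject, and extract are hypothetical stand-ins that work on any dict, whether that dict is HTTP headers or message metadata.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# W3C traceparent: version-traceid-parentid-flags, lowercase hex.
# Simplification: only version 00 is accepted and tracestate is ignored.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # 32 hex chars, constant across every hop
    span_id: str   # 16 hex chars, the span that becomes the remote parent
    sampled: bool = True


def inject(ctx: SpanContext, carrier: dict) -> None:
    """Write the context into a dict carrier: HTTP headers or message metadata."""
    flags = "01" if ctx.sampled else "00"
    carrier["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(carrier: dict) -> Optional[SpanContext]:
    """Read the context back; None means the receiver would start a new root."""
    match = TRACEPARENT_RE.match(carrier.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None  # all-zero IDs are invalid per the spec
    return SpanContext(trace_id, span_id, sampled=(int(flags, 16) & 0x01) == 1)


# Usage: the caller injects, the callee extracts, and the trace ID survives.
outgoing = SpanContext(secrets.token_hex(16), secrets.token_hex(8))
headers: dict = {}
inject(outgoing, headers)
assert extract(headers) == outgoing
```

In the OpenTelemetry API, opentelemetry.propagate.inject(carrier) and opentelemetry.propagate.extract(carrier) play these roles, reading from and producing the current context.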

You never wrote that code. OpenTelemetry's auto-instrumentation libraries wrap the standard HTTP and gRPC clients, so the traceparent header rides along on every request without you thinking about it. This is why a request through six microservices produces one coherent trace: the boundary between services is a well-known, instrumented chokepoint.
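What the auto-instrumentation does for you amounts to a wrapper around the client's send path. The sketch below is a hypothetical, dependency-free model of that idea, not the real instrumentation: a contextvars slot stands in for OpenTelemetry's current context, and raw_send is a fake transport that returns what it would have put on the wire.

```python
import contextvars
from typing import Callable

# Toy stand-in for the current trace context (not the OpenTelemetry API).
current_traceparent = contextvars.ContextVar("current_traceparent", default=None)

Send = Callable[[str, dict, bytes], dict]


def raw_send(url: str, headers: dict, body: bytes) -> dict:
    """Fake transport: returns what it would have put on the wire."""
    return {"url": url, "headers": dict(headers), "body": body}


def instrument(send: Send) -> Send:
    """Roughly what auto-instrumentation does: inject on every outbound call."""
    def wrapped(url: str, headers: dict, body: bytes) -> dict:
        headers = dict(headers)  # leave the caller's dict untouched
        traceparent = current_traceparent.get()
        if traceparent is not None:
            headers.setdefault("traceparent", traceparent)
        return send(url, headers, body)
    return wrapped


# Application code calls the wrapped client and never mentions trace context.
send = instrument(raw_send)
```

Anything that calls raw_send directly, or that never goes through a wrapped client at all, gets no header. That is exactly the position an agent handoff is in.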

An agent handoff is also a boundary between two units of execution. But it does not look like an HTTP call. It looks like one of these:

  • A function call into a sub-agent's run() method inside the same process.
  • A message dropped onto a queue for a worker agent to pick up later.
  • A new task spawned on a thread pool or async event loop.
  • A payload posted to a separate agent service that does use HTTP — but carries the handoff in a JSON body, not in headers.

None of those paths goes through the instrumented HTTP client, so the auto-instrumentation has nothing to hook. Trace context in OpenTelemetry lives in context-local storage (thread-local, or the async-context equivalent), and when execution crosses into a new thread, a new process, or a spawned task that the runtime does not copy context into, with nothing instrumented carrying the context across, that storage is empty. The receiving agent starts a span, finds no parent in scope, and silently becomes a new trace root.
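The failure is easy to reproduce with a toy tracer. In this sketch, a hypothetical stand-in built on Python's contextvars (which OpenTelemetry's Python API also uses for its context by default), a span takes its parent from context-local storage. The same sub-agent called on the same thread joins the planner's trace, while the identical call on a new thread finds nothing in scope and becomes a root. The sketch relies on the fact that, on standard CPython builds, a new thread starts with an empty context.

```python
import contextvars
import secrets
import threading
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]


# Toy stand-in for OpenTelemetry's context-local "current span".
_current = contextvars.ContextVar("current_span", default=None)


@contextmanager
def span(name: str):
    parent = _current.get()
    if parent is None:
        # Nothing in scope: silently become a new trace root.
        s = Span(name, secrets.token_hex(16), secrets.token_hex(8), None)
    else:
        s = Span(name, parent.trace_id, secrets.token_hex(8), parent.span_id)
    token = _current.set(s)
    try:
        yield s
    finally:
        _current.reset(token)


def specialist_agent() -> Span:
    with span("invoke_agent specialist") as s:
        return s


with span("invoke_agent planner") as planner:
    same_thread = specialist_agent()  # joins the planner's trace
    result = {}
    worker = threading.Thread(target=lambda: result.update(span=specialist_agent()))
    worker.start()
    worker.join()
    other_thread = result["span"]  # a new root on standard CPython builds
```

Nothing raises and nothing logs a warning; the second span is simply a well-formed root in a different trace.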

That is the whole bug. The handoff is treated as a payload — here is the task, here is the context the next agent needs to do its job — when it is also a carrier. It must carry the task and the trace context, the same way an HTTP request carries a body and a traceparent header. Teams instrument the payload obsessively and forget the carrier completely.

The orphaned trace is the most expensive gap in an incident

Walk through a real incident. A user reports the agent deleted the wrong record. You pull the trace. The planner agent looks fine — it correctly decided to route a deletion request to the database specialist. The trace ends at the handoff.

Now you go looking for the specialist's trace. You have a rough timestamp, so you scan the specialist service's traces in that window and find... forty of them, because the system was busy. Which one corresponds to this user? You cannot tell, because the thing that would tell you — the shared trace ID — is exactly what got dropped. You start correlating on timestamps and record IDs, reconstructing by hand the parent-child link the tracing system exists to give you automatically.

This is the worst possible time to be doing that work. You are mid-incident, the data is moving, and the handoff payload — the precise inputs the specialist received — may not be logged anywhere at all, because everyone assumed the trace captured it. The single most useful question in a multi-agent postmortem is "what exactly did the downstream agent receive?" An orphaned trace cannot answer it.

The cost compounds. Sub-agent coordination failures — where agent A sends incomplete or subtly wrong context to agent B — are among the most common multi-agent bugs, and they are invisible from B's side. The team debugging the specialist agent sees a plausible input and a wrong output and concludes the specialist is broken. They tune the specialist's prompt. The actual root cause was upstream: the planner summarized the request and dropped a qualifier. Without a trace that crosses the handoff, you cannot see that the input was already poisoned. You debug the wrong agent, ship a fix that does nothing, and the bug returns.

A whole trace turns that hour of correlation into one click: expand the handoff span, read the payload the specialist received, compare it to what the user asked. The diagnosis is right there. A broken trace turns it into archaeology.

Make the handoff a carrier, not just a payload

The fix is the same extract-and-inject discipline that HTTP gets for free — applied by hand at the boundary your framework does not instrument. The pattern does not change with the transport; only the slot you put the context in changes.

In-process handoff (function call into a sub-agent). This is the easiest case and the one teams most often get wrong, because it does not feel like a boundary. If the sub-agent runs on the same thread within the same async context, OpenTelemetry's context propagates naturally, provided you wrap the sub-agent's execution in a child span and do not start a fresh root. The failure mode here is a sub-agent that calls tracer.start_as_current_span after the parent context has been cleared, or a framework that deliberately resets context between agents. Make the sub-agent invocation an explicit child span named for the agent (the OpenTelemetry GenAI semantic conventions define the span name invoke_agent {gen_ai.agent.name} for exactly this), and the tree stays intact.
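A minimal sketch of the in-process case, again using a hypothetical contextvars-based toy tracer rather than the OpenTelemetry SDK. Every sub-agent call goes through an invoke_agent helper (a name chosen for illustration) that opens a child span named invoke_agent followed by the agent name and sets a gen_ai.agent.name attribute, so the sub-agent's work nests under whatever span is current.

```python
import contextvars
import secrets
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    attributes: dict = field(default_factory=dict)


# Toy stand-in for OpenTelemetry's current-span slot.
_current = contextvars.ContextVar("current_span", default=None)


@contextmanager
def open_span(name: str, attributes: Optional[dict] = None):
    """Start a span as a child of whatever is current, and make it current."""
    parent = _current.get()
    s = Span(
        name,
        parent.trace_id if parent is not None else secrets.token_hex(16),
        secrets.token_hex(8),
        parent.span_id if parent is not None else None,
        dict(attributes or {}),
    )
    token = _current.set(s)
    try:
        yield s
    finally:
        _current.reset(token)


def invoke_agent(agent_name: str, run: Callable[[str], str], task: str):
    """Every sub-agent call gets an explicit child span named for the agent."""
    with open_span(f"invoke_agent {agent_name}", {"gen_ai.agent.name": agent_name}) as s:
        return s, run(task)


def specialist(task: str) -> str:
    return f"done: {task}"


with open_span("invoke_agent planner") as planner:
    child, output = invoke_agent("specialist", specialist, "delete record 42")
```

With the real SDK the helper body would be a tracer.start_as_current_span call; what matters is that it inherits the current context instead of passing an empty one.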

Cross-thread or async handoff (thread pool, task spawn). Context-local storage does not follow you across a thread boundary, and depending on the runtime it may not follow you into a spawned task either. You must capture the current context before the jump and re-attach it inside the new execution unit. Most OpenTelemetry SDKs provide a helper for this, such as wrapping the executor or binding the context to the spawned coroutine. The rule: if you ever write executor.submit or spawn a detached task around agent work, you have crossed a boundary and you owe it an explicit context capture.
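In Python, the capture-and-reattach step can be sketched with the standard library alone: contextvars.copy_context() snapshots the caller's context, and submitting ctx.run to the executor re-enters that snapshot inside the worker thread. submit_with_context is a hypothetical helper name and current_span_id is a toy stand-in for the current span. Because OpenTelemetry's Python API keeps its context in contextvars by default, the same snapshot would carry its current span too. Note that asyncio.create_task already copies the current context into the new task; thread pools are where it silently goes missing.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for the context-local current span (not the OpenTelemetry API).
current_span_id = contextvars.ContextVar("current_span_id", default=None)


def specialist_agent(task: str):
    """Reports which parent span it can see from inside the worker thread."""
    return task, current_span_id.get()


def submit_with_context(executor: ThreadPoolExecutor, fn, *args):
    """Capture the caller's context now, re-attach it inside the worker."""
    ctx = contextvars.copy_context()
    return executor.submit(ctx.run, fn, *args)


current_span_id.set("planner-span")
with ThreadPoolExecutor(max_workers=2) as pool:
    naive = pool.submit(specialist_agent, "task").result()  # sees no parent
    carried = submit_with_context(pool, specialist_agent, "task").result()
```

The naive submit returns ("task", None) on standard CPython builds, which is the orphaned root in miniature; the wrapped submit sees "planner-span".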

Queue or message-bus handoff (worker agent picks up later). Here the boundary is fully asynchronous and there is no shared memory at all. Inject the trace context into the message itself, as a traceparent field in the message metadata, exactly as HTTP puts it in a header. The worker extracts it on dequeue. One subtlety: a parent-child edge reads as though the parent is waiting on the child, which is misleading for fire-and-forget queue work. For decoupled async work, use a span link instead of a parent-child edge. A link says "this run was caused by that run" without claiming the parent blocked on it, which is the right semantics for an event-driven handoff and exactly the case span links exist for.
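The producer and consumer sides of a queue handoff might look like this sketch, where a stdlib queue.Queue stands in for the message bus and a JSON envelope keeps the payload and the trace context side by side. Span, new_root, enqueue_handoff, and worker_dequeue are hypothetical names. The worker starts its own root span and records the producer's traceparent as a link rather than a parent. With the OpenTelemetry API, the rough equivalent is extracting the context from the message and passing links=[trace.Link(span_context)] when starting the worker's span.

```python
import json
import queue
import secrets
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str] = None
    links: List[str] = field(default_factory=list)  # linked traceparents


def new_root(name: str) -> Span:
    return Span(name, secrets.token_hex(16), secrets.token_hex(8))


def traceparent(span: Span) -> str:
    return f"00-{span.trace_id}-{span.span_id}-01"


def enqueue_handoff(q: queue.Queue, producer: Span, task: dict) -> None:
    """Producer side: the message carries the payload AND the trace context."""
    message = {"metadata": {"traceparent": traceparent(producer)}, "payload": task}
    q.put(json.dumps(message))


def worker_dequeue(q: queue.Queue):
    """Consumer side: a new run linked to (not parented by) the producer."""
    message = json.loads(q.get())
    span = new_root("invoke_agent worker")
    tp = message["metadata"].get("traceparent")
    if tp:
        span.links.append(tp)
    return span, message["payload"]
```

If the producer does block on the result, making the worker span a child instead of a linked root is the more honest model.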

Cross-service handoff (separate agent over HTTP). If the handoff already travels over HTTP, the irony is that you may have bypassed the free propagation by building the request by hand. The agent payload goes in the JSON body, the request is constructed with a raw, uninstrumented client, and nobody adds the traceparent header. Either route the call through the instrumented HTTP client, or inject the context into the headers yourself. The transport supports it; your code just skipped it.
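If you do build the request by hand, the fix is one header. This sketch uses only the standard library's urllib.request.Request and only constructs the request without sending it; build_handoff_request and the URL are hypothetical, and the traceparent value is assumed to come from whatever holds your current context. With the OpenTelemetry API, opentelemetry.propagate.inject(headers) fills the same header from the current context.

```python
import json
import urllib.request


def build_handoff_request(url: str, payload: dict, traceparent: str) -> urllib.request.Request:
    """Hand-built agent-to-agent request that still carries the trace context."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),  # the payload: what to do
        headers={
            "Content-Type": "application/json",
            # The carrier: the part hand-built requests tend to skip.
            "traceparent": traceparent,
        },
        method="POST",
    )
```

urllib normalizes stored header names with str.capitalize(), so the header is read back as "Traceparent"; HTTP header names are case-insensitive on the wire.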

Across all four, the discipline is identical: at every handoff, ask what carries the trace context, and put it there. Body or header, header or message field, thread-local or explicit capture — the slot changes, the obligation does not.

Instrument the sub-calls, not just the agents

Stitching the handoff is necessary but not sufficient. A trace that crosses agent boundaries but flattens everything inside an agent into one span is only half an instrument. The expensive questions in an agent incident are usually one level deeper: which tool call failed, what did the model see before it chose that tool, how many retry loops did it spin through.

Model each layer as its own span, nested under the agent that owns it. The GenAI semantic conventions give you the vocabulary — invoke_agent for an agent run, execute_tool for a tool call, chat for an LLM request — and each carries standard attributes for model identity, token counts, and tool names. A well-formed agentic trace reads top to bottom as: user request → planner agent → handoff span → specialist agent → its tool calls → each tool's own LLM sub-calls → final response. Every node has inputs, outputs, latency, and cost. The handoff is one span among many in that tree, not a cliff the tree falls off.
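The tree is only recoverable because every exported span carries its parent's ID. This sketch shows how a backend could reassemble flat span records into the nested view; the records, IDs, and the "POST /chat" and "chat example-model" names are hypothetical sample data shaped after the GenAI span naming. Setting one parent ID to None, as happens at an unpropagated handoff, splits the output into two roots.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (span_id, parent_id, name) as a backend might receive them, in any order.
SpanRecord = Tuple[str, Optional[str], str]


def render_tree(spans: List[SpanRecord]) -> List[str]:
    """Indent each span under its parent; spans with no parent become roots."""
    children: Dict[Optional[str], List[SpanRecord]] = defaultdict(list)
    for record in spans:
        children[record[1]].append(record)

    lines: List[str] = []

    def walk(span_id: str, depth: int) -> None:
        for child in children.get(span_id, []):
            lines.append("  " * depth + child[2])
            walk(child[0], depth + 1)

    for root in children[None]:
        lines.append(root[2])
        walk(root[0], 1)
    return lines


spans = [
    ("s4", "s3", "execute_tool delete_record"),
    ("s1", None, "POST /chat"),
    ("s2", "s1", "invoke_agent planner"),
    ("s3", "s2", "invoke_agent specialist"),
    ("s5", "s4", "chat example-model"),
]
```

With every parent ID intact, render_tree(spans) yields a single root with the planner, specialist, tool call, and model call nested beneath it in that order.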

When a tool call itself triggers another agent — increasingly common as tools wrap whole sub-agents — that nested invocation needs the same context-carrying treatment as a top-level handoff. The boundary recurses. So does the discipline.

There is a real payoff for getting this right beyond debugging. A whole trace is also your cost ledger and your eval data. Token counts roll up the tree, so you can see that a single user request fanned out into nine LLM calls across four agents and cost what it cost. A clean trace from a failed run is a ready-made eval case: every input the agents saw, in order, replayable. An orphaned trace is none of those things — it is a fragment that answers no question completely.
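Token roll-up is a straightforward walk up the parent chain. In this sketch each hypothetical span record holds only the tokens it used directly, and rollup_tokens adds each span's count to every ancestor it can reach. If a handoff drops the parent ID, the subtree's cost never reaches the request that caused it.

```python
from typing import Dict, List, Optional, Tuple

# (span_id, parent_id, tokens used directly by this span): hypothetical data.
SpanRecord = Tuple[str, Optional[str], int]


def rollup_tokens(spans: List[SpanRecord]) -> Dict[str, int]:
    """Total tokens per span, including every descendant (assumes no cycles)."""
    own = {sid: tokens for sid, _, tokens in spans}
    parent_of = {sid: parent for sid, parent, _ in spans}
    totals = dict(own)
    for sid, tokens in own.items():
        ancestor = parent_of[sid]
        # Stop at a root (None) or at a parent that is not in this batch.
        while ancestor in totals:
            totals[ancestor] += tokens
            ancestor = parent_of[ancestor]
    return totals


spans = [
    ("request", None, 0),
    ("planner", "request", 1200),
    ("specialist", "planner", 800),
    ("tool_llm", "specialist", 300),
]
totals = rollup_tokens(spans)  # totals["request"] == 2300
```

Break the specialist's parent link and the request total drops to 1200: the specialist's and tool's tokens are still spent, just attributed to nothing.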

The postmortem you can actually write

The difference between a broken trace and a whole one is the difference between two postmortems.

With the broken trace, the postmortem has a hole exactly where the cause should be. You write "the specialist agent deleted the wrong record" and "the planner appears to have routed correctly," and between those two sentences is a shrug. You cannot show what the specialist received, so you cannot prove whether the planner's handoff was wrong or the specialist's interpretation was. The action items are vague — "add more logging around handoffs" — which is an admission that you were flying blind.

With the whole trace, the postmortem is a walk down the span tree. The planner received the user's request with the qualifier "except archived records." The handoff span shows the planner's summary to the specialist dropped that clause. The specialist received an unqualified deletion instruction and executed it faithfully. The root cause is not the specialist — it is lossy summarization at the handoff, and the fix is specific: pass the structured constraint, not a prose summary. You found that in minutes because the trace never went dark.
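A minimal sketch of that fix, assuming a hypothetical deletion handoff: the exclusion becomes a typed field that the specialist applies mechanically, so a summarizing planner cannot drop it without the omission being visible in the payload. DeleteHandoff, records_to_delete, and the record shape are illustrative only.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class DeleteHandoff:
    """Hypothetical structured handoff: the qualifier is a field, not prose."""
    owner_id: int
    exclude_statuses: FrozenSet[str] = field(default_factory=frozenset)


def records_to_delete(handoff: DeleteHandoff, records: List[dict]) -> List[int]:
    """Specialist side: apply the constraint exactly as received."""
    return [
        r["id"]
        for r in records
        if r["owner"] == handoff.owner_id and r["status"] not in handoff.exclude_statuses
    ]


records = [
    {"id": 1, "owner": 7, "status": "active"},
    {"id": 2, "owner": 7, "status": "archived"},
    {"id": 3, "owner": 8, "status": "active"},
]
# "Delete user 7's records except archived records", passed as structure.
handoff = DeleteHandoff(owner_id=7, exclude_statuses=frozenset({"archived"}))
```

records_to_delete(handoff, records) returns [1]. A handoff built without the exclusion returns [1, 2], and in a trace that records the payload, the empty exclude_statuses field is the visible clue.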

That is the entire argument for treating trace context as a first-class part of every handoff. Agent systems are getting more layered, not less — planners calling specialists calling tools that wrap more agents. Every one of those boundaries is a place the trace can break, and every break is an incident you will debug by hand. Instrument the carrier, not just the payload. Make the handoff carry the trace ID the way an HTTP request carries a header. Then the next time something fails at the seam between two agents, the trace will follow it there — and the bug will have nowhere left to hide.
