Epistemic Trust in Agent Chains: How Uncertainty Compounds Through Multi-Step Delegation

10 min read
Tian Pan
Software Engineer

Most teams building multi-agent systems spend a lot of time thinking about authorization trust: what is Agent B allowed to do, which tools can it call, what data can it access. That's an important problem. But there's a second trust problem that doesn't get nearly enough attention, and it's the one that actually kills production systems.

The problem is epistemic: when Agent A delegates a task to Agent B and gets back an answer, how much should A believe what B returned?

This isn't a question of whether B was authorized to answer. It's a question of whether B actually could.

A subagent's reliability depends on factors the orchestrator cannot directly observe: what model tier is running inside it, how much context it had access to, whether its tools were scoped correctly, and whether the task it received was within its area of competence. When an orchestrator accepts a subagent's output without accounting for these factors, it doesn't just inherit the answer — it inherits the errors, and compounds them into whatever reasoning step comes next.

The Math That Breaks Multi-Agent Systems

The compounding problem is straightforward but easy to underestimate. A single agent operating at 99% per-step accuracy drops to 90.4% reliability over a 10-step chain. At 95% accuracy per step — still a strong result for most real-world tasks — the chain lands at 59.9% reliability over 10 steps and 35.8% over 20 steps.
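The arithmetic behind these numbers is just repeated multiplication. A quick sketch in Python, assuming each step fails independently (the optimistic case, since real failures tend to correlate):

```python
# Reliability of an n-step chain when each step succeeds independently
# with probability p: p ** n.
def chain_reliability(per_step_accuracy: float, n_steps: int) -> float:
    return per_step_accuracy ** n_steps

for p in (0.99, 0.95):
    for n in (10, 20):
        print(f"{p:.0%}/step over {n} steps -> {chain_reliability(p, n):.1%}")

# 99%/step over 10 steps -> 90.4%
# 99%/step over 20 steps -> 81.8%
# 95%/step over 10 steps -> 59.9%
# 95%/step over 20 steps -> 35.8%
```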

Field data makes this worse. Hallucination rates that test at under 1% on isolated tasks balloon to 63% failure rates in 100-step agent pipelines. A Google Research study across 180 agent configurations found that independent multi-agent networks amplify errors 17.2x compared to single-agent baselines. Centralized coordination with shared context brought this down to 4.4x — still a significant hit, but recoverable.

Analysis of 1,642 execution traces across seven open-source multi-agent frameworks found failure rates ranging from 41% to 86.7%, with coordination breakdowns accounting for 36.9% of all failures. A system that succeeds in roughly 60% of demo runs drops to about 25% when held to consistent operation across 8 runs.

None of this is a prompting problem. It's structural: most orchestrators treat subagent outputs as ground truth, discarding the uncertainty context that would allow them to reason about whether to trust, verify, or discount what they received.

Two Trust Problems, One Name

Authorization trust and epistemic trust are frequently conflated, and the confusion is costly.

Authorization trust asks: is this agent permitted to perform this action? This is where sandboxing, capability scoping, and access control live. It's a well-understood problem with a large ecosystem of tooling around it.

Epistemic trust asks: is this agent competent to return a reliable answer for this specific task? Competence has multiple dimensions:

  • Model tier: Did the orchestrator delegate a complex reasoning task to a Haiku-equivalent when it needed Opus-equivalent reasoning? The orchestrator often doesn't know.
  • Context quality: Did the subagent receive all the context it needed, or was it working with a truncated view of the problem?
  • Tool scope: Were the tools available to the subagent actually the right ones for the sub-task? A narrowly scoped tool list can cause an agent to hallucinate rather than admit it can't reach what it needs.
  • Domain fit: Was the sub-task within the subagent's operating domain? LLMs are notoriously overconfident when operating outside their training distribution. Expected Calibration Error (ECE) in production systems ranges from 0.108 to 0.427 — meaning the confidence an agent expresses in its answers is a poor predictor of whether those answers are correct (a minimal sketch of how ECE is computed follows this list).

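ECE has a standard definition: bin predictions by stated confidence, then take the weighted average gap between each bin's mean confidence and its actual accuracy. A minimal sketch of that calculation, illustrative and not tied to any particular framework:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: weighted average of |accuracy - mean confidence| per confidence bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        bins[min(int(c * n_bins), n_bins - 1)].append((c, y))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / n) * abs(accuracy - avg_conf)
    return ece

# Example: an agent that reports 0.9 confidence four times but is right only half the time
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0]))  # ~0.4
```

An ECE near 0 means stated confidence tracks accuracy; the 0.108 to 0.427 range cited above means confidence overstates (or understates) accuracy by roughly 11 to 43 points on average.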
Authorization trust has a binary answer: allowed or not. Epistemic trust is probabilistic and context-dependent. An agent that handled last week's task competently may fail this week's task with equal confidence.

The Overconfidence Trap

LLMs don't naturally express uncertainty. They're trained to produce fluent, confident completions — which means a subagent operating outside its domain doesn't typically say "I'm not sure about this." It says something coherent and plausible that happens to be wrong.

This is the epistemic trust trap: the signal that an orchestrator would need to discount a subagent's output — explicit uncertainty, admission of limited knowledge, caveats about missing context — is precisely what LLMs tend to suppress.

Research on LLM calibration consistently finds severe overconfidence. Models can achieve high accuracy on benchmark tasks while their verbalized confidence estimates remain miscalibrated across the range of tasks they're actually deployed on. When a subagent returns an answer with high apparent confidence, that confidence is not a reliable signal of correctness.

The trap deepens in chains. If Agent A delegates to Agent B, which delegates to Agent C, each hand-off strips away context about what the previous agent knew and didn't know. Agent A receives a final answer that's been laundered through two rounds of competent-sounding output generation, with no trace of the uncertainty that accumulated along the way.

Patterns That Actually Address This

Confidence Annotations in Return Schemas

The most direct fix is structural: don't just return answers from subagents, return annotated answers. A return schema that includes explicit uncertainty metadata changes what the orchestrator can reason about.

Useful fields to include (a minimal schema sketch follows the list):

  • A confidence tier (high / medium / low) based on whether the task was within the agent's typical operating range
  • A context-sufficiency flag indicating whether the agent had complete context or had to make assumptions
  • A domain-match signal indicating whether the query fell into the agent's primary training distribution
  • A verification recommendation that the agent itself flags when it believes the answer warrants double-checking

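Here is a hypothetical sketch of such a return schema in Python with Pydantic; the field names and tiers are illustrative assumptions, not an established standard:

```python
from enum import Enum
from pydantic import BaseModel, Field


class ConfidenceTier(str, Enum):
    HIGH = "high"      # task squarely within the agent's typical operating range
    MEDIUM = "medium"
    LOW = "low"        # task at or beyond the edge of its competence


class SubagentResponse(BaseModel):
    """Hypothetical annotated return schema; field names are illustrative."""
    answer: str
    confidence_tier: ConfidenceTier
    context_sufficient: bool = Field(
        description="False if the agent had to make assumptions to fill context gaps"
    )
    domain_match: bool = Field(
        description="False if the query fell outside the agent's primary domain"
    )
    needs_verification: bool = Field(
        description="Set by the agent itself when the answer warrants double-checking"
    )
    assumptions: list[str] = Field(
        default_factory=list,
        description="Explicit assumptions made where context was missing"
    )
```

The point of the structure is not the particular fields but that the uncertainty travels with the answer instead of being discarded at the hand-off.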
This approach requires coordination across the agent system's interfaces — subagents that return raw answers need to be refactored to return structured responses. The payoff is that orchestrators gain the signal they need to apply appropriate skepticism.
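On the orchestrator side, those annotations become routing signals rather than metadata to discard. A sketch of one possible policy, reusing the hypothetical SubagentResponse schema above (the verify and re-delegate helpers are placeholders, not real APIs):

```python
def verify_answer(answer: str, assumptions: list[str]) -> str:
    # Placeholder: an independent check such as a second agent,
    # a retrieval lookup, or a unit test.
    return answer


def redelegate(resp: SubagentResponse) -> str:
    # Placeholder: re-issue the task with fuller context or to a stronger model tier.
    return resp.answer


def handle_subagent_response(resp: SubagentResponse) -> str:
    """Hypothetical policy: trust, verify, or re-delegate based on the annotations."""
    if (resp.confidence_tier == ConfidenceTier.HIGH
            and resp.context_sufficient
            and resp.domain_match
            and not resp.needs_verification):
        return resp.answer  # accept as-is
    if resp.confidence_tier == ConfidenceTier.MEDIUM or resp.needs_verification:
        return verify_answer(resp.answer, resp.assumptions)
    # Low confidence, missing context, or domain mismatch:
    # don't silently compound the error into the next reasoning step.
    return redelegate(resp)
```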
