The Escalation Path That Routes Back to the Agent
The escalation tool was the safety net. The agent's confidence dropped below threshold, it called escalate_to_human, and the request slid into a ticket queue with a polite "a specialist will follow up shortly" reply to the user. Engineering closed the loop on the launch checklist. The on-call calendar listed humans on the receiving end.
Six months later, an audit walked the path. The escalation tool opened a Zendesk ticket. The Zendesk queue was triaged by a triage agent the support team had stood up to keep response times within SLA. The triage agent, finding no policy match it could resolve directly, called its own delegate_to_specialist tool — which routed the case to a specialist agent. The specialist agent, when uncertain, called escalate_to_human. The trace was a closed circuit. No human had touched any of the escalations the audit sampled. The human-in-the-loop the launch doc described did not exist.
The escalation interface had not failed. It had been honored at every hop. What failed was the assumption that the receiving system was a person.
Escalation Is an Interface, Not a Destination
When you wire escalate_to_human into an agent's tool list, you are not handing the request to a human. You are calling an API whose contract is "eventually, a human will see this and act." The contract has two parts: the request lands somewhere a human can read, and a human actually reads it.
The first part is easy to verify. You can curl the ticket creation endpoint, see the ticket appear in the queue, and tick the box. The second part is invisible from the agent's side. The agent observes the same response — ticket created, status open — whether a human reads it tomorrow, a robot reads it in eight seconds, or nobody reads it at all. The tool's success criterion is "the request was accepted by the next system." The interface does not commit to who or what consumes it.
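To make the producer's blind spot concrete, here is a minimal sketch of what the tool call looks like from the agent's side. Everything in it is an illustrative assumption: the EscalationResult type, the make_escalate_tool factory, and the in-memory lists standing in for ticket queues are hypothetical, not the Zendesk API or any real tool schema.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class EscalationResult:
    """What the calling agent observes after escalating (hypothetical shape)."""
    accepted: bool
    status: str


# A queue is modeled as a plain list of (ticket_id, summary) pairs.
Queue = List[Tuple[str, str]]


def make_escalate_tool(queue: Queue) -> Callable[[str], EscalationResult]:
    """Build an escalate_to_human tool bound to one downstream queue."""

    def escalate_to_human(summary: str) -> EscalationResult:
        ticket_id = f"T-{len(queue) + 1}"
        queue.append((ticket_id, summary))
        # The tool reports acceptance by the next system and nothing more.
        return EscalationResult(accepted=True, status="open")

    return escalate_to_human


# Three downstream realities the agent cannot tell apart.
human_queue: Queue = []   # read by a person tomorrow
agent_queue: Queue = []   # read by a triage agent in eight seconds
dead_queue: Queue = []    # read by nobody

results = [
    make_escalate_tool(q)("refund dispute, low confidence")
    for q in (human_queue, agent_queue, dead_queue)
]

# All three responses are identical from the producer's point of view.
assert len(set(results)) == 1
```

Nothing in the return value carries information about the consumer, so no amount of checking at the call site can verify the second half of the contract.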
This is the same shape as any other queue-based handoff in distributed systems. Producers do not know consumers. That decoupling is the entire point. But for safety-critical paths, the decoupling is the bug. The escalation tool's whole purpose is to route work to a different kind of processor — a human one — and the moment the consumer changes, the safety property changes with it, silently, with no signal at the producer.
The Zendesk queue did not announce that it had grown an agent on top. The support team's triage automation was a productivity win, deployed without any awareness that another team's safety design depended on the queue being human-tailed. Each side made a locally reasonable decision. The intersection was a loop.
How the Loop Closes
The loop rarely closes in a single hop. It closes through the slow accretion of automation across a system that nobody owns end-to-end.
The pattern is recognizable once you look for it. A team builds an automation for their queue because volume is up and humans are slow. A different team builds an escalation path into that queue because their agent needs a safety net. The two teams operate on different schedules, in different parts of the org chart, with different review processes. The first team's automation is a productivity feature. The second team's escalation path is a compliance feature. Neither team's review includes the question "what does the other team's system do with this?", because neither team knows the other's system is on the path.
The first team is solving a queue-depth problem. They add an LLM-powered triage agent that can resolve a third of incoming tickets directly: password resets, status checks, simple lookups. For tickets it cannot resolve, it routes onward. The team measures success on deflection rate and human-review minutes saved. Both metrics improve. The agent works.
The second team's escalation path was designed against a queue that was 100% human-tailed when it was built. The launch review confirmed escalations reached the queue. Nothing in the launch review tested who pulled them out. When the first team's triage agent ships, the path silently changes shape, and nothing in the second team's monitoring catches it. The escalation success rate stays at 100% — the ticket is created every time. The fraction of escalations that ever touch a human is invisible to that metric.
Six months later, an incident or an audit forces a trace from end to end. The picture is uglier when the path branches across more than two systems. A specialist agent might dispatch to a vendor's API; the vendor's API might enqueue into the vendor's own automation; that automation might call back into the original org's tools through an integration. The longer the path, the more chances for a human-shaped slot to be quietly filled by automation.
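An end-to-end trace of this kind can be approximated ahead of time if the routing between systems is written down. The sketch below assumes a hand-maintained routing map, which is a hypothetical structure: each node is labeled with a consumer kind and the hops it forwards unresolved work to. The node names mirror the audit described earlier, and trace_escalation is an illustrative helper, not part of any real tool.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical routing map: node -> (consumer kind, next hops when unresolved).
Routing = Dict[str, Tuple[str, List[str]]]

ROUTING: Routing = {
    "frontline_agent": ("agent", ["zendesk_queue"]),    # escalate_to_human
    "zendesk_queue": ("queue", ["triage_agent"]),
    "triage_agent": ("agent", ["specialist_agent"]),    # delegate_to_specialist
    "specialist_agent": ("agent", ["zendesk_queue"]),   # escalate_to_human again
}


def trace_escalation(routing: Routing, start: str) -> Dict[str, list]:
    """Depth-first walk from `start`.

    Returns the human nodes reachable from `start` and every cycle found.
    Assumes every referenced node has an entry in `routing`.
    """
    humans: Set[str] = set()
    cycles: List[List[str]] = []

    def visit(node: str, path: List[str]) -> None:
        if node in path:
            # Revisiting a node on the current path closes a loop.
            cycles.append(path[path.index(node):] + [node])
            return
        kind, next_hops = routing[node]
        if kind == "human":
            humans.add(node)
            return
        for nxt in next_hops:
            visit(nxt, path + [node])

    visit(start, [])
    return {"humans": sorted(humans), "cycles": cycles}


result = trace_escalation(ROUTING, "frontline_agent")
# No human is reachable, and the path loops back through the ticket queue.
assert result["humans"] == []
assert result["cycles"] == [
    ["zendesk_queue", "triage_agent", "specialist_agent", "zendesk_queue"]
]
```

The check is only as good as the map. Its value lies less in the traversal than in forcing each team to declare what actually consumes its queue, and to update that declaration when automation is added.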
The Metric That Should Have Caught It
The standard escalation metrics are throughput metrics. Number of escalations created. Average time-to-create. Queue depth. None of them measure the thing the escalation was for.
The metric that matters is human-touch rate on the escalation path: of the escalations created in a window, what fraction had at least one explicitly human action on the trace before resolution. Not "a person could have looked." Not "the ticket reached a queue humans can read." A logged human action — a comment authored by a user account that has not been delegated to an agent, a status transition by a human, a reply that left a human-controlled outbox.
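As a rough illustration, the metric can be computed from an event log along the following lines. The TraceEvent shape, the action names, and the human_accounts allowlist are all assumptions made for this sketch. A real ticketing system will expose different fields, and deciding which accounts count as human (and have not been delegated to an agent) is the hard part that the code takes as given.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class TraceEvent:
    escalation_id: str
    actor: str    # account that performed the action
    action: str   # "created", "comment", "status_change", "reply", "resolved"


# Actions that count as a touch when performed by a human account (assumed list).
HUMAN_ACTIONS = {"comment", "status_change", "reply", "resolved"}


def human_touch_rate(events: Iterable[TraceEvent], human_accounts: Set[str]) -> float:
    """Fraction of escalations with a logged human action up to resolution.

    Events are assumed to be in chronological order.
    """
    touched: Dict[str, bool] = {}
    resolved: Set[str] = set()
    for e in events:
        touched.setdefault(e.escalation_id, False)
        if e.escalation_id in resolved:
            continue  # actions after resolution do not count
        if e.actor in human_accounts and e.action in HUMAN_ACTIONS:
            touched[e.escalation_id] = True
        if e.action == "resolved":
            resolved.add(e.escalation_id)
    if not touched:
        return 0.0
    return sum(touched.values()) / len(touched)


# "bob" is a user account delegated to an agent, so it is not on the allowlist.
HUMANS = {"alice"}
EVENTS = [
    TraceEvent("E1", "frontline_agent", "created"),
    TraceEvent("E1", "triage_agent", "comment"),
    TraceEvent("E1", "specialist_agent", "resolved"),
    TraceEvent("E2", "frontline_agent", "created"),
    TraceEvent("E2", "alice", "comment"),
    TraceEvent("E2", "alice", "resolved"),
    TraceEvent("E3", "frontline_agent", "created"),
    TraceEvent("E3", "triage_agent", "resolved"),
    TraceEvent("E3", "alice", "comment"),   # after resolution: not counted
    TraceEvent("E4", "frontline_agent", "created"),
    TraceEvent("E4", "bob", "comment"),     # delegated account: not counted
]

rate = human_touch_rate(EVENTS, HUMANS)
assert rate == 0.25
```

In this sample all four escalations were created successfully, so a creation-based success rate reads 100% while the human-touch rate is 25%.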
