The Approval Queue Nobody Drains
You did the responsible thing. You looked at your agent, identified the actions that could cause real damage — issuing a refund, deleting a record, sending an external email, deploying a config change — and you routed them to a human for approval. Risk-tiered gating. Textbook. The review board signed off.
Then a customer escalation came in three weeks later: an agent task had been "in progress" since the previous Tuesday. Not failed. Not errored. Just sitting in a human approval queue that, it turned out, nobody was actually watching. The agent had done its job, parked the dangerous action behind a gate, and waited. The gate had no owner. The task aged silently in a place where no dashboard pointed and no alarm fired.
This is the failure mode that risk-tiered gating quietly introduces. The gate itself is correct: you should not let an agent wire money without a human in the loop. But "add a human approval step" is not a safety feature you install once. It is a new piece of production infrastructure with its own uptime, its own latency budget, and its own ways of breaking. Most teams ship the gate and never operate it. The queue becomes a black hole: things go in, they come out at whatever rate a distracted human happens to provide, and nobody is measuring the difference.
A human gate is a dependency, not a decision
When you add an approval step, you have not made the agent safer in isolation — you have added a service to your critical path. That service happens to be staffed by people, but architecturally it behaves like any other dependency: it has a queue, a processing rate, a failure mode when overloaded, and a tail latency that matters more than its average.
Engineers know how to reason about a dependency made of software. If the agent called a payments API, you would ask: what is its p99 latency? What is its error rate? What happens when it is down? Is there a timeout? A retry? A circuit breaker? You would put it on a dashboard and you would alert on it.
The human approval queue gets none of that scrutiny, because it does not look like a service. It looks like a Slack channel, or a row in an admin panel, or an email that goes to a shared inbox. But it is on the critical path of every gated agent task, and it is almost always the slowest, least observable, least owned component in the whole system. The agent's median latency might be eight seconds. The queue's median latency might be six hours, and its p99 might be "never," and you would not know either number because nobody is computing them.
The reframe that fixes most of this: treat the approval queue as a system with an SLO. Not a vibe. A number. If a tier-2 action is supposed to be reviewed within one hour, then "time from enqueue to decision" is a latency metric, the fraction of items decided within one hour is the indicator you track, and the target you set for that fraction (say, 95 percent) is an SLO you can hit or miss. The moment you write that number down, the queue stops being a black hole and starts being something you can operate.
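To make "time from enqueue to decision" concrete, here is a minimal sketch of measuring it. The function name `slo_attainment`, the pair-of-timestamps record shape, and the rule for pending items are all assumptions made for illustration, not something a particular queueing tool provides.

```python
from datetime import datetime, timedelta

def slo_attainment(items, target, now):
    """Fraction of items decided within `target` of being enqueued.

    `items` is a list of (enqueued_at, decided_at) pairs (hypothetical shape);
    decided_at is None while an item is still pending.
    Assumption: a pending item already older than `target` counts as a miss,
    and a pending item younger than `target` is not counted yet.
    Returns None when there is nothing to measure.
    """
    hits = misses = 0
    for enqueued_at, decided_at in items:
        if decided_at is not None:
            if decided_at - enqueued_at <= target:
                hits += 1
            else:
                misses += 1
        elif now - enqueued_at > target:
            # Nobody has looked at it and the window has already closed.
            misses += 1
    total = hits + misses
    return hits / total if total else None
```

Counting overdue pending items as misses is the design choice that matters here. If only decided items are measured, an item that nobody ever reviews never shows up in the number, which is exactly the black hole the metric is supposed to expose.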
What a queue actually needs to be operable
Once you accept that the queue is infrastructure, the missing pieces become obvious — they are the same pieces any operable system needs.
An owner. Not "the team." A specific role that is accountable for the queue being drained, the way a service has an on-call rotation. If the answer to "who is responsible for clearing pending approvals right now" is a shrug or a list of five names, the answer is functionally nobody. Shared ownership of a queue is the same as shared ownership of a dirty kitchen.
Depth and wait time as monitored signals. Two numbers, on a dashboard, with alerts. Queue depth — how many items are pending — tells you about backlog. Wait time — how long the oldest pending item has been waiting — tells you about staleness. They fail differently. A queue can be shallow and stale (two items, both eight hours old, because the reviewer is on vacation) or deep and fresh (forty items, none older than ten minutes, because a batch just landed). You want to alert on both, because both are bad for different reasons.
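A rough sketch of those two signals and their separate alerts follows. The thresholds (`max_depth=25`, `max_wait` of one hour) and the function names are hypothetical placeholders; real values depend on the tier and the team.

```python
from datetime import datetime, timedelta

def queue_signals(pending_enqueued_at, now):
    """Return (depth, oldest_wait) for a list of enqueue timestamps of pending items."""
    depth = len(pending_enqueued_at)
    # An empty queue has nothing waiting, so its oldest wait is zero.
    oldest_wait = max((now - t for t in pending_enqueued_at), default=timedelta(0))
    return depth, oldest_wait

def alerts(depth, oldest_wait, max_depth=25, max_wait=timedelta(hours=1)):
    """Evaluate the two signals independently; either one can fire alone."""
    fired = []
    if depth > max_depth:
        fired.append("backlog")
    if oldest_wait > max_wait:
        fired.append("stale")
    return fired
```

With these placeholder thresholds, two items that are both eight hours old trip only the "stale" alert, and forty items that are five minutes old trip only the "backlog" alert. A single combined threshold would miss one of the two cases.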
An escalation path. When the primary reviewer does not act, the item should move — to a backup, to a lead, to a wider channel — automatically, on a timer. Escalation is what converts "one person was unavailable" from an outage into a delay. Without it, the queue's availability is exactly equal to one human's calendar.
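One simple way to express escalation on a timer is a ladder of wait thresholds. The role names and durations below are invented for illustration; the only assumption that matters is that the ladder is sorted by ascending threshold.

```python
from datetime import timedelta

# Hypothetical ladder: (how long the item has waited, who should hold it by then).
# Must be sorted by ascending threshold.
ESCALATION_LADDER = [
    (timedelta(0), "primary-reviewer"),
    (timedelta(minutes=30), "backup-reviewer"),
    (timedelta(hours=2), "team-lead"),
    (timedelta(hours=4), "ops-channel"),
]

def current_assignee(waited, ladder=ESCALATION_LADDER):
    """Return the role that should hold an item that has waited `waited`."""
    assignee = ladder[0][1]
    for threshold, role in ladder:
        if waited >= threshold:
            # Keep climbing: the last threshold passed wins.
            assignee = role
    return assignee
```

A scheduled job could call `current_assignee` for every pending item and reassign or notify whenever the answer changes. The function depends only on elapsed time, so an unavailable reviewer cannot stall it.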
A timeout policy with a defined terminal state. This is the one almost everyone skips, and it is the most important. Every item in the queue must have an answer to: what happens if no human ever looks at this? Not as an exception. As designed behavior.
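As a sketch of "designed behavior", the example below gives every item a time-to-live and an explicit terminal state. The `EXPIRED` state, the `ttl` field, and the fail-closed choice (the gated action is not executed when nobody reviews it) are assumptions for illustration. Other policies, such as returning the task to the agent or notifying the customer, are equally valid as long as one is chosen up front.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum

class State(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXPIRED = "expired"   # hypothetical terminal state: nobody decided in time

@dataclass
class ApprovalItem:
    action: str
    enqueued_at: datetime
    ttl: timedelta        # how long the item may wait before it ages out
    state: State = State.PENDING

def expire_stale(items, now):
    """Move every pending item past its TTL to EXPIRED and return those items.

    Assumption: expiry fails closed, so the gated action is never executed.
    The caller is expected to release locks and notify whoever is waiting.
    """
    expired = []
    for item in items:
        if item.state is State.PENDING and now - item.enqueued_at >= item.ttl:
            item.state = State.EXPIRED
            expired.append(item)
    return expired
```

Run on a schedule, a sweep like this guarantees that no item stays pending forever: every item ends up approved, rejected, or expired, and each of those outcomes can trigger follow-up work.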
The decision you must make before the queue exists: what happens to an item that ages out
An agent task waiting on approval is in a superposition. It is neither done nor failed. It is consuming a slot, possibly holding a lock, possibly blocking a customer, and it will stay that way until a human collapses the wavefunction. If no human ever does, the task is immortal — and immortal tasks are how you get a record that has been "processing your refund" for nine days.
