The Approval Queue That Became Your Critical Path

June 1, 2026 · 11 min read

Software Engineer

The design doc said "human in the loop." The launch deck said "safe by default." The incident review six months later said the agent took ninety minutes to send a customer an invoice because the approver was at lunch. None of those documents were lying. They were describing the same component at different points on its load curve — and only one of them got the shape right.

When you put a human between an agent and an irreversible action, you have not added a safety primitive. You have added a service with a queue, a throughput limit, a quality-versus-load curve, and an availability profile. The team that ships the agent without naming that service has shipped a product whose critical path runs through a piece of infrastructure they refuse to operate.

The pattern is consistent enough to be predictable. In one documented case, requiring human approval for every agent output grew a backlog to fourteen thousand pending items in forty-eight hours, against a three-person review team processing about two hundred items an hour. Average approval latency landed at six and a half hours. The agent had not broken. The agent had done exactly what it was supposed to do. The reviewer pool had simply been sized for the design-doc traffic, not the production traffic, and the gap between those two numbers showed up as a P0 the morning after launch.

The Queue Is A Service, Not A Checkbox

The first mistake is treating "approval required" as a property of the workflow rather than a dependency in the system. Properties are static. Dependencies have SLOs, capacity models, on-call rotations, and dashboards. The approval gate has all of those whether or not the team has written them down.

The minimum honest framing: every approval step is a queue with arrivals, service times, and a finite set of workers. The agent is one of the arrival processes. Customer escalations are another. Routine retries are a third. The reviewers are the workers. Queueing theory has had words for this since Erlang's telephone-exchange work in the early 1900s. The team has just not been using them because the word "human" felt like it changed the math.

It does not change the math. It changes the variance. Human service times have heavy tails — a routine approval takes thirty seconds, a confusing one takes twenty minutes, and the same reviewer handles both. Heavy-tailed service times mean queue depth grows faster under load than the average would suggest. The dashboard that shows mean approval latency at four minutes is hiding the long tail that bites you when the queue is hot.
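To make the variance point concrete, here is a minimal sketch using the Pollaczek-Khinchine formula for an M/G/1 queue. It assumes Poisson arrivals and a single reviewer, both simplifications, and it stands in for the heavy tail with a two-point mix of thirty-second and twenty-minute approvals. The 95/5 split and the arrival rate are illustrative assumptions, not figures from any real queue.

```python
def mg1_mean_wait(arrival_rate, service_times, probs):
    """Mean wait in queue (before service starts) for an M/G/1 queue.

    Pollaczek-Khinchine: Wq = lambda * E[S^2] / (2 * (1 - rho)).
    Assumes Poisson arrivals and one reviewer; both are simplifications.
    """
    mean_s = sum(p * s for s, p in zip(service_times, probs))
    second_moment = sum(p * s * s for s, p in zip(service_times, probs))
    rho = arrival_rate * mean_s  # reviewer utilization
    if rho >= 1:
        raise ValueError("unstable queue: utilization must be below 1")
    return arrival_rate * second_moment / (2 * (1 - rho))


ARRIVALS_PER_MIN = 0.5  # hypothetical load

# Mixed workload: 95% routine (0.5 min), 5% confusing (20 min).
mixed = mg1_mean_wait(ARRIVALS_PER_MIN, [0.5, 20.0], [0.95, 0.05])

# Same mean service time (1.475 min), but every item takes exactly that long.
steady = mg1_mean_wait(ARRIVALS_PER_MIN, [1.475], [1.0])

print(f"mixed: {mixed:.1f} min wait, steady: {steady:.1f} min wait")
```

Both workloads have the same mean service time and the same utilization, yet under these assumptions the mixed one waits roughly nine times longer (about nineteen minutes against about two). The whole gap comes from the second moment of the service time, which is the term a mean-latency dashboard never shows.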

If you do not have a queue-depth dashboard alongside your agent latency dashboard, the queue is invisible to the people who control the throttle on its arrival rate. They will not throttle. The depth will grow. The first time anyone notices is when a customer complains.
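A depth-and-age snapshot does not need much. The sketch below assumes a hypothetical `PendingItem` record with an enqueue timestamp and a known reviewer throughput; the field names and numbers are placeholders for whatever your queue actually stores.

```python
from dataclasses import dataclass


@dataclass
class PendingItem:
    item_id: str
    enqueued_at: float  # seconds since epoch


def queue_snapshot(pending, now, items_per_hour):
    """Summarize queue health: depth, item ages, and a rough time to drain."""
    ages = sorted(now - item.enqueued_at for item in pending)
    depth = len(ages)
    return {
        "depth": depth,
        "oldest_age_s": ages[-1] if ages else 0.0,
        "median_age_s": ages[depth // 2] if ages else 0.0,  # upper median
        # Time to clear the backlog if arrivals stopped right now.
        "est_drain_hours": depth / items_per_hour,
    }


now = 1_000_000.0
pending = [
    PendingItem("a", now - 60),
    PendingItem("b", now - 600),
    PendingItem("c", now - 7200),
]
snapshot = queue_snapshot(pending, now, items_per_hour=200)
print(snapshot)
```

Oldest age matters as much as depth. A shallow queue with one two-hour-old item is a customer who has been waiting two hours.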

Reviewer Accuracy Is Not Constant

The second mistake is treating each approval as an independent transaction with a fixed quality. It is not. The same reviewer looking at the same kind of decision will produce different judgments at item three versus item three hundred, before lunch versus after lunch, during a normal week versus during a release crunch.

This matters because the safety case the team made at launch assumed reviewer accuracy was high enough to catch the agent's mistakes. That assumption was measured on a small sample, on calm days, with reviewers who knew they were being watched. Under production load the same reviewers context-switch between unrelated agent tasks, lose track of which proposal they are looking at, click approve on the wrong row, and miss the one decision that mattered because it was buried between ninety-nine that did not.

A common range for escalation SLAs in regulated environments is fifteen minutes for real-time gates, four hours for batch gates, and twenty-four hours for complex review. Below fifteen minutes, reviewer fatigue starts to dominate and gate quality drops. That is not a soft observation — it is a load-bearing constraint on what kind of agent you can ship at all. If your business model needs sub-minute approvals, you do not have a human-in-the-loop system. You have a rubber stamp with extra steps.

The honest instrumentation is to measure reviewer accuracy as a function of queue depth, queue age, items-per-hour, and time-of-day. Plot the curve. The point at which reviewer accuracy falls below the agent's own accuracy (equivalently, where the reviewer's error rate climbs past the agent's) is the point at which the human gate is making things worse, not better. Teams that have not plotted the curve are guessing about which side of that line they are on.
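One way to get a first version of that curve, sketched under the assumption that you have an audited sample where each review is labeled correct or incorrect after the fact. The bucket size, the toy data, and the agent accuracy figure are all made up for illustration.

```python
from collections import defaultdict


def accuracy_by_depth(reviews, bucket_size=50):
    """reviews: iterable of (queue_depth_at_review, reviewer_was_correct)."""
    totals = defaultdict(lambda: [0, 0])  # bucket start -> [correct, total]
    for depth, correct in reviews:
        bucket = (depth // bucket_size) * bucket_size
        totals[bucket][0] += int(correct)
        totals[bucket][1] += 1
    # Sorted by bucket so the curve reads from a cold queue to a hot one.
    return {b: c / n for b, (c, n) in sorted(totals.items())}


def crossover_depth(curve, agent_accuracy):
    """First depth bucket where reviewers do worse than the agent alone."""
    for bucket, accuracy in curve.items():
        if accuracy < agent_accuracy:
            return bucket
    return None


# Toy audited sample, not real measurements.
reviews = [
    (10, True), (20, True), (30, True), (40, False),
    (60, True), (70, True), (80, False), (90, False),
    (120, True), (130, False), (140, False), (145, False),
]
curve = accuracy_by_depth(reviews)
print(curve, crossover_depth(curve, agent_accuracy=0.6))
```

The same grouping works for queue age, items-per-hour, or hour-of-day. Swap the first element of each tuple and rerun.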

The Org Chart That Hides The Bottleneck

The third mistake is structural. The team that designs the agent does not own the queue. The team that owns the queue did not size for the rollout. Product owns the migration metric. SRE owns availability. Procurement owns headcount. The agent's critical-path component is split across four teams with four different scorecards, and the seam between them is exactly where the failure lives.

You can predict the script. The agent team measures task success and reports a green dashboard. The reviewer team measures items processed per shift and reports a green dashboard. The customer-facing latency dashboard is red, because the customer was waiting on an approval that sat in a queue nobody owned. Each team is doing its job. The job that nobody has is "watch the seam."

The pattern that closes the gap is the same pattern every mature on-call rotation eventually adopts: someone is on the hook for end-to-end latency. Not for the agent's latency. Not for the reviewer's response time. For the customer-perceived time between "I asked for this" and "I got it." That person has the authority to throttle the agent's arrival rate, to page additional reviewers, or to offer the customer a smaller scope that does not need approval. If nobody has that authority, the queue is the load-bearing wall in a house with no structural engineer.

Patterns That Actually Close The Gap

A handful of design moves keep showing up in teams that have already taken this hit and survived it. None of them are exotic. All of them require treating the queue as a first-class system.

Tier the requests by risk and route accordingly. Coinbase reported handling a tripled support volume in 2024–2025 with the same headcount by routing roughly sixty percent of routine queries to agent-only paths, account-specific work to a Tier 1 reviewer pool, and high-value transactions to specialized teams. The same logic applies to internal agents: an expense agent that auto-approves under five hundred dollars, routes the five-thousand-dollar tier to a finance reviewer, and escalates anything stranger does not have one queue. It has three queues with three different SLOs, and the system survives load on each independently.
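The expense example reduces to a small routing function. This is a sketch, not a policy: the thresholds come from the example above, while the `is_anomalous` flag and the exact boundary handling are assumptions you would replace with your own rules.

```python
from enum import Enum


class Route(Enum):
    AUTO_APPROVE = "auto_approve"
    FINANCE_REVIEW = "finance_review"
    ESCALATE = "escalate"


def route_expense(amount_usd, is_anomalous=False,
                  auto_limit=500.0, finance_limit=5000.0):
    """Pick a queue for an expense; each route carries its own SLO."""
    # Anything strange or above the finance tier goes to the specialists.
    if is_anomalous or amount_usd > finance_limit:
        return Route.ESCALATE
    if amount_usd < auto_limit:
        return Route.AUTO_APPROVE
    return Route.FINANCE_REVIEW


print(route_expense(120.0))                      # Route.AUTO_APPROVE
print(route_expense(3200.0))                     # Route.FINANCE_REVIEW
print(route_expense(120.0, is_anomalous=True))   # Route.ESCALATE
```

The anomaly check runs first on purpose. A small amount should not be able to buy its way out of review just because it is small.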

Batch and triage at the reviewer's interface. A reviewer who can sweep ten low-risk approvals as one decision and then flag the eleventh outlier processes a different volume than a reviewer who must click each row individually. The batching primitive does not loosen the gate — it changes the time-per-decision curve under load. The eleventh outlier still gets the full attention. The first ten stop being a fatigue source.
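A rough sketch of the batching primitive, assuming each item already carries a hypothetical `risk` score between 0 and 1 from some upstream classifier. The threshold and batch size are placeholders.

```python
def triage_batch(items, risk_threshold=0.2, batch_size=10):
    """Split pending items into sweepable low-risk batches and outliers.

    Each batch is presented to the reviewer as one decision; every outlier
    is presented on its own and gets individual attention.
    """
    low_risk = [i for i in items if i["risk"] < risk_threshold]
    outliers = [i for i in items if i["risk"] >= risk_threshold]
    batches = [low_risk[k:k + batch_size]
               for k in range(0, len(low_risk), batch_size)]
    return batches, outliers


# Ten routine items plus one that looks odd.
items = [{"id": n, "risk": 0.05} for n in range(10)]
items.append({"id": 10, "risk": 0.9})

batches, outliers = triage_batch(items)
print(len(batches), "batch decision(s),", len(outliers), "individual review(s)")
```

Eleven clicks become two decisions. The risk score is doing real work here, so it deserves the same accuracy audit as the reviewers.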

Make the SLO user-facing, not reviewer-facing. "Reviewer responds in fifteen minutes" is not the contract the customer signed. The contract the customer signed was "I get my result in fifteen minutes." Measuring reviewer response time and ignoring end-to-end latency lets the queue look healthy on the dashboard the team owns while the customer-perceived latency drifts in the dashboard nobody owns. Flip the measurement. The number that matters is the time from user intent to user-visible outcome.
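The difference between the two measurements is easy to see on a handful of traces. The sketch assumes a hypothetical `ApprovalTrace` record with four timestamps in seconds; the values are invented to show the failure mode.

```python
from dataclasses import dataclass


@dataclass
class ApprovalTrace:
    requested_at: float   # user expressed intent
    assigned_at: float    # reviewer picked the item up
    decided_at: float     # reviewer approved or rejected
    delivered_at: float   # user saw the outcome


def slo_attainment(durations_s, target_s):
    """Fraction of durations that met the target."""
    if not durations_s:
        return 1.0
    return sum(d <= target_s for d in durations_s) / len(durations_s)


TARGET_S = 15 * 60  # the fifteen-minute contract

traces = [
    ApprovalTrace(0, 30, 200, 260),
    ApprovalTrace(0, 3600, 3900, 3960),   # sat unassigned for an hour
    ApprovalTrace(0, 5400, 5700, 5750),   # sat unassigned for ninety minutes
]

reviewer_view = slo_attainment(
    [t.decided_at - t.assigned_at for t in traces], TARGET_S)
user_view = slo_attainment(
    [t.delivered_at - t.requested_at for t in traces], TARGET_S)

print(f"reviewer-facing: {reviewer_view:.0%}, user-facing: {user_view:.0%}")
```

Every reviewer in this sample answered within five minutes of picking the item up, so the reviewer-facing number is perfect. Two of the three users still waited more than an hour.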

Build graceful degradation into the agent itself. When the queue is over capacity, the agent should be able to offer the user a smaller scope that does not require approval, or to defer the request explicitly with an honest ETA rather than letting it sit in a spinner. The fallback is not a workaround — it is a load-shedding strategy. Without it, the agent's only behavior under load is "wait silently," which the user reads as "broken."
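As a sketch, the load-shedding decision is a three-way branch. The action names and the wait estimate are assumptions; the estimate would come from whatever depth and throughput telemetry the queue exposes.

```python
def plan_response(est_wait_min, slo_min, reduced_scope_available):
    """Decide what the agent tells the user when approval is required."""
    if est_wait_min <= slo_min:
        # Queue is healthy: take the normal approval path.
        return {"action": "submit_for_approval", "eta_min": est_wait_min}
    if reduced_scope_available:
        # Shed load: offer a smaller request that needs no approval.
        return {"action": "offer_reduced_scope", "eta_min": 0}
    # No alternative: defer explicitly and tell the user how long.
    return {"action": "defer_with_eta", "eta_min": est_wait_min}


print(plan_response(est_wait_min=5, slo_min=15, reduced_scope_available=True))
print(plan_response(est_wait_min=90, slo_min=15, reduced_scope_available=True))
print(plan_response(est_wait_min=90, slo_min=15, reduced_scope_available=False))
```

None of the three branches is "wait silently." Even the worst case hands the user a number.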

Queue-aware planning at the agent layer. If the agent knows the queue is hot, it can defer low-priority approval-gated work, reserve reviewer capacity for high-priority paths, and reorder its own task list. This requires the queue's state to be visible to the agent — which means the queue has to be a first-class observable component, not a black box behind an API call.
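A minimal version of that planning step, assuming the agent can read current queue depth and that tasks carry a priority where 1 is most urgent. The task names and the "hot" threshold are illustrative.

```python
def plan_tasks(tasks, queue_depth, hot_depth):
    """Reorder and defer work based on reviewer queue state.

    tasks: list of (priority, needs_approval, name); priority 1 is most urgent.
    Returns (run_now, deferred) as lists of task names.
    """
    queue_hot = queue_depth >= hot_depth
    run_now, deferred = [], []
    for priority, needs_approval, name in sorted(tasks):
        # When the queue is hot, only top-priority work may consume reviewers.
        if queue_hot and needs_approval and priority > 1:
            deferred.append(name)
        else:
            run_now.append(name)
    return run_now, deferred


tasks = [
    (2, True, "send_invoice"),
    (1, True, "refund_dispute"),
    (3, False, "draft_summary"),
]
print(plan_tasks(tasks, queue_depth=120, hot_depth=100))
print(plan_tasks(tasks, queue_depth=10, hot_depth=100))
```

Work that needs no approval keeps flowing regardless of queue state, which is what keeps the agent useful while the reviewers catch up.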

The Unit Economics You Are Hiding From Yourself

The cost frame is where most teams' models fall apart on contact with production. A typical unit-economics calculation for an agent: model cost per call, tool latency, hosting. Reviewer time rarely appears. When it does appear, it is priced at the bare hourly rate, not at the loaded cost that includes management, onboarding, attrition, and the opportunity cost of senior reviewers handling routine items because junior reviewers were the bottleneck.

A more honest accounting starts at the loaded value of the human time the agent is replacing. U.S. civilian compensation averaged around forty-nine dollars per hour at the end of 2025, and professional services run materially higher. The vendor-followup agent that saves sixty manual hours a month at one hundred dollars per hour is producing six thousand dollars of value. The platform cost is twelve hundred. If approval review takes nine hours of reviewer time at the same one hundred dollars per hour, net savings is around thirty-nine hundred. If review climbs to thirty-five hours, net savings falls to about thirteen hundred. Past forty-eight hours the math inverts and the workflow has become net-negative.
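The arithmetic is worth keeping in a function so the sensitivity is visible. This sketch assumes reviewer time is priced at the same one hundred dollars per hour as the work being replaced, which is the rate that makes nine review hours come out to thirty-nine hundred; the parameter names are mine.

```python
def net_monthly_savings(hours_saved, hourly_value, platform_cost,
                        review_hours, reviewer_rate):
    """Value produced minus platform cost minus loaded reviewer time."""
    return (hours_saved * hourly_value
            - platform_cost
            - review_hours * reviewer_rate)


def break_even_review_hours(hours_saved, hourly_value, platform_cost,
                            reviewer_rate):
    """Review hours per month at which net savings reaches zero."""
    return (hours_saved * hourly_value - platform_cost) / reviewer_rate


light_review = net_monthly_savings(60, 100, 1200, review_hours=9,
                                   reviewer_rate=100)
break_even = break_even_review_hours(60, 100, 1200, reviewer_rate=100)

print(f"net at 9 review hours: ${light_review}")
print(f"break-even at $100/hour: {break_even} review hours")
```

At one hundred dollars per hour the break-even sits at forty-eight review hours a month. The reviewer rate moves it quickly: a thirty-five-hour review load is enough to push the workflow negative once the loaded reviewer rate passes roughly one hundred thirty-seven dollars per hour, which is not unusual when senior reviewers are the ones handling the queue.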

The number that surprises teams is how sensitive the unit economics are to review time. Approval-gated agents are not priced by inference cost. They are priced by reviewer hours. The team that did not surface that math in the migration proposal has signed leadership up for a bill whose largest line item is headcount that grows with adoption — exactly the opposite of the curve everyone was promised.

The second uncomfortable number is the projected cancellation rate of agentic-AI projects. Gartner predicted in 2025 that more than forty percent of them will be canceled by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls. The teams that survive will be the ones that priced the human gate honestly and built the routing to keep it small. The teams that scrap will be the ones that discovered, too late, that they had bought a service whose marginal cost was a reviewer hour.

Reviewer Capacity Is Infrastructure Capacity

The architectural realization is simple and uncomfortable. Treating the human reviewer as infinite throughput is the same mistake as treating the inference endpoint as infinite throughput. Both produce outages that look like the agent broke, when the actual failure was a capacity model nobody wrote.

The fix is not to remove the human. The fix is to operate the human-gated component the way you would operate any other capacity-bounded service: with arrival-rate controls, with depth and age telemetry, with load-shedding paths, with an owner whose scorecard includes end-to-end latency, with a staffing model tied to forecasted load the same way an on-call rotation is tied to forecasted incidents.
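A first-pass staffing model can be as blunt as the one below. It assumes a forecast arrival rate, a measured per-reviewer throughput, and a target utilization below one so there is headroom for bursts and for the long tail of slow approvals. The seventy percent default is an assumption, not a recommendation.

```python
import math


def reviewers_needed(forecast_items_per_hour, items_per_reviewer_hour,
                     target_utilization=0.7):
    """Reviewers required to serve forecast load with utilization headroom."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    effective_rate = items_per_reviewer_hour * target_utilization
    return math.ceil(forecast_items_per_hour / effective_rate)


# Hypothetical forecast: 500 approval-gated items per hour at peak,
# with each reviewer clearing about 70 items per hour.
print(reviewers_needed(500, 70))        # with 30% headroom
print(reviewers_needed(500, 70, 1.0))   # zero headroom, for comparison
```

The gap between those two numbers is the price of headroom. Sizing at full utilization looks cheaper on the spreadsheet and guarantees the queue grows without bound the first time arrivals exceed the forecast.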

The approval queue is a scheduling problem with a human in the loop. Once you call it that, the toolkit you need is the one you already have for every other scheduling problem in your stack — backpressure, prioritization, batching, graceful degradation, and an SLO measured at the boundary that matters to the user. The teams that have figured this out are not running magic. They are running queues. The teams that have not figured it out are running queues too, except they are calling them safety primitives, and the difference is showing up in the postmortems.

About Tian Pan
