Where to Put the Human: Placement Theory for AI Approval Gates

· 12 min read
Tian Pan
Software Engineer

Most teams add human-in-the-loop review as an afterthought: the agent finishes its chain of work, the result lands in a review queue, and a human clicks approve or reject. This feels like safety. It is mostly theater.

By the time a multi-step agent reaches end-of-chain review, it has already sent the API requests, mutated the database rows, drafted the customer email, and scheduled the follow-up. The "review" is approving a done deal. Declining it means explaining to the agent — and often to the user — why nothing that happened for the past 10 minutes will stick.

The damage from misplaced approval gates isn't always dramatic. Often it's subtler: reviewers who approve everything because the real decisions have already been made, engineers who add more checkpoints after incidents and watch trust in the product crater, and organizations that oscillate between "too much friction" and "not enough oversight" without ever solving the underlying placement problem.

Placement is the core variable. Where you insert the human determines whether oversight is meaningful or performative. Getting it right requires a framework, not intuition.

Why End-of-Chain Review Fails

In a linear agent pipeline — gather context → reason → plan → execute — errors introduced early compound through every subsequent step. An incorrect assumption in the planning stage drives multiple downstream tool calls. By the time a human sees the output, the error has been amplified through n layers of agent reasoning and the effects are already in external systems.

This isn't a hypothetical. In healthcare, insurance companies deployed AI denial systems where case managers were explicitly instructed not to deviate from the model's prediction. When roughly 90% of appealed denials were reportedly overturned, the downstream damage was visible. The problem wasn't the model itself — it was that human judgment was structurally excluded from the point where it would have had leverage.

The same failure mode appears in less dramatic forms everywhere agents touch real systems. A coding agent that has already pushed a branch, opened a PR, and commented on issues doesn't benefit much from a human reviewing its "work" at the end. The side effects are already distributed.

End-of-chain review fails for three reasons:

  • Irreversibility compounds upstream. Each tool execution may be individually reversible, but the combination often isn't. Sending an email draft is reversible; sending it, having the recipient read it, and generating a follow-up calendar invite is a state you can't cleanly undo.
  • Review becomes rubber-stamping. When reviewers see final outputs disconnected from intermediate reasoning, they lack the context to meaningfully evaluate the decisions that produced them. Studies on approval fatigue show that humans who review routine AI outputs with low reject rates become systematically less attentive over time.
  • The window for intervention has passed. Effective oversight requires the ability to interrupt before consequences propagate. A gate placed after the last consequential action offers no such window.

Classifying Actions by Risk Surface

The foundation of good gate placement is an action inventory. Before deciding where to insert approval gates, enumerate what the agent actually does: every tool call, API request, external message, and state mutation, with an honest assessment of what each one changes in the world.

Three dimensions determine where on the risk surface an action falls:

Reversibility — Can this be undone, at what cost, and by whom? Reading data is trivially reversible (it wasn't changed). Creating a draft document is reversible with effort. Sending an email to a customer is technically reversible (with an apology), but the practical cost is high. Deleting records without backups or submitting a financial transaction may be irreversible in any operational sense.

Blast radius — Who is affected if this goes wrong? An action affecting only internal draft state has a narrow blast radius. An action visible to external users, customers, or third-party systems has a wide one. Anything that creates a commitment on behalf of your organization — an external email, a legal filing, a financial transfer — combines wide blast radius with the organization's reputation.

Compliance exposure — What regulatory requirements apply? Healthcare, financial, and legal domains routinely have mandatory oversight requirements for specific categories of action. These aren't judgment calls about risk tolerance; they're legal obligations.

Using these three dimensions, you can place each agent action into one of three tiers:

| Tier | Characteristics | Example Actions | Gate Type |
| --- | --- | --- | --- |
| Safe | Reversible, internal, no compliance exposure | Read data, generate drafts, internal state updates | Autonomous — no gate |
| Sensitive | Creates visible state, external parties may see, moderate blast radius | Send email, create calendar event, post to Slack | Conditional — confidence-gated or sampled |
| Critical | Irreversible or difficult to reverse, external commitments, regulatory exposure | Delete records, financial transactions, customer communications, legal actions | Pre-execution — mandatory human approval every time |

The key insight from this classification is that pre-execution gates belong on critical actions, not on all actions. Gating everything creates approval fatigue. Gating nothing creates operational risk. The table above is a starting point — every team will have domain-specific items that shift tiers based on their specific risk tolerance and regulatory environment.
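The three-dimension classification can be sketched as a small routing function. This is a simplification of the table above, assuming each dimension collapses to a boolean; the action names are illustrative, not a real tool inventory:

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    SAFE = "safe"
    SENSITIVE = "sensitive"
    CRITICAL = "critical"

@dataclass(frozen=True)
class Action:
    name: str
    reversible: bool   # can the effect be cleanly undone?
    external: bool     # visible to users, customers, or third-party systems?
    regulated: bool    # subject to mandatory-oversight requirements?

def classify(action: Action) -> Tier:
    # Irreversibility or compliance exposure always forces the critical tier.
    if not action.reversible or action.regulated:
        return Tier.CRITICAL
    # Reversible but externally visible actions land in the sensitive tier.
    if action.external:
        return Tier.SENSITIVE
    return Tier.SAFE
```

An action inventory then becomes a list of `Action` records run through `classify`, with the resulting tier determining which gate type gets wired in for that tool call.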

Placing the Gate Before Irreversibility, Not After

Once you have the action inventory, gate placement follows a simple rule: the approval gate goes immediately before the first action in the chain that crosses into critical territory, not at the end.

Consider an agent that processes a customer refund request. The chain might look like: look up customer record → verify order history → calculate refund amount → issue refund → send confirmation email → update CRM. The critical action is "issue refund" — that's where the gate belongs. Everything before it can execute autonomously. Everything after it (confirmation email, CRM update) can also execute autonomously once the refund is approved, because the consequential decision has already been made by a human.

This is meaningfully different from gating the final output ("send confirmation email"). By that point, the refund has already been issued. Gating the email doesn't protect against a bad refund calculation; it just creates a step where the human approves confirming to the customer something that has already happened.

A well-placed gate has three properties:

  1. It appears before the point of no return. The human reviews and approves before the irreversible action executes, not after.
  2. It surfaces enough context to make the decision meaningful. The reviewer sees the agent's reasoning, confidence score, input parameters, and the specific action being proposed — not just the final output.
  3. It blocks a single decision, not the entire workflow. A narrowly placed gate that intercepts one specific tool call maintains workflow continuity; the agent resumes automatically after approval.
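The placement rule can be sketched directly against the refund chain. The step names and `critical` flags below are illustrative, not a real refund API; the point is that the gate intercepts exactly one step and everything else runs autonomously:

```python
# Hypothetical refund pipeline: (step name, crosses into critical territory?)
PIPELINE = [
    ("look_up_customer", False),
    ("verify_order_history", False),
    ("calculate_refund", False),
    ("issue_refund", True),        # first critical action: the gate goes here
    ("send_confirmation_email", False),
    ("update_crm", False),
]

def first_gate_index(pipeline) -> int:
    """Index of the first step that crosses into critical territory."""
    return next(i for i, (_, critical) in enumerate(pipeline) if critical)

def run(pipeline, approve):
    """Run autonomously up to the gate, ask the human once, then resume."""
    gate = first_gate_index(pipeline)
    executed = [name for name, _ in pipeline[:gate]]    # pre-gate: no approval needed
    if not approve(pipeline[gate][0]):
        return executed, "halted_at_gate"
    executed += [name for name, _ in pipeline[gate:]]   # post-gate: resumes automatically
    return executed, "completed"
```

Note that a rejection halts the chain before the refund is issued, with only read-only steps having executed — which is exactly the intervention window end-of-chain review lacks.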

The Interrupt-Checkpoint-Resume Pattern

The technical foundation for meaningful mid-chain oversight is the interrupt-checkpoint-resume pattern. At any point in an agent's execution, the workflow engine calls an interrupt that:

  1. Pauses execution at the current node
  2. Persists the complete execution state (all variables, accumulated context, tool call history, intermediate results) via a checkpoint store
  3. Presents a structured action payload to the reviewer — the proposed action, its inputs, and the reasoning that produced it
  4. Waits indefinitely for human input
  5. Resumes from the exact pause point with the human's decision applied

The indefinite wait is important. Most production approval workflows are asynchronous — the reviewing human isn't watching the agent in real time. Checkpoint-based persistence means the agent can be interrupted at 9am, reviewed at 2pm, and resumed at 2:05pm without losing state or re-executing prior steps.

Without the checkpointer, the alternative is restarting the workflow from the beginning when an interruption occurs. That creates duplicate side effects (if any actions executed before the interrupt), wasted API costs, and a poor experience for anyone watching the workflow status. Proper checkpointing is what makes mid-chain gates operationally viable rather than just theoretically appealing.
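A minimal in-memory sketch of the pattern, assuming a dict stands in for a durable checkpoint store and the payload fields (`tool`, `args`, `completed`, `vars`) are hypothetical names, not a specific framework's API:

```python
import json
import uuid

CHECKPOINTS: dict[str, str] = {}   # stand-in for a durable checkpoint store

def interrupt(state: dict, proposed_action: dict) -> str:
    """Pause: persist the full execution state and return a checkpoint id."""
    checkpoint_id = str(uuid.uuid4())
    CHECKPOINTS[checkpoint_id] = json.dumps(
        {"state": state, "pending": proposed_action}
    )
    # The workflow now waits indefinitely; nothing is re-executed on resume.
    return checkpoint_id

def resume(checkpoint_id: str, decision: str) -> dict:
    """Reload the exact pause point and apply the human's decision."""
    saved = json.loads(CHECKPOINTS.pop(checkpoint_id))
    state = saved["state"]
    if decision == "approve":
        state["completed"].append(saved["pending"]["tool"])
    return state

cp = interrupt(
    {"completed": ["look_up_customer", "calculate_refund"], "vars": {"amount": 42.5}},
    {"tool": "issue_refund", "args": {"amount": 42.5}},
)
# ...hours later, after human review...
state = resume(cp, "approve")
```

In production the store would be a database rather than a dict, but the shape is the same: because the checkpoint carries all variables and history, the 9am-to-2pm gap costs nothing and no prior step runs twice.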

The state surfaced to the reviewer should be designed for decision-making, not debugging. Include: the proposed action in plain language, the specific parameters being passed to the tool, the agent's confidence score if available, relevant context from earlier in the chain, and the downstream actions that will execute automatically if approved. Reviewers who see all of this can make meaningful decisions. Reviewers who see only a raw payload like "execute tool: send_email, parameters: {...}" cannot.

Confidence-Gated Escalation for Sensitive Actions

For actions in the sensitive tier — visible but not irreversible, or low-to-moderate blast radius — mandatory pre-execution gates often create too much friction. The better pattern is confidence-gated escalation: the agent proceeds autonomously when it's confident, and escalates to human review when it isn't.

Typical threshold ranges in production deployments fall between 80% and 90% confidence for autonomous execution, with escalation below that threshold. The optimal threshold depends on the specific action's risk profile and your operational capacity for review — a target escalation rate of 10-15% is a reasonable starting point, meaning you're aiming for 85-90% of sensitive actions to execute without human intervention.

Confidence thresholds aren't static configuration values — they require ongoing calibration. The calibration signal comes from human decisions: cases where humans override the agent's confident decision indicate thresholds that need to be tightened; cases where humans approve escalations without modification indicate thresholds that can be relaxed. Running this feedback loop actively is what keeps escalation rates in the target range as agent behavior and input distributions shift over time.

One practical note: confidence scores should be calculated per-action, not per-conversation. An agent that's highly confident about its overall plan may still be uncertain about a specific tool call within that plan. Aggregate confidence scores produce miscalibrated gates.
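A sketch of per-action routing plus the calibration loop described above. The threshold values, the 0.01 adjustment step, and the outcome labels are all assumptions for illustration, not a standard API:

```python
# Illustrative per-action thresholds; the values and the 0.85 default are assumptions.
THRESHOLDS = {"send_email": 0.85, "post_to_slack": 0.80}
DEFAULT_THRESHOLD = 0.85

def route(action: str, confidence: float) -> str:
    """Decide per tool call, not per conversation: each action gets its own gate."""
    if confidence >= THRESHOLDS.get(action, DEFAULT_THRESHOLD):
        return "autonomous"
    return "escalate"

def recalibrate(action: str, outcome: str, step: float = 0.01) -> None:
    """Nudge the threshold from human decisions: overrides tighten it,
    escalations approved without modification relax it."""
    t = THRESHOLDS.get(action, DEFAULT_THRESHOLD)
    if outcome == "override":                 # human reversed a confident autonomous call
        t = min(0.99, t + step)
    elif outcome == "approved_unmodified":    # escalation was approved as-is
        t = max(0.50, t - step)
    THRESHOLDS[action] = t
```

Keeping thresholds in a per-action table rather than a single global value is what makes the per-action calibration point above enforceable in code.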

The Trust Economics of Gate Placement

Poorly placed gates are worse than no gates from a user trust perspective. This is counterintuitive but well-documented.

When reviewers are asked to approve routine, low-stakes actions repeatedly — confirming that a read-only lookup should execute, approving a draft that hasn't been sent to anyone — a few things happen. First, the approve button gets clicked reflexively. Reviewers lose the habit of scrutinizing approvals because past experience tells them nothing important is at stake. Second, users lose confidence in the agent system because it appears to require hand-holding for trivial operations. Third, engineers are pressured to remove gates to reduce friction, and they comply, often removing the wrong ones.

The pattern degrades trust in both directions: reviewers don't trust the gates to surface meaningful decisions (so they stop paying attention), and product teams don't trust the gates to not create friction (so they remove them). The result is often worse oversight than if the gates had been placed correctly from the start.

Well-placed gates have the opposite effect. When every escalation represents a genuinely ambiguous or high-stakes decision, reviewers stay engaged. When autonomous execution handles 85-90% of operations reliably, users experience the agent as capable, not helpless. The gate becomes a signal of trust rather than a signal of doubt — the agent trusts its own judgment for routine decisions and surfaces the ones where it shouldn't.

Building the Placement Decision into System Design

Gate placement decisions are architectural. Making them late — after incidents, in response to user complaints, or during operational review — means making them under pressure without full visibility into the action graph.

The right time to do the action inventory and risk classification is during system design, when you have control over both the agent's action surface and the review workflow. A few practices make this tractable:

Build the action graph before building the agent. List every tool the agent can call, document its reversibility and blast radius, and assign each to a tier. This forces design decisions about what the agent should and shouldn't be able to do before those capabilities are wired in.

Treat approval workflow as a first-class interface. The UI or console that reviewers use to evaluate and approve escalations deserves as much design attention as the user-facing interface. If the approval interface doesn't surface enough context for meaningful decisions, the gate is poorly placed regardless of its position in the chain.

Make the gate position part of the operational runbook. When something goes wrong, the first question is often "why didn't we catch this earlier?" Having documented gate positions and the reasoning behind them makes this answerable — and makes post-incident improvements actionable rather than defensive.

Audit escalation data regularly. Track what percentage of escalations get approved vs. rejected vs. modified. A high approval rate suggests either well-calibrated autonomy (ideal) or rubber-stamping (investigate). A high rejection rate suggests the agent is escalating decisions it should be making autonomously, or that its proposals are routinely wrong at the escalation boundary. Both signal a need for adjustment.
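One way to sketch that audit; the 95% and 30% flag thresholds are illustrative starting points, not recommendations:

```python
from collections import Counter

def audit(decisions: list[str]) -> dict:
    """Summarize escalation outcomes and flag suspicious patterns."""
    counts = Counter(decisions)
    total = len(decisions)
    rates = {k: counts[k] / total for k in ("approved", "rejected", "modified")}
    flags = []
    if rates["approved"] > 0.95:
        flags.append("possible rubber-stamping")       # verify reviewers still engage
    if rates["rejected"] > 0.30:
        flags.append("escalation boundary misplaced")  # agent escalating or proposing badly
    return {"rates": rates, "flags": flags}
```

Running this over each week's escalation log turns the audit from an occasional judgment call into a tracked metric.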

What This Means for Production Systems

The emerging operational reality as of 2025-2026 is that human oversight capacity, not AI execution speed, is the binding constraint in agentic deployments. An agent that can process 1,000 workflows per hour generates 150 escalations per hour if your escalation rate is 15%. Most teams don't have 150 reviewers per hour. Misplaced gates that escalate unnecessarily make this worse; well-placed gates ensure that the 150 escalations represent 150 decisions that genuinely require human judgment.
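The capacity arithmetic is worth making explicit; the five-minutes-per-review figure below is an assumption, not a benchmark:

```python
def reviewer_load(workflows_per_hour: int, escalation_rate: float,
                  minutes_per_review: float) -> float:
    """Reviewer-hours consumed per wall-clock hour of agent throughput."""
    escalations = workflows_per_hour * escalation_rate
    return escalations * minutes_per_review / 60

# 1,000 workflows/hour, 15% escalation rate, 5 minutes per review
load = reviewer_load(1000, 0.15, 5)
```

At those numbers the agent generates 150 escalations per hour, consuming 12.5 reviewer-hours every wall-clock hour; halving the escalation rate through better gate placement halves the reviewer capacity needed.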

The practical implication is that placement decisions are capacity decisions. Every approval gate you add to non-critical actions consumes reviewer bandwidth that would otherwise go to genuinely critical decisions. The cost of over-gating isn't just friction — it's degraded oversight quality at the gates that actually matter, because reviewers have limited attention and approval fatigue is real.

Getting placement right means accepting a counterintuitive principle: fewer gates, better placed, with mandatory enforcement on critical actions, produce more meaningful human oversight than comprehensive gating across the entire action graph. The goal isn't to keep humans in the loop on everything. It's to keep humans in the loop on the things that actually require human judgment.

That's a narrower set than most teams initially assume, but defending it rigorously is what makes oversight sustainable rather than theatrical.
