Where to Put the Human: Placement Theory for AI Approval Gates
Most teams add human-in-the-loop review as an afterthought: the agent finishes its chain of work, the result lands in a review queue, and a human clicks approve or reject. This feels like safety. It is mostly theater.
By the time a multi-step agent reaches end-of-chain review, it has already sent the API requests, mutated the database rows, drafted the customer email, and scheduled the follow-up. The "review" is approving a done deal. Declining it means explaining to the agent — and often to the user — why nothing that happened for the past 10 minutes will stick.
The damage from misplaced approval gates isn't always dramatic. Often it's subtler: reviewers who approve everything because the real decisions have already been made, engineers who add more checkpoints after incidents and watch trust in the product crater, and organizations that oscillate between "too much friction" and "not enough oversight" without ever solving the underlying placement problem.
Placement is the core variable. Where you insert the human determines whether oversight is meaningful or performative. Getting it right requires a framework, not intuition.
Why End-of-Chain Review Fails
In a linear agent pipeline — gather context → reason → plan → execute — errors introduced early compound through every subsequent step. An incorrect assumption in the planning stage drives multiple downstream tool calls. By the time a human sees the output, the error has been amplified through n layers of agent reasoning and the effects are already in external systems.
This isn't a hypothetical. In healthcare, insurance companies deployed AI denial systems where case managers were explicitly instructed not to deviate from the model's prediction. When roughly 90% of the denials that patients appealed were overturned, the downstream damage was visible. The problem wasn't the model itself — it was that human judgment was structurally excluded from the point where it would have had leverage.
The same failure mode appears in less dramatic forms everywhere agents touch real systems. A coding agent that has already pushed a branch, opened a PR, and commented on issues doesn't benefit much from a human reviewing its "work" at the end. The side effects are already distributed.
End-of-chain review fails for three reasons:
- Irreversibility compounds upstream. Each tool execution may be individually reversible, but the combination often isn't. Sending an email draft is reversible; sending it, having the recipient read it, and generating a follow-up calendar invite is a state you can't cleanly undo.
- Review becomes rubber-stamping. When reviewers see final outputs disconnected from intermediate reasoning, they lack the context to meaningfully evaluate the decisions that produced them. Studies on approval fatigue show that humans who review routine AI outputs with low reject rates become systematically less attentive over time.
- The window for intervention has passed. Effective oversight requires the ability to interrupt before consequences propagate. A gate placed after the last consequential action offers no such window.
Classifying Actions by Risk Surface
The foundation of good gate placement is an action inventory. Before deciding where to insert approval gates, enumerate what the agent actually does: every tool call, API request, external message, and state mutation, with an honest assessment of what each one changes in the world.
Three dimensions determine where on the risk surface an action falls:
Reversibility — Can this be undone, at what cost, and by whom? Reading data is trivially reversible (nothing was changed). Creating a draft document is reversible with effort. Sending an email to a customer is technically reversible (with an apology), but the practical cost is high. Deleting records without backups or submitting a financial transaction may be irreversible in any operational sense.
Blast radius — Who is affected if this goes wrong? An action affecting only internal draft state has a narrow blast radius. An action visible to external users, customers, or third-party systems has a wide one. Anything that creates a commitment on behalf of your organization — an external email, a legal filing, a financial transfer — combines wide blast radius with the organization's reputation.
Compliance exposure — What regulatory requirements apply? Healthcare, financial, and legal domains routinely have mandatory oversight requirements for specific categories of action. These aren't judgment calls about risk tolerance; they're legal obligations.
Using these three dimensions, you can place each agent action into one of three tiers:
| Tier | Characteristics | Example Actions | Gate Type |
|---|---|---|---|
| Safe | Reversible, internal, no compliance exposure | Read data, generate drafts, internal state updates | Autonomous — no gate |
| Sensitive | Creates visible state, external parties may see, moderate blast radius | Send email, create calendar event, post to Slack | Conditional — confidence-gated or sampled |
| Critical | Irreversible or difficult to reverse, external commitments, regulatory exposure | Delete records, financial transactions, customer communications, legal actions | Pre-execution — mandatory human approval every time |
The key insight from this classification is that pre-execution gates belong on critical actions, not on all actions. Gating everything creates approval fatigue. Gating nothing creates operational risk. The table above is a starting point — every team will have domain-specific items that shift tiers based on their specific risk tolerance and regulatory environment.
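To make the inventory concrete, here is a minimal sketch of how this classification might be encoded. Everything in it is hypothetical: the names (`ActionProfile`, `classify`), the enum values, and the tier rules are illustrative, and a real inventory should be validated against your legal and compliance requirements rather than derived purely in code.

```python
from dataclasses import dataclass
from enum import Enum


class Reversibility(Enum):
    TRIVIAL = 1       # read-only: nothing was changed
    WITH_EFFORT = 2   # e.g. deleting a draft document
    COSTLY = 3        # e.g. walking back a sent email with an apology
    IRREVERSIBLE = 4  # e.g. deleted records, settled transactions


class BlastRadius(Enum):
    INTERNAL = 1      # internal draft state only
    EXTERNAL = 2      # visible to customers or third-party systems
    COMMITMENT = 3    # binds the organization (legal, financial)


class Tier(Enum):
    SAFE = "safe"
    SENSITIVE = "sensitive"
    CRITICAL = "critical"


@dataclass(frozen=True)
class ActionProfile:
    name: str
    reversibility: Reversibility
    blast_radius: BlastRadius
    compliance_exposure: bool  # mandated human oversight applies


def classify(action: ActionProfile) -> Tier:
    # Compliance obligations and irreversibility are non-negotiable:
    # either one forces the critical tier regardless of the other dimensions.
    if action.compliance_exposure or action.reversibility == Reversibility.IRREVERSIBLE:
        return Tier.CRITICAL
    if action.blast_radius == BlastRadius.COMMITMENT:
        return Tier.CRITICAL
    # Anything externally visible, or costly to walk back, is sensitive.
    if action.blast_radius == BlastRadius.EXTERNAL or action.reversibility == Reversibility.COSTLY:
        return Tier.SENSITIVE
    return Tier.SAFE


assert classify(ActionProfile("read_customer_record",
                              Reversibility.TRIVIAL, BlastRadius.INTERNAL, False)) == Tier.SAFE
assert classify(ActionProfile("send_email",
                              Reversibility.COSTLY, BlastRadius.EXTERNAL, False)) == Tier.SENSITIVE
assert classify(ActionProfile("issue_refund",
                              Reversibility.IRREVERSIBLE, BlastRadius.COMMITMENT, False)) == Tier.CRITICAL
```

Encoding the rules this way also makes tier assignments reviewable artifacts: when an action's tier shifts for domain-specific reasons, the change shows up in a diff rather than in someone's head.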
Placing the Gate Before Irreversibility, Not After
Once you have the action inventory, gate placement follows a simple rule: the approval gate goes immediately before the first action in the chain that crosses into critical territory, not at the end.
Consider an agent that processes a customer refund request. The chain might look like: look up customer record → verify order history → calculate refund amount → issue refund → send confirmation email → update CRM. The critical action is "issue refund" — that's where the gate belongs. Everything before it can execute autonomously. Everything after it (confirmation email, CRM update) can also execute autonomously once the refund is approved, because the consequential decision has already been made by a human.
This is meaningfully different from gating the final output ("send confirmation email"). By that point, the refund has already been issued. Gating the email doesn't protect against a bad refund calculation; it just adds a step where the human approves telling the customer about something that has already happened.
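Here is a minimal sketch of that placement, reusing hypothetical names: `tools` stands for the agent's tool layer and `request_approval` for whatever review channel you use (a queue, a Slack message, a dashboard). The only load-bearing detail is where the approval call sits in the chain.

```python
def process_refund_request(request, tools, request_approval):
    # Safe tier: read-only lookups and internal computation run autonomously.
    customer = tools.lookup_customer(request.customer_id)
    orders = tools.get_order_history(customer.id)
    amount = tools.calculate_refund(request, orders)

    # The gate sits here: immediately before the first critical action,
    # not at the end of the chain.
    approved = request_approval(
        action="issue_refund",
        context={
            "customer_id": customer.id,
            "order_id": request.order_id,
            "amount": amount,
            # Surface the agent's reasoning, not just the proposed action.
            "reasoning": request.agent_reasoning,
        },
    )
    if not approved:
        return "declined"  # nothing irreversible has happened yet

    # Critical action executes only after explicit human approval.
    tools.issue_refund(request.order_id, amount)

    # Downstream steps resume autonomously: the consequential decision
    # has already been made by a human.
    tools.send_confirmation_email(customer.email, amount)
    tools.update_crm(customer.id, status="refunded", amount=amount)
    return "completed"
```

In production the approval call is usually asynchronous and durable, with the workflow suspending until a decision arrives, but the placement rule is identical.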
A well-placed gate has three properties (the sketch after this list shows one way to enforce them per tool call):
- It appears before the point of no return. The human reviews and approves before the irreversible action executes, not after.
- It surfaces enough context to make the decision meaningful. The reviewer sees the agent's reasoning, confidence score, input parameters, and the specific action being proposed — not just the final output.
- It blocks a single decision, not the entire workflow. A narrowly placed gate that intercepts one specific tool call maintains workflow continuity; the agent resumes automatically after approval.
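Tying the tiers back to the table's gate types, a per-call interceptor can enforce all three properties: it runs before execution, carries the agent's reasoning and confidence to the reviewer, and blocks exactly one decision. This continues the hypothetical `Tier` enum from the classification sketch; the threshold and sampling rate are illustrative knobs, not recommendations.

```python
import random

CONFIDENCE_THRESHOLD = 0.85  # illustrative: tune per action and per domain
SAMPLE_RATE = 0.05           # fraction of high-confidence sensitive calls still reviewed


def execute_tool_call(call, inventory, request_approval):
    """Intercept a single proposed tool call and apply its tier's gate type.

    `call` is assumed to carry the proposed action name, its parameters,
    the agent's reasoning, and a confidence score; `inventory` maps
    action names to tiers (e.g. the output of `classify` above).
    """
    tier = inventory[call.action]

    if tier == Tier.SAFE:
        return call.execute()  # autonomous: no gate

    if tier == Tier.SENSITIVE:
        # Conditional gate: escalate low-confidence calls, and sample a small
        # fraction of the rest so reviewers stay calibrated on normal traffic.
        needs_review = (call.confidence < CONFIDENCE_THRESHOLD
                        or random.random() < SAMPLE_RATE)
        if not needs_review or request_approval(call):
            return call.execute()
        return None  # declined: nothing has executed yet

    # Critical tier: mandatory pre-execution approval, every time. The gate
    # blocks this one call; on approval, the agent loop resumes automatically.
    if request_approval(call):
        return call.execute()
    return None
```

Because the interceptor wraps individual tool calls rather than the workflow's final output, a rejection stops one proposed action before it executes instead of unwinding a finished chain.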
