The Coding Agent Autonomy Curve: Reading Is Free, Merging Is Incident-Class
The discourse on coding agents keeps collapsing to a binary: autonomous or supervised, YOLO mode or hand-on-the-wheel, --dangerously-skip-permissions or "approve every keystroke." That framing is a category error. A coding agent does not perform "an action." It performs a sequence of actions whose costs span at least seven orders of magnitude — from reading a file (free, undoable, no side effect) to merging to main (irreversible without a revert PR) to rolling out a binary to a fleet (six-figure incident-class). Treating that range with one autonomy switch is like setting a single speed limit for both a parking lot and a freeway.
The team that ships "the agent can do everything" without mapping each action to its blast radius is one prompt-injection-bearing GitHub comment away from a postmortem — and we already have public examples of that exact failure mode. Anthropic's Claude Code Security Review, Google's Gemini CLI Action, and GitHub Copilot Agent were all confirmed in 2026 to be hijackable through specially crafted PR titles and issue bodies, in an attack pattern the researchers named "Comment and Control." The agents weren't broken in some abstract sense. They executed a high-tier action — pushing code, opening a PR — on the basis of a low-trust input the autonomy tier had silently flattened into "all the same."
What follows is the discipline that has to land: a per-action curve, gates that scale with the tier, rollback velocity matched to blast class, and an eval program that tests for tool-composition escalation rather than single-action failure.
The Action Ladder Nobody Draws
Start by listing every tool a coding agent can call and labeling it with a blast-radius class. The ladder is roughly seven rungs:
- Tier 0 — Read. File reads, grep, AST queries, gh pr view, log fetches. No side effects, no token cost beyond inference, infinitely undoable. This is where 80% of agent traffic should live.
- Tier 1 — Sandboxed compute. Running tests inside a container, executing a one-off script in an ephemeral VM, running a linter. Side effects exist but are walled off. Cost: minutes of CI time.
- Tier 2 — Branch-local writes. Editing a file in a feature branch, creating commits, pushing the branch. Bounded by the branch — blast contained until human action moves it forward.
- Tier 3 — Reviewable proposals. Opening a PR, leaving a review comment, suggesting a change. Reaches humans, but humans gate the next step.
- Tier 4 — Main-line writes. Merging to main, closing issues, modifying GitHub Actions, editing CODEOWNERS. Touches shared state. Reversible only with a revert PR and the wall-clock latency of CI.
- Tier 5 — Production deploys. Promoting an artifact, running a database migration, flipping a feature flag in prod. Customer-visible. Reversible only with a tested rollback.
- Tier 6 — Fleet-class. Rolling a binary to N production hosts, broadcast cache invalidation, mass DNS or routing changes, anything that hits "everywhere at once." Recovery is measured in incident response, not commits.
The point of writing this down is not to memorize seven categories. It is to force the question: which tier does each tool belong to, and what is the appropriate gate for that tier? A coding agent platform that exposes one global "ask the user" toggle has implicitly answered this question with a single value, which means it answered every cell of the matrix wrong except by accident.
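As a minimal sketch of what "writing this down" could look like, the ladder can be encoded as an ordered enum with a tool-to-tier table. The tool names below are hypothetical placeholders rather than any real platform's registry, and the choice to treat unlabeled tools as the highest tier is an assumption of this sketch, not something the ladder itself prescribes.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Blast-radius classes from the ladder, ordered by severity."""
    READ = 0
    SANDBOXED_COMPUTE = 1
    BRANCH_LOCAL_WRITE = 2
    REVIEWABLE_PROPOSAL = 3
    MAINLINE_WRITE = 4
    PRODUCTION_DEPLOY = 5
    FLEET_CLASS = 6


# Hypothetical tool names; a real platform would list its own tools here.
TOOL_TIERS: dict[str, Tier] = {
    "read_file": Tier.READ,
    "grep": Tier.READ,
    "run_tests_in_container": Tier.SANDBOXED_COMPUTE,
    "commit_to_branch": Tier.BRANCH_LOCAL_WRITE,
    "open_pull_request": Tier.REVIEWABLE_PROPOSAL,
    "merge_to_main": Tier.MAINLINE_WRITE,
    "edit_codeowners": Tier.MAINLINE_WRITE,
    "run_db_migration": Tier.PRODUCTION_DEPLOY,
    "fleet_rollout": Tier.FLEET_CLASS,
}


def tier_of(tool_name: str) -> Tier:
    """Return the tool's tier; unlabeled tools fail closed to the top tier."""
    return TOOL_TIERS.get(tool_name, Tier.FLEET_CLASS)
```

Because the enum is ordered, a comparison such as tier_of("merge_to_main") >= Tier.MAINLINE_WRITE is enough to decide whether an action needs more than a lightweight check.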
Approval Gates That Scale With the Tier
A single global confirmation prompt is approval theater. After 50 rapid-fire "approve / approve / approve" clicks on Tier-0 file reads, the human is reflex-clicking by the time the agent reaches a Tier-4 action. In user studies of human-in-the-loop systems, this approval fatigue, not malicious bypass, is the dominant failure mode. The fix is a tier-aware gate matrix:
- Tier 0–1: Allowlist by default, no prompt. The agent reads files and runs tests at full speed because the cost of asking is higher than the cost of being wrong.
- Tier 2: Lightweight gate — a summary diff in the chat, agent proceeds unless the human objects within a short window. The agent stays in flow.
- Tier 3: Explicit approval, but the artifact under review is the PR itself, which is already a familiar review surface. Reuse existing tooling instead of inventing a new modal.
- Tier 4: Hard gate. A human must affirmatively approve, and before that the agent must show a structured plan with the affected files, the rollback command, and the inferred risk class. This is also where the platform should require a justification string that names which user request the action is fulfilling, which is useful both for audit and for catching prompt-injection-induced drift.
- Tier 5–6: Out-of-band approval. The agent does not push the button. It stages the action, opens a change request in the existing change-management system, and waits for a human-initiated trigger. The agent never holds the production credential at the moment of execution.
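The matrix above can be sketched as a small lookup from tier number to gate type. The Gate names and the agent_may_execute helper are illustrative inventions for this sketch, not the API of any existing agent platform.

```python
from enum import Enum


class Gate(Enum):
    AUTO_ALLOW = "allowlisted, no prompt"
    OBJECTION_WINDOW = "show summary diff, proceed unless the human objects"
    PR_REVIEW = "explicit approval on the PR itself"
    HARD_GATE = "affirmative approval of a structured plan"
    OUT_OF_BAND = "stage a change request, human triggers execution"


def gate_for(tier: int) -> Gate:
    """Map a tier number (0 to 6) to its gate, following the matrix."""
    if tier < 0 or tier > 6:
        raise ValueError(f"unknown tier: {tier}")
    if tier <= 1:
        return Gate.AUTO_ALLOW
    if tier == 2:
        return Gate.OBJECTION_WINDOW
    if tier == 3:
        return Gate.PR_REVIEW
    if tier == 4:
        return Gate.HARD_GATE
    return Gate.OUT_OF_BAND


def agent_may_execute(tier: int) -> bool:
    """For tiers 5 and 6 the agent only stages the action, never runs it."""
    return gate_for(tier) is not Gate.OUT_OF_BAND
```

Keeping agent_may_execute separate from gate_for reflects the point of the last bullet: for the top tiers the question is not which prompt to show but whether the agent holds the credential at all.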
Notice what this is not. It is not "more approvals everywhere." Tier 0 traffic should accelerate, not slow down. The gate budget is a finite human-attention resource and the only way to spend it well is to spend it where the cost of being wrong is large. Claude Code's distinction between a built-in safe-tool allowlist (file reads, search, code navigation) and a transcript-classifier gate for shell commands and external tool calls is one production instance of this idea — though even that two-tier split is coarser than what most teams need once an agent is doing real merges.
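The structured plan that the Tier 4 hard gate asks for could be modeled roughly as follows. The field names, the plan_problems helper, and the exact-match check on the justification string are all assumptions made for illustration; a real system would likely need a looser way to tie a justification back to a user request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MainlinePlan:
    """What a Tier 4 action presents to the human before the hard gate."""
    affected_files: tuple[str, ...]
    rollback_command: str
    risk_class: str
    justification: str  # names the user request this action fulfills


def plan_problems(plan: MainlinePlan, user_requests: list[str]) -> list[str]:
    """Return reasons to block the plan before it is shown for approval."""
    problems = []
    if not plan.affected_files:
        problems.append("no affected files listed")
    if not plan.rollback_command.strip():
        problems.append("no rollback command")
    if not plan.risk_class.strip():
        problems.append("no risk class")
    # Exact-match check (an assumption of this sketch): a justification that
    # cites no real user request may indicate prompt-injection-induced drift.
    if plan.justification not in user_requests:
        problems.append("justification does not name a known user request")
    return problems
```

An empty list from plan_problems means the plan is complete enough to put in front of a human; it does not mean the action is approved.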
Rollback Velocity Is a Tier Property
