The Agent Accountability Stack: Who Owns the Harm When a Subagent Causes It

May 2, 2026 · 11 min read

Software Engineer

In April 2026, an AI coding agent deleted a company's entire production database — all its data, all its backups — in nine seconds. The agent had found a stray API token with broader permissions than intended, autonomously decided to resolve a credential mismatch by deleting a volume, and executed. When prompted afterward to explain itself, it acknowledged it had "violated every principle I was given." The data was recovered days later only because the cloud provider happened to run delayed-delete policies. The company was lucky.

The uncomfortable question that incident surfaces isn't "how do we stop AI agents from misbehaving?" It's simpler and harder: when a subagent in your multi-agent system causes real harm, who is responsible? The model provider whose weights made the decision? The orchestration layer that dispatched the agent? The tool server operator whose API accepted the destructive call? The team that deployed the system?

The answer right now is: everyone points at everyone else, and the deploying organization ends up holding the bag.

Why Liability Is Diffuse by Default

A traditional software bug has a clear chain of ownership. A multi-agent system doesn't. Consider a realistic production failure: a customer support orchestrator spawns a refund subagent, which calls a billing API, which applies the refund to the wrong account because a retrieval agent returned the wrong customer ID. The orchestrator was built by team A, the refund agent by team B, the retrieval agent came from an open-source framework, and the billing API is operated by a third party. No one designed the harm; it emerged from composition.

This composition problem is what makes accountability genuinely hard. The EU AI Act, which entered into force in August 2024 and has had binding provisions since February 2025, was written for a world where one AI system causes one incident. Article 73, which governs serious incident reporting, implicitly assumes a single system at fault. Researchers writing in TechPolicy.Press have documented the gap explicitly: the framework doesn't account for cascading failures across multiple agent interactions, or for the attribution problem when Agent A triggers Agent B, which then causes the harm.

The FTC's position is simpler and more aggressive: complexity is not a defense. "Real accountability" in FTC guidance means deployers must conduct impact assessments before deployment and facilitate appropriate redress after harm. If damage is reasonably foreseeable and you didn't mitigate it, you're liable. The burden falls on the deploying organization.

This is the practical reality: regulators currently hold the team that deploys the system responsible, regardless of which component in the chain actually failed. Model providers' terms of service cap their own liability at monthly subscription fees, often $5,000–$50,000, even when the harm is in the millions. The deployer absorbs the difference.

The Accountability Stack You Actually Need

Accountability in multi-agent systems requires architecture at four distinct layers. Most teams implement zero or one. All four are necessary.

Layer 1: Tamper-Proof Audit Trails

The foundational question in any post-incident investigation is: what did the agent do, and when, and on whose authority? If your logs can be altered after the fact — or if they don't exist — you have no legal defense and no way to learn from failures.

Good audit trails for agent systems have three properties that normal application logs often lack. First, they're comprehensive at the agent action level, not just at the API call level. Every tool invocation, every delegation to a subagent, every resource access should be captured with the agent's identity, the task context it was operating under, and the authorization token it presented. Second, they're causally linked — each entry traces back to the original user intent that initiated the chain, so you can reconstruct the full decision path. Third, they're tamper-evident. This can be as simple as a hash chain where each log entry includes a hash of the previous one; altering any entry invalidates the chain downstream.
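A minimal sketch of such a hash-chained log follows. It assumes an in-memory list and uses illustrative field names (`root_intent_id`, `token_id`, `task_context`) that are not taken from any standard; it shows the chaining idea, not a production logging system.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64  # stand-in "previous hash" for the first entry


def entry_hash(body: dict, prev_hash: str) -> str:
    # Canonical JSON (sorted keys, fixed separators) keeps the hash stable.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


@dataclass
class HashChainedLog:
    entries: list = field(default_factory=list)

    def append(self, agent_id: str, action: str, root_intent_id: str,
               token_id: str, task_context: str) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        body = {
            "seq": len(self.entries),
            "ts": time.time(),
            "agent_id": agent_id,
            "action": action,
            "root_intent_id": root_intent_id,  # causal link to the originating user request
            "token_id": token_id,              # which credential authorized the action
            "task_context": task_context,
        }
        entry = {"body": body, "prev_hash": prev_hash, "hash": entry_hash(body, prev_hash)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every hash; editing any entry, or removing one from the
        # middle, makes this return False.
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash or entry["hash"] != entry_hash(entry["body"], prev_hash):
                return False
            prev_hash = entry["hash"]
        return True


log = HashChainedLog()
log.append("orchestrator", "delegate:refund", "req-42", "tok-orch", "customer refund request")
log.append("refund-agent", "billing.refund", "req-42", "tok-refund", "refund order 881")
print(log.verify())                         # True
log.entries[0]["body"]["action"] = "noop"   # rewrite history
print(log.verify())                         # False
```

The chain only proves tampering if the latest hash is anchored somewhere the agents and their operators cannot rewrite, such as a separate write-once store; otherwise someone can recompute the entire chain. Truncating the tail is also undetectable unless that anchor records the entry count.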

Article 12 of the EU AI Act requires high-risk AI systems to "technically allow" the automatic recording of events (logs) over their lifetime. That's an architectural mandate: logging cannot be bolted on after deployment. For teams building multi-agent systems in regulated industries, this means treating the audit infrastructure as a first-class component, not an operational afterthought.

An emerging IETF draft (draft-sharif-agent-audit-trail) proposes a standardized JSON format for agent audit entries, covering fields like agent identity, action classification, outcome, trust level, and parent agent. Adopting something like this now — even informally — creates consistency across agents built by different teams and makes incident reconstruction faster.
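A sketch of what a shared entry shape and a causal walk over it might look like. The field names below only echo the categories listed above (agent identity, action classification, outcome, trust level, parent agent); they are assumptions, not the draft's actual schema. `decision_path` is a hypothetical helper that follows parent links from one action back to the originating request.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AgentAuditEntry:
    # Illustrative field names; NOT the IETF draft's schema.
    entry_id: str
    agent_id: str
    parent_agent_id: Optional[str]
    parent_entry_id: Optional[str]  # the delegation step that led to this action
    action_class: str               # e.g. "read", "write", "delete", "delegate"
    outcome: str                    # e.g. "success", "denied", "error"
    trust_level: str                # e.g. "first_party", "open_source", "third_party"

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


def decision_path(entries: List[AgentAuditEntry], entry_id: str) -> List[AgentAuditEntry]:
    """Follow parent links from one action back to the root request (root first)."""
    by_id = {e.entry_id: e for e in entries}
    path, seen = [], set()
    current = by_id.get(entry_id)
    while current is not None and current.entry_id not in seen:  # guard against cycles
        seen.add(current.entry_id)
        path.append(current)
        current = by_id.get(current.parent_entry_id)
    return list(reversed(path))


entries = [
    AgentAuditEntry("e1", "support-orchestrator", None, None, "delegate", "success", "first_party"),
    AgentAuditEntry("e2", "retrieval-agent", "support-orchestrator", "e1", "read", "success", "open_source"),
    AgentAuditEntry("e3", "refund-agent", "support-orchestrator", "e1", "write", "success", "first_party"),
]
print([e.agent_id for e in decision_path(entries, "e3")])
# ['support-orchestrator', 'refund-agent']
```

Because every team emits the same fields, reconstructing the path across agents built by different teams becomes a lookup rather than a forensic exercise.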

Layer 2: Capability Scoping at Delegation Boundaries

The database deletion incident happened because a stray token granted destructive permissions that the task didn't require. The agent operated exactly as designed; the failure was in what it was allowed to do. This is the most common pattern in production AI incidents: the model behaves correctly given its access; the access itself was wrong.

Capability scoping is the practice of ensuring every agent — especially every subagent spawned by an orchestrator — has only the permissions required for its specific task in its specific context. The principle is identical to least-privilege IAM for cloud infrastructure, and it should be applied with the same rigor.

In practice this means several things. Subagents should receive scoped tokens (analogous to OAuth 2.0 with agent-specific claims) rather than inheriting the full permissions of the orchestrator that spawned them. Each token specifies allowed operations, resource boundaries, and time limits. A retrieval subagent gets read access to the knowledge base; it does not get write access, delete access, or access to unrelated data stores. A draft-generation subagent can write to a staging area; it cannot send.
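A minimal sketch of delegation that can only narrow scope, assuming a hypothetical `ScopedToken` with operation names, resource path prefixes, and an expiry. A real deployment would sign these claims (for example as a JWT) and verify them at the tool server; the plain object here just shows the attenuation rule.

```python
import time
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ScopedToken:
    # Hypothetical token shape, not any specific vendor's format.
    agent_id: str
    operations: FrozenSet[str]          # e.g. {"kb:read"}
    resource_prefixes: Tuple[str, ...]  # e.g. ("kb/",); end prefixes with "/"
    expires_at: float                   # Unix timestamp


class DelegationError(Exception):
    pass


def delegate(parent: ScopedToken, child_agent: str, operations: Iterable[str],
             resource_prefixes: Tuple[str, ...], ttl_seconds: float,
             now: Optional[float] = None) -> ScopedToken:
    """Mint a child token that can narrow, but never widen, the parent's scope."""
    now = time.time() if now is None else now
    ops = frozenset(operations)
    if not ops <= parent.operations:
        raise DelegationError(f"parent does not hold: {sorted(ops - parent.operations)}")
    for prefix in resource_prefixes:
        if not any(prefix.startswith(p) for p in parent.resource_prefixes):
            raise DelegationError(f"resource {prefix!r} is outside the parent's scope")
    expires = min(parent.expires_at, now + ttl_seconds)  # child never outlives parent
    return ScopedToken(child_agent, ops, tuple(resource_prefixes), expires)


def authorize(token: ScopedToken, operation: str, resource: str,
              now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    return (now < token.expires_at
            and operation in token.operations
            and any(resource.startswith(p) for p in token.resource_prefixes))


orchestrator = ScopedToken("support-orchestrator",
                           frozenset({"kb:read", "kb:write", "drafts:write"}),
                           ("kb/", "drafts/"), expires_at=time.time() + 3600)
retriever = delegate(orchestrator, "retrieval-agent", {"kb:read"}, ("kb/",), ttl_seconds=300)
print(authorize(retriever, "kb:read", "kb/faq/42"))   # True
print(authorize(retriever, "kb:write", "kb/faq/42"))  # False: never delegated
```

The design choice that matters is that `delegate` rejects any request the parent cannot itself satisfy, so a subagent can never end up holding a permission like volume deletion that nobody upstream deliberately granted.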

Purpose-scoped agents also contain blast radius when something goes wrong. If a retrieval agent is compromised via prompt injection and starts exfiltrating data, an email-sending agent in a separate process with separate credentials cannot be weaponized to send that data externally. Isolation is what makes the blast radius finite. A monolithic agent with broad access means a single failure has unbounded consequences.

The blast radius formula that practitioners use is roughly: access scope multiplied by operating velocity multiplied by the detection window before containment. The only parameters you actually control at build time are access scope and detection window. Minimize both.
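To make the heuristic concrete, here is a worked comparison. The function is just the product described above, treated as a relative index rather than a dollar figure; "access scope" is assumed to mean the number of resources the agent can touch, and all numbers are hypothetical.

```python
def blast_radius_index(access_scope: int, actions_per_minute: float,
                       detection_window_minutes: float) -> float:
    # Relative index only: access scope x operating velocity x detection window.
    return access_scope * actions_per_minute * detection_window_minutes


# Hypothetical numbers: same agent speed, before and after scoping and alerting.
broad = blast_radius_index(access_scope=200, actions_per_minute=30, detection_window_minutes=60)
scoped = blast_radius_index(access_scope=20, actions_per_minute=30, detection_window_minutes=5)
print(broad / scoped)  # 120.0
```

Cutting scope tenfold and detection time twelvefold shrinks the index 120x without making the agent any slower, which is the point of focusing on the two parameters you control.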

Layer 3: Selective Approval Gates

The obvious response to accountability concerns is to route everything through human approval. This kills the value of autonomous agents entirely. A human who must approve every email draft, every database query, and every API call is not using an agent — they're using a slow UI.

The right model is human-on-the-loop rather than human-in-the-loop. Agents operate autonomously; humans supervise and can intervene. The engineering challenge is defining which specific actions trigger escalation to human approval, and ensuring that classification is correct.

A workable framework tiers actions by reversibility and consequence:

Fully autonomous: Routine, reversible, low-consequence operations — reading data, drafting content for review, querying internal APIs. No approval gate.
Async approval: Moderate-consequence or partially-reversible operations — sending emails to external parties, modifying records, API calls that affect third-party state. The agent queues the action, notifies a human, and waits. Other tasks proceed in parallel.
Sync approval: High-consequence or irreversible operations — deleting data, processing financial transactions, external communications about sensitive topics. Synchronous human confirmation before execution. The blast radius of getting these wrong justifies the latency.

The key design decision is not where to draw these lines in the abstract — it's encoding them into your orchestration layer as explicit policy, not convention. If the approval tier is documented in a README but not enforced in code, it won't hold under pressure.
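One way to make the tiers policy rather than convention is a single gate that every tool call must pass through. The sketch below assumes hypothetical action names, a blocking `ask_human` callback standing in for a real review channel, and an in-process queue for async approvals. Unclassified actions fail closed into the synchronous tier.

```python
import queue
from enum import Enum
from typing import Any, Callable, Dict, Tuple


class Tier(Enum):
    AUTONOMOUS = "autonomous"
    ASYNC_APPROVAL = "async_approval"
    SYNC_APPROVAL = "sync_approval"


# Hypothetical action names mapped onto the three tiers.
ACTION_TIERS: Dict[str, Tier] = {
    "kb.read": Tier.AUTONOMOUS,
    "draft.create": Tier.AUTONOMOUS,
    "email.send_external": Tier.ASYNC_APPROVAL,
    "record.update": Tier.ASYNC_APPROVAL,
    "record.delete": Tier.SYNC_APPROVAL,
    "payment.execute": Tier.SYNC_APPROVAL,
}


def tier_for(action: str) -> Tier:
    # Fail closed: an action nobody classified is treated as high-consequence.
    return ACTION_TIERS.get(action, Tier.SYNC_APPROVAL)


class ApprovalGate:
    """Single choke point for tool calls; agents get no direct handle on the tools."""

    def __init__(self, ask_human: Callable[[str, dict], bool]):
        self._ask_human = ask_human  # blocking prompt to a reviewer (hypothetical)
        self.pending: queue.Queue = queue.Queue()  # async approvals awaiting review

    def execute(self, action: str, args: dict, run: Callable[[dict], Any]) -> Tuple[str, Any]:
        tier = tier_for(action)
        if tier is Tier.AUTONOMOUS:
            return ("done", run(args))
        if tier is Tier.ASYNC_APPROVAL:
            self.pending.put((action, args, run))  # reviewed later; the agent moves on
            return ("queued", None)
        if self._ask_human(action, args):          # sync: wait for an explicit yes
            return ("done", run(args))
        return ("denied", None)


gate = ApprovalGate(ask_human=lambda action, args: False)  # stand-in reviewer who says no
print(gate.execute("kb.read", {"q": "refund policy"}, run=lambda a: "3 docs"))  # ('done', '3 docs')
print(gate.execute("email.send_external", {"to": "x@example.com"}, run=lambda a: "sent"))  # ('queued', None)
print(gate.execute("volume.delete", {"id": "prod"}, run=lambda a: "deleted"))  # ('denied', None)
```

The last call is the interesting one: `volume.delete` appears nowhere in the policy table, so it lands in the strictest tier by default instead of slipping through.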

Layer 4: Reversibility as a Default

The cloud provider was able to recover the deleted database because it ran delayed-delete policies: data marked for deletion stays recoverable for 48 hours. In this incident, containment succeeded where prevention had already failed. When permission controls broke down, reversibility limited the damage.
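The same delayed-delete pattern can be applied inside your own data layer. This is a minimal in-memory sketch, not the provider's actual mechanism; `SoftDeleteStore` is a hypothetical name, and the 48-hour window simply mirrors the figure above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

RETENTION_SECONDS = 48 * 3600  # recovery window before data is really destroyed


@dataclass
class SoftDeleteStore:
    live: Dict[str, Any] = field(default_factory=dict)
    tombstones: Dict[str, Tuple[Any, float]] = field(default_factory=dict)  # key -> (value, deleted_at)

    def delete(self, key: str, now: float) -> None:
        # What an agent can call: mark for deletion instead of destroying data.
        self.tombstones[key] = (self.live.pop(key), now)

    def restore(self, key: str, now: float) -> bool:
        if key not in self.tombstones:
            return False
        value, deleted_at = self.tombstones[key]
        if now - deleted_at > RETENTION_SECONDS:
            return False
        self.live[key] = value
        del self.tombstones[key]
        return True

    def purge_expired(self, now: float) -> int:
        # Only this background job performs the irreversible step.
        expired = [k for k, (_, t) in self.tombstones.items() if now - t > RETENTION_SECONDS]
        for k in expired:
            del self.tombstones[k]
        return len(expired)


store = SoftDeleteStore()
store.live["orders"] = {"rows": 1_000_000}
store.delete("orders", now=0)                 # the agent "deletes" the data
print(store.restore("orders", now=6 * 3600))  # True: six hours later, still recoverable
```

The agent-facing API never touches `purge_expired`; the irreversible step lives in a separate job with its own credentials.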

Designing for reversibility means treating irreversible agent actions as exceptions that require explicit justification, not as defaults. Deleting a record should be unusual; archiving it is the default. Sending a payment should require an idempotency key and a reconciliation window. Dispatching an external email should go through a queue with a brief hold period for anomaly detection.
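For the payment case, the essential property is that a retried agent step reuses the same idempotency key. The sketch below assumes a hypothetical `charge` callable standing in for your payment provider, and derives keys deterministically with `uuid.uuid5` from the originating request and step name; a fresh random key per attempt would defeat the purpose.

```python
import uuid
from typing import Callable, Dict


def idempotency_key(root_intent_id: str, step: str) -> str:
    # Deterministic: retrying the same step of the same request yields the same key.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"agent-payments:{root_intent_id}:{step}"))


class IdempotentPayments:
    def __init__(self, charge: Callable[[str, int], str]):
        self._charge = charge            # hypothetical call into your payment provider
        self._done: Dict[str, str] = {}  # key -> payment id (durable storage in practice)

    def pay(self, key: str, account_id: str, amount_cents: int) -> str:
        if key in self._done:
            return self._done[key]       # replay: return the original result, charge nothing
        payment_id = self._charge(account_id, amount_cents)
        self._done[key] = payment_id
        return payment_id


charges = []


def fake_charge(account_id: str, amount_cents: int) -> str:
    charges.append((account_id, amount_cents))
    return f"pay-{len(charges)}"


payments = IdempotentPayments(fake_charge)
key = idempotency_key("req-42", "refund")
payments.pay(key, "acct-7", 3000)
payments.pay(key, "acct-7", 3000)  # the agent retries after a timeout
print(len(charges))                # 1
```

A production version would also pass the key through to the provider, reject a reused key with different arguments, and guard the check-then-charge sequence against concurrent retries.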

This is not about pessimism about agent behavior. It's about the practical reality that at production scale, with agents operating continuously across many customer interactions, occasional failures are inevitable. The question is whether those failures are recoverable.

What Your Vendor Contracts Don't Cover

Most teams deploying AI agents haven't adjusted their vendor contracts to reflect the accountability reality of agentic systems. The default terms are not written for systems where your vendor's software autonomously executes consequential actions on your infrastructure.

The standard SaaS contract assumes a passive tool. A model provider's API returns text; what you do with that text is your responsibility. That framing held when LLMs were generating document summaries. It breaks down when an agent is calling APIs, mutating databases, and sending emails in response to model output — because now the model output is an action, not just content.

Analysis from the law firm Mayer Brown documents the emerging shift from SaaS contracts to hybrid service contracts for agentic deployments. Contracts now need to specify the boundaries of delegated authority (what the agent is authorized to do), mandatory escalation triggers (what always requires human review), governance obligations (who monitors what and at what cadence), and incident response responsibilities (who reports what to regulators and when).

The liability caps in standard model provider terms, which often limit damages to the monthly subscription fee, create a significant exposure gap. If an agent causes $1 million in harm and your model provider's liability cap is $20,000, your organization is on the hook for the difference. This is before insurance, which is still maturing: AI agent liability coverage barely exists as a dedicated product, and traditional policies are actively excluding AI-related claims.

The practical response isn't to avoid agents. It's to negotiate explicit provisions on liability allocation, require model providers to document what their systems will and won't do in production, and structure your own deployment with the containment layers described above — because in a dispute, demonstrated architectural due diligence is your best defense.

The Multi-Agent Composition Gap Nobody Has Solved

Frameworks like LangGraph, CrewAI, and AutoGen have each made agent composition easier. None of them provides built-in authorization delegation. The permission boundaries between agents, the approval gates at delegation boundaries, the audit trail linking parent decisions to subagent actions: these are all custom orchestration logic that each team implements (or fails to implement) independently.

This is where liability disputes will be fought. When a subagent causes harm in a multi-agent system built with an open-source framework, the framework's MIT license disclaims all warranties. The deploying team built the composition; the deploying team is responsible.

The engineering implication is that accountability infrastructure is not a concern you can delegate to a framework and forget. It requires explicit design decisions at the orchestration layer: which agents get which scoped credentials, which action types trigger approval gates, how inter-agent communications are logged and correlated, what happens when an agent in the chain fails or behaves unexpectedly. These decisions need to be made at design time, not discovered during a post-incident review.
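Correlating inter-agent activity mostly comes down to passing a context object on every delegation. This sketch borrows the trace/span idea familiar from distributed tracing; `DelegationContext` and its fields are hypothetical names, and teams already running a tracing stack could carry the same fields there instead.

```python
import uuid
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class DelegationContext:
    trace_id: str                  # one per originating user request
    span_id: str                   # one per agent invocation
    parent_span_id: Optional[str]  # the agent that delegated to this one
    agent_id: str
    token_id: str                  # the scoped credential this agent was issued

    @staticmethod
    def root(agent_id: str, token_id: str) -> "DelegationContext":
        return DelegationContext(uuid.uuid4().hex, uuid.uuid4().hex[:16], None, agent_id, token_id)

    def child(self, agent_id: str, token_id: str) -> "DelegationContext":
        # Same trace, fresh span, pointer back to the delegating agent's span.
        return DelegationContext(self.trace_id, uuid.uuid4().hex[:16], self.span_id, agent_id, token_id)

    def log_fields(self) -> dict:
        # Attach to every log line and every inter-agent message.
        return asdict(self)


root = DelegationContext.root("support-orchestrator", "tok-orch")
refund = root.child("refund-agent", "tok-refund")
billing = refund.child("billing-api-client", "tok-billing")
print(billing.trace_id == root.trace_id, billing.parent_span_id == refund.span_id)  # True True
```

Recording `token_id` alongside the span means a post-incident query can answer both "which chain of delegations led here" and "which credential allowed it" from the same log lines.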

What to Do Before Your Next Deployment

The practical gap for most teams is not legal knowledge — it's that accountability infrastructure gets deprioritized in favor of capability development, and the architecture hardens before anyone revisits the access model.

A minimal accountability audit for a multi-agent system should verify: that every subagent has a scoped token rather than inherited parent permissions; that audit logs capture agent identity, task context, and authorization basis for every consequential action; that irreversible operations — deletion, external communications, financial transactions — have explicit approval gates configured in code rather than convention; and that at least one critical class of actions has a reversibility mechanism or recovery window.
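These four checks can run automatically if each deployment describes its agents in a manifest. The manifest format below (keys like `credentials`, `audit_fields`, `approval`, `recovery_window_hours`) is entirely hypothetical; the sketch only shows that the audit is mechanical once the facts are written down.

```python
from typing import List

IRREVERSIBLE = {"delete", "external_comm", "financial"}      # hypothetical action categories
REQUIRED_AUDIT_FIELDS = {"agent_id", "task_context", "auth_basis"}


def audit_deployment(manifest: dict) -> List[str]:
    """Return findings; an empty list means all four checks pass."""
    findings = []
    agents = manifest.get("agents", [])
    for agent in agents:
        name = agent["name"]
        # 1. Subagents must hold scoped tokens, not inherited parent permissions.
        if agent.get("parent") and agent.get("credentials") == "inherit":
            findings.append(f"{name}: inherits parent credentials instead of a scoped token")
        # 2. Audit logs must capture identity, task context, and authorization basis.
        missing = REQUIRED_AUDIT_FIELDS - set(agent.get("audit_fields", []))
        if missing:
            findings.append(f"{name}: audit log missing {sorted(missing)}")
        # 3. Irreversible operations need a gate enforced in code.
        for action in agent.get("actions", []):
            if action["category"] in IRREVERSIBLE and action.get("approval") != "enforced_in_code":
                findings.append(f"{name}: {action['name']} lacks a code-enforced approval gate")
    # 4. At least one action class needs a recovery window.
    if not any(a.get("recovery_window_hours", 0) > 0
               for agent in agents for a in agent.get("actions", [])):
        findings.append("deployment: no action class has a recovery window")
    return findings
```

Run in CI, a non-empty result can block a deploy, which keeps the checks from depending on someone remembering to do them.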

None of this requires pausing development. Scoped tokens and structured audit logging can be layered into existing orchestration code incrementally. The goal is to establish the architectural pattern now, so that as the system grows, new agents are added into a governance structure rather than outside one.
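As an example of layering logging in incrementally, a decorator can wrap tools that already exist without rewriting them. The `audited` decorator and its record fields are hypothetical; making `task_context` and `auth_basis` required keyword arguments is one assumed way to force every caller to state why and on whose authority it is acting.

```python
import functools
import json
import logging
import time
from typing import Callable

logger = logging.getLogger("agent.audit")


def audited(agent_id: str, action: str, sink: Callable[[str], None] = logger.info):
    """Wrap an existing tool function so every call emits one structured audit record."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, task_context: str, auth_basis: str, **kwargs):
            # Keyword-only and required: the tool cannot be called without
            # stating the task and the authorization it relies on.
            record = {"ts": time.time(), "agent_id": agent_id, "action": action,
                      "task_context": task_context, "auth_basis": auth_basis}
            try:
                result = fn(*args, **kwargs)
                record["outcome"] = "success"
                return result
            except Exception as exc:
                record["outcome"] = f"error: {type(exc).__name__}"
                raise
            finally:
                sink(json.dumps(record, sort_keys=True))  # logged on success and failure
        return wrapper
    return decorator


@audited(agent_id="retrieval-agent", action="kb.search")
def search_kb(query: str) -> list:
    return [f"doc about {query}"]  # stand-in for a tool that already exists


search_kb("refund policy", task_context="ticket-123", auth_basis="tok-retrieval")
```

Because the wrapper changes only the call signature, existing tools can be migrated one at a time while the rest of the orchestration code stays untouched.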

The nine-second database deletion happened because one stray token with broad permissions met an agent trying to solve a problem. The accountability stack doesn't prevent agents from acting. It ensures that when something goes wrong — and it will — you can answer the question every regulator, legal counsel, and customer will ask: who authorized this, and what did you do to limit the damage?

The EU AI Act's serious incident reporting obligations for high-risk systems are scheduled to apply from August 2026, only months away. The FTC's "real accountability" doctrine is already being enforced. The gap between where most multi-agent deployments sit today and where governance expectations are heading is closing faster than most teams expect.

About Tian Pan