
The Write Side of the Agent: Designing for Reversibility at the Action Layer

· 11 min read
Tian Pan
Software Engineer

An AI coding agent running in Cursor encountered a credential mismatch while working on a production database. It resolved the problem by deleting everything it couldn't access: the production database, its backups, and the ancillary records. The operation took nine seconds. Customers lost reservations. The company spent days reconstructing records from payment processor emails.

The agent had not been told to preserve data. It had also not been told not to delete it. There was no write journal, no staging step, no confirmation gate on destructive operations, and no separation between the agent's API token scope and full database access. The agent found the most direct path to satisfying its immediate objective and took it.

This is not primarily a model alignment failure. The model did exactly what maximally capable models do when no architectural constraints exist: it completed the task. The failure lives in the action layer — the code that translates model outputs into real-world writes. And the action layer, in most agent systems built today, has been designed for read-heavy workflows where mistakes are tolerable. It is not designed for writes.

Why Agent Writes Break Your Existing Safety Assumptions

Distributed systems engineers have spent decades building safe-write infrastructure. You have foreign key constraints, soft deletes, transactional rollbacks, audit logs, and change data capture. Your internal APIs have confirmation steps on dangerous operations. Your deployment pipelines require approvals before production changes.

Agents bypass most of this. They operate at machine speed — potentially thousands of operations per hour — with credentials scoped for convenience rather than least privilege, through API calls that often skip the validation layers your UI enforces, in response to natural language instructions that are ambiguous in ways structured code is not.

Three specific assumptions break down:

The human-speed assumption. Traditional safety gates are designed around human operators who notice something is wrong before they send a second request. An agent exploring an unfamiliar codebase, inbox, or database will have issued dozens of writes before the first anomaly would surface to any human reviewer.

The intent-preservation assumption. When a developer writes a delete statement, the code preserves some record of what was intended. When an agent executes a delete, the log entry says "agent ran tool delete_record with args {id: 7823}." The reasoning that produced that call — what the agent understood about context and consequence — is typically not recorded anywhere durable.

The bounded-scope assumption. Human operators work within a single workflow at a time. Agents are increasingly given broad access so they can handle novel situations. That breadth makes the blast radius of a bad write dramatically larger than any individual human action.

The result: the median detection lag for a bad agent write in enterprise deployments is measured in days, not minutes. By the time the problem surfaces, the write has propagated.

Classifying Operations Before You Let Agents Touch Them

Not all writes are equal. The first design discipline is building a reversibility taxonomy before an agent ever has write access.

Four dimensions determine where an operation falls:

Irreversibility level. A file write with snapshotting enabled is conditionally reversible. A bulk email send is narrow-window reversible — recall may be possible within seconds to minutes, but once SMTP delivery completes, it cannot be undone. A hard database delete with no backup is effectively irreversible. Map every tool your agent can call against this spectrum before it ships.

Blast radius. A preference update on a single record and a bulk notification to fifty thousand users may both be technically reversible. They do not belong in the same approval tier. Reversibility and blast radius together determine the appropriate gate.

Compliance exposure. Medical records, financial transactions, legal documents, and contractual state changes carry regulatory requirements regardless of technical reversibility. These operations require human ownership at the point of execution, not just human review after the fact.

Agent confidence score. If your agent pipeline produces calibrated confidence estimates, use them. A write at 0.90 confidence that the action matches user intent warrants different handling than the same write at 0.55. Empirical thresholds from production systems: require at least 0.85 for effectively irreversible actions and 0.70 for conditionally reversible ones, and lower those thresholds only after 30+ days of production calibration data.
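The confidence thresholds can be sketched as a small gate. This is a minimal illustration, not a reference implementation: the names `Reversibility`, `MIN_CONFIDENCE`, and `needs_human_review` are hypothetical, and it assumes that each threshold is the minimum confidence at which a write may proceed without escalation. The text gives no figure for narrow-window reversible actions, so treating them like irreversible ones is an assumption.

```python
from enum import Enum
from typing import Optional


class Reversibility(Enum):
    CONDITIONALLY_REVERSIBLE = "conditionally_reversible"
    NARROW_WINDOW = "narrow_window"
    EFFECTIVELY_IRREVERSIBLE = "effectively_irreversible"


# Minimum confidence required to skip escalation, per reversibility level.
# 0.85 and 0.70 come from the article; the NARROW_WINDOW value is an assumption.
MIN_CONFIDENCE = {
    Reversibility.EFFECTIVELY_IRREVERSIBLE: 0.85,
    Reversibility.NARROW_WINDOW: 0.85,  # assumption: treated as irreversible
    Reversibility.CONDITIONALLY_REVERSIBLE: 0.70,
}


def needs_human_review(level: Reversibility, confidence: Optional[float]) -> bool:
    """Return True when the write should be escalated instead of auto-executed."""
    if confidence is None:
        # No calibrated estimate available: fail closed.
        return True
    return confidence < MIN_CONFIDENCE[level]
```

Failing closed on a missing score is a deliberate choice in this sketch: an uncalibrated pipeline gets the most conservative handling rather than the most permissive.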

This taxonomy should live as structured metadata on every tool in your agent's tool registry — not as documentation, but as machine-readable configuration that drives routing and approval logic.
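One way to express the taxonomy as machine-readable configuration is a registry of per-tool policies with a routing function that reads them. Everything named here is hypothetical: the tool names, the `Tier` values, the `BLAST_RADIUS_LIMIT` of 100, and the routing rules themselves are assumptions chosen for illustration, not values from a real system.

```python
from dataclasses import dataclass
from enum import Enum


class Reversibility(Enum):
    CONDITIONALLY_REVERSIBLE = "conditionally_reversible"
    NARROW_WINDOW = "narrow_window"
    EFFECTIVELY_IRREVERSIBLE = "effectively_irreversible"


class Tier(Enum):
    AUTO = "auto"                      # agent may execute directly
    HUMAN_APPROVAL = "human_approval"  # a human confirms before execution
    HUMAN_OWNED = "human_owned"        # a human executes; the agent only proposes


@dataclass(frozen=True)
class ToolPolicy:
    name: str
    reversibility: Reversibility
    max_blast_radius: int       # upper bound on records or recipients per call
    compliance_sensitive: bool  # medical, financial, legal, contractual state


# Hypothetical tools, loosely based on the examples in the text.
REGISTRY = {
    p.name: p
    for p in [
        ToolPolicy("update_preference", Reversibility.CONDITIONALLY_REVERSIBLE, 1, False),
        ToolPolicy("send_bulk_notification", Reversibility.NARROW_WINDOW, 50_000, False),
        ToolPolicy("delete_record", Reversibility.EFFECTIVELY_IRREVERSIBLE, 1, False),
        ToolPolicy("post_ledger_entry", Reversibility.CONDITIONALLY_REVERSIBLE, 1, True),
    ]
}

BLAST_RADIUS_LIMIT = 100  # assumption: largest radius allowed without approval


def route(tool_name: str) -> Tier:
    """Map a tool call to an approval tier using only registry metadata."""
    policy = REGISTRY.get(tool_name)
    if policy is None or policy.compliance_sensitive:
        # Unregistered or regulated operations get the most restrictive tier.
        return Tier.HUMAN_OWNED
    if (
        policy.reversibility is not Reversibility.CONDITIONALLY_REVERSIBLE
        or policy.max_blast_radius > BLAST_RADIUS_LIMIT
    ):
        return Tier.HUMAN_APPROVAL
    return Tier.AUTO
```

Under these assumed rules, `route("update_preference")` yields `Tier.AUTO` while `route("send_bulk_notification")` yields `Tier.HUMAN_APPROVAL`, so two operations that may both be technically reversible still land in different tiers.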

The Four Design Disciplines

Once you have the taxonomy, four design disciplines implement it at the architecture level.

Write Journals

Every side-effectful tool call should be preceded by a durable log entry capturing what is about to happen, who is asking, why, and what the prestate looks like. The journal entry is written first, then the mutation executes, then the entry is marked committed. If the mutation succeeds but the commit mark fails, you have an at-least-once record that can be investigated. If the mutation fails, the journal contains everything needed for incident analysis.

Minimum useful fields: timestamp, action ID, action type, agent ID, agent version, idempotency key, agent reasoning at the point of the decision, routing tier, and a snapshot of the relevant prestate. The reasoning field in particular is what distinguishes agent write journals from traditional application logs — it connects the write to the intent, enabling reviewers to understand not just what changed but why the agent believed it should.
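The journal-first ordering can be sketched as follows. This is a simplified illustration under stated assumptions: the `WriteJournal` class and its `execute` method are hypothetical, the journal is a local JSON-lines file rather than durable replicated storage, the prestate must be JSON-serializable, and "marking committed" is modeled as appending a second record with the same action ID rather than updating the first entry in place.

```python
import json
import time
import uuid
from typing import Any, Callable


class WriteJournal:
    """Append-only JSON-lines journal (sketch; not crash-safe without fsync)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def _append(self, record: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def execute(
        self,
        *,
        action_type: str,
        agent_id: str,
        agent_version: str,
        reasoning: str,
        routing_tier: str,
        prestate: Any,
        idempotency_key: str,
        mutation: Callable[[], Any],
    ) -> Any:
        action_id = str(uuid.uuid4())
        # 1. Record the intent and prestate BEFORE anything is mutated.
        self._append({
            "status": "pending",
            "timestamp": time.time(),
            "action_id": action_id,
            "action_type": action_type,
            "agent_id": agent_id,
            "agent_version": agent_version,
            "idempotency_key": idempotency_key,
            "reasoning": reasoning,
            "routing_tier": routing_tier,
            "prestate": prestate,
        })
        # 2. Run the side effect; on failure, journal the error and re-raise.
        try:
            result = mutation()
        except Exception as exc:
            self._append({
                "status": "failed",
                "timestamp": time.time(),
                "action_id": action_id,
                "error": repr(exc),
            })
            raise
        # 3. Mark the action committed only after the mutation returned.
        self._append({
            "status": "committed",
            "timestamp": time.time(),
            "action_id": action_id,
        })
        return result
```

A pending record with no matching committed or failed record is the case worth alerting on: the mutation may or may not have happened, and the stored prestate and reasoning are what an investigator starts from.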

This is not expensive to implement and it is extremely expensive to not have when something goes wrong.

Dry-Run Modes
