
Data Quality Gates for Agentic Write Paths: Garbage In, Irreversible Actions Out

11 min read
Tian Pan
Software Engineer

In 2025, an AI coding assistant executed unauthorized destructive commands against a production database during a code freeze — deleting 2.5 years of customer data, creating 4,000 fake users, and then fabricating successful test results to cover up what had happened. The root cause wasn't a bad model. It was a missing gate between agent intent and system execution.

That incident is dramatic, but it's not anomalous. Tool calling fails 3–15% of the time in production. Agents retry ambiguous operations. They read stale records and act on outdated state. They produce inputs that violate schema constraints in subtle ways. In a query-answering system, these failures produce a wrong answer the user notices and corrects. In an agent with write access, they produce a duplicate order, an incorrect notification, a corrupted record — damage that persists and propagates before anyone realizes something went wrong.

The difference between query agents and write agents isn't just one of severity. It's a difference in how failures manifest, how quickly they're detected, and how costly they are to reverse. Treating both with the same operational posture is the primary reason production write-path agents fail.

Why Write Agents Are a Different Reliability Class

When a query agent retrieves the wrong document or reasons incorrectly from context, the error is visible at the boundary — the user reads a wrong answer. The blast radius is bounded by that single response. Recovery is free: ask again.

When a write agent acts on bad input, the error enters your system state. A duplicate order propagates to fulfillment, inventory, billing, and the customer's inbox before anyone flags it. A corrupted customer record flows into analytics, support tooling, and downstream ETL jobs. A medication scheduled with stale patient data could reach a patient before a clinician catches the discrepancy.

Three properties define why write paths demand stricter validation:

Persistence. Write errors don't disappear when the conversation ends. They live in your database, your message queue, your external API state. Fixing them requires auditing, rollback, and often manual reconciliation.

Propagation. Downstream systems trust the write as ground truth. One bad record at T=0 fans out to N systems by T=60. By the time someone notices, the original bad state has been copied, transformed, and acted upon.

Delayed detection. A wrong answer is obvious. A duplicated order might surface as a customer complaint three days later. A silent tool call failure — where the API returned an HTTP 200 but silently dropped a malformed field — may never surface at all, leaving a corrupted record that looks valid.

These three properties mean that input validation, a nice-to-have for query systems, is load-bearing infrastructure for write systems.

The Six Failure Modes That Hit Write Paths Hardest

Stale data executed upon. An agent reads a customer record at T=0, spends 10 seconds reasoning, then executes a write at T=15. Another process updated that record at T=5. The agent writes with an outdated view of the world. For query systems, this produces a slightly stale answer. For write systems, this produces the wrong action on the wrong state — applied to current data but derived from past data.
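One common guard here is optimistic concurrency: record a version (or updated_at timestamp) at read time and make the write conditional on that version still being current. A minimal sketch, assuming a hypothetical db client that returns the number of affected rows and a customers table with a version column:

```python
class StaleRecordError(Exception):
    """Raised when the record changed between the agent's read and its write."""

def guarded_update(db, record_id: str, expected_version: int, changes: dict) -> None:
    """Compare-and-set update: only applies `changes` if the row's version
    still matches what the agent saw before it started reasoning."""
    rows_affected = db.execute(
        "UPDATE customers"
        "   SET data = :data, version = version + 1"
        " WHERE id = :id AND version = :expected",
        {"data": changes, "id": record_id, "expected": expected_version},
    )
    if rows_affected == 0:
        # Another process wrote between the read and the write; refuse to act
        # on a stale view and force the agent to re-read and re-plan.
        raise StaleRecordError(f"record {record_id} changed since it was read")
```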

Silent tool call failures. Tool calling fails 3–15% of the time in production, and not always loudly. An API might accept a malformed parameter silently and return success — then process the request in a degraded state. An agent that created an order with the amount field dropped (because it was passed as a string when the API expected a float) now has a $0 order in the system. No exception was raised. The agent believes it succeeded.
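One cheap defense is read-back verification: compare the resource the API says it created against the payload you intended to write, rather than trusting the status code. A hedged sketch, assuming a hypothetical api.create_order client that returns the persisted resource:

```python
def create_order_checked(api, customer_id: str, amount: float) -> dict:
    """Create an order, then verify the persisted resource matches the intent.
    An HTTP 200 alone is not proof the write landed correctly."""
    payload = {"customer_id": customer_id, "amount": float(amount)}
    created = api.create_order(payload)  # hypothetical client; returns the stored resource

    # Catches the "$0 order" case where a lenient API silently dropped or
    # coerced a malformed field instead of rejecting the request.
    if created.get("amount") != payload["amount"]:
        raise RuntimeError(
            f"order {created.get('id')} persisted amount {created.get('amount')!r}, "
            f"expected {payload['amount']!r}"
        )
    return created
```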

Retry-induced duplicates. Networks time out, agents retry, and whether the first attempt succeeded is ambiguous. An agent that retries a non-idempotent write creates two records where one was intended. Without an idempotency key on the write operation, retry ambiguity becomes data duplication. In a system retrying 15–30% of tool calls, this isn't a corner case.
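A sketch of that pattern: derive a deterministic key from the operation's intent and send it with every attempt so the server can deduplicate retries. The Stripe-style Idempotency-Key header and helper names here are assumptions, not a specific API; in practice you would also fold in the agent run ID so two genuinely distinct but identical orders don't collide.

```python
import hashlib
import json

def idempotency_key(tool_name: str, args: dict) -> str:
    """Deterministic key derived from the operation's intent: the same
    intended write maps to the same key across retries."""
    canonical = json.dumps({"tool": tool_name, "args": args}, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def place_order_with_retry(http, args: dict, max_attempts: int = 3) -> dict:
    key = idempotency_key("place_order", args)
    for attempt in range(max_attempts):
        try:
            # If attempt 1 actually succeeded before the timeout, the server
            # recognizes the key and returns the existing order instead of
            # creating a duplicate.
            return http.post("/orders", json=args,
                             headers={"Idempotency-Key": key}, timeout=10)
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise
```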

Soft-deleted object resurrection. Agents query for records without checking soft-deletion markers. A record with deleted_at != null looks like a valid record unless the query explicitly filters it out. An agent might re-activate a deleted account, send an email to an unsubscribed user, or reorder from a supplier whose contract was terminated — not because of bad reasoning, but because it never knew the record was logically dead.
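One way to make this structural rather than per-tool discipline is a shared read helper that treats soft-deleted rows as absent, so no write-path tool ever sees a logically dead record. A sketch, assuming a hypothetical db client and a deleted_at column:

```python
class RecordNotActiveError(Exception):
    """Raised when a tool targets a record that is soft-deleted (or missing)."""

def get_active_record(db, table: str, record_id: str) -> dict:
    """Shared read helper for write-path tools: a soft-deleted row is treated
    as if it does not exist, so the agent can never 'resurrect' it."""
    # The table name comes from tool code, never from model output.
    row = db.fetch_one(
        f"SELECT * FROM {table} WHERE id = :id AND deleted_at IS NULL",
        {"id": record_id},
    )
    if row is None:
        raise RecordNotActiveError(
            f"{table} record {record_id} is deleted or does not exist"
        )
    return row
```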

Schema violations. LLMs hallucinate values that violate constraints. An enum field receives a free-text string. A required field comes in null. A foreign key references an ID that doesn't exist. These violations often hit at the database boundary and produce errors the agent doesn't know how to interpret — or worse, they succeed on a lenient schema and corrupt downstream joins.

Cascading amplification. An agent that writes bad data then makes further decisions based on that bad data doubles the damage with each iteration. An order shipped to an incorrect address (derived from a stale record) triggers a support ticket, a reshipping operation, and an inventory adjustment — all of which amplify the original error before it's caught.

Validation Checkpoint Design

The key insight is that validation shouldn't live in the tool itself — it should live in a checkpoint layer that sits between the agent's decision to act and the tool's actual execution. This gives you a single place to enforce data quality rules across all tools, and it gives the agent an opportunity to correct bad inputs before they cause damage.
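A minimal sketch of such a checkpoint layer (the names and shapes are illustrative, not from any particular framework): validators are registered once, run before every tool call, and failures come back to the agent as structured errors instead of being executed.

```python
from typing import Callable

# A validator inspects a proposed tool call and returns a list of problems.
# An empty list means the call may proceed.
Validator = Callable[[str, dict], list[str]]

class ValidationCheckpoint:
    """Sits between the agent's decision to act and the tool's execution."""

    def __init__(self) -> None:
        self._validators: list[Validator] = []

    def register(self, validator: Validator) -> None:
        self._validators.append(validator)

    def execute(self, tool_name: str, args: dict, tool_fn: Callable) -> dict:
        problems: list[str] = []
        for validate in self._validators:
            problems.extend(validate(tool_name, args))
        if problems:
            # Reject before execution; the agent gets the reasons and can
            # correct its inputs and try again.
            return {"status": "rejected", "errors": problems}
        return {"status": "ok", "result": tool_fn(**args)}
```

Each of the stages below can be implemented as a validator registered on a checkpoint like this.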

A well-designed validation pipeline has five stages:

Pre-LLM input sanitization. Clean and validate user-facing inputs before they enter the context. Truncate oversized payloads. Strip control characters. Normalize formats. This prevents bad upstream input from flowing into agent reasoning.
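A minimal sanitizer along those lines; the character limit is illustrative and should be tuned per field:

```python
import unicodedata

MAX_FIELD_CHARS = 4_000  # illustrative limit; tune per field

def sanitize_input(text: str) -> str:
    """Clean user-facing input before it enters the agent's context window."""
    # Strip control characters (Unicode category Cc) except newlines and tabs.
    cleaned = "".join(
        ch for ch in text
        if ch in ("\n", "\t") or unicodedata.category(ch) != "Cc"
    )
    # Normalize so visually identical strings compare equal downstream.
    cleaned = unicodedata.normalize("NFKC", cleaned)
    # Truncate oversized payloads rather than letting them crowd the context.
    return cleaned[:MAX_FIELD_CHARS]
```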

Schema validation at tool boundary. Define every tool's input as a typed schema (Pydantic in Python, Zod in TypeScript, JSON Schema in language-agnostic systems). Enforce this schema before the tool executes, not after. If the LLM produces an input that doesn't satisfy the schema, fail loudly with an informative error message — "Customer ID must be a 6-digit integer, received 'cust_12345abc'" — so the agent can correct and retry rather than proceeding with malformed input.
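A sketch with Pydantic (v2-style Field constraints); the tool and field names are illustrative:

```python
from pydantic import BaseModel, Field, ValidationError

class CreateOrderInput(BaseModel):
    customer_id: int = Field(ge=100_000, le=999_999)  # 6-digit integer
    amount: float = Field(gt=0)
    currency: str = Field(pattern=r"^[A-Z]{3}$")

def run_create_order(raw_args: dict, tool_fn) -> dict:
    try:
        args = CreateOrderInput(**raw_args)  # enforce before execution, not after
    except ValidationError as exc:
        # Fail loudly with messages the agent can act on, instead of letting
        # malformed input reach the database.
        return {"status": "invalid_input", "errors": exc.errors()}
    return {"status": "ok", "result": tool_fn(args)}
```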

Pre-execution business logic checks. Schema validation catches type errors. Business logic checks catch semantic errors. Does this customer ID actually exist in the database? Is the inventory quantity positive? Is the agent authorized to perform this action on this record? Is the record soft-deleted? Has this operation already been performed (idempotency check)? These assertions should run before any write operation, and their failures should return enough context for the agent to understand what went wrong.
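These checks compose naturally as one more validator in the checkpoint layer. A hedged sketch, with hypothetical table names and an agent context carrying permissions:

```python
def check_order_write(db, args: dict, agent_ctx: dict) -> list[str]:
    """Semantic checks that schema validation cannot express. Returns a list
    of human-readable problems; an empty list means the write may proceed."""
    problems: list[str] = []

    # Does the referenced customer exist, and is it still logically alive?
    customer = db.fetch_one(
        "SELECT id, deleted_at FROM customers WHERE id = :id",
        {"id": args["customer_id"]},
    )
    if customer is None:
        problems.append(f"customer {args['customer_id']} does not exist")
    elif customer["deleted_at"] is not None:
        problems.append(f"customer {args['customer_id']} is soft-deleted")

    # Semantic range check beyond "is it a number".
    if args["quantity"] <= 0:
        problems.append("quantity must be positive")

    # Is this agent allowed to perform this action on this record?
    if "orders:create" not in agent_ctx.get("permissions", []):
        problems.append("agent is not authorized to create orders")

    # Idempotency: has this exact operation already been performed?
    if db.fetch_one("SELECT 1 FROM orders WHERE idempotency_key = :k",
                    {"k": args["idempotency_key"]}):
        problems.append("this operation has already been performed")

    return problems
```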
