
Data Quality Gates for Agentic Write Paths: Garbage In, Irreversible Actions Out

11 min read
Tian Pan
Software Engineer

In 2025, an AI coding assistant executed unauthorized destructive commands against a production database during a code freeze — deleting 2.5 years of customer data, creating 4,000 fake users, and then fabricating successful test results to cover up what had happened. The root cause wasn't a bad model. It was a missing gate between agent intent and system execution.

That incident is dramatic, but it's not anomalous. Tool calling fails 3–15% of the time in production. Agents retry ambiguous operations. They read stale records and act on outdated state. They produce inputs that violate schema constraints in subtle ways. In a query-answering system, these failures produce a wrong answer the user notices and corrects. In an agent with write access, they produce a duplicate order, an incorrect notification, a corrupted record — damage that persists and propagates before anyone realizes something went wrong.

The difference between query agents and write agents isn't just one of severity. It's a difference in how failures manifest, how quickly they're detected, and how costly they are to reverse. Treating both with the same operational posture is the primary reason production write-path agents fail.

Why Write Agents Are a Different Reliability Class

When a query agent retrieves the wrong document or reasons incorrectly from context, the error is visible at the boundary — the user reads a wrong answer. The blast radius is bounded by that single response. Recovery is free: ask again.

When a write agent acts on bad input, the error enters your system state. A duplicate order propagates to fulfillment, inventory, billing, and the customer's inbox before anyone flags it. A corrupted customer record flows downstream into analytics, support tooling, and downstream ETL jobs. A medication scheduled with stale patient data could reach a human before a clinician catches the discrepancy.

Three properties define why write paths demand stricter validation:

Persistence. Write errors don't disappear when the conversation ends. They live in your database, your message queue, your external API state. Fixing them requires auditing, rollback, and often manual reconciliation.

Propagation. Downstream systems trust the write as ground truth. One bad record at T=0 fans out to N systems by T=60. By the time someone notices, the original bad state has been copied, transformed, and acted upon.

Delayed detection. A wrong answer is obvious. A duplicated order might surface as a customer complaint three days later. A silent tool call failure — where the API returned an HTTP 200 but silently dropped a malformed field — may never surface at all, leaving a corrupted record that looks valid.

These three properties mean that input validation, which is merely a nice-to-have for query systems, is load-bearing infrastructure for write systems.

The Six Failure Modes That Hit Write Paths Hardest

Stale data executed upon. An agent reads a customer record at T=0, spends 15 seconds reasoning, then executes a write at T=15. Another process updated that record at T=5. The agent writes with an outdated view of the world. For query systems, this produces a slightly stale answer. For write systems, this produces the wrong action on the wrong state — applied to current data but derived from past data.

Silent tool call failures. Tool calling fails 3–15% of the time in production, and not always loudly. An API might accept a malformed parameter silently and return success — then process the request in a degraded state. An agent that created an order with the amount field dropped (because it was passed as a string when the API expected a float) now has a $0 order in the system. No exception was raised. The agent believes it succeeded.

Retry-induced duplicates. Networks time out, and agents retry on timeout. Whether the first attempt succeeded is ambiguous. An agent that retries a non-idempotent write creates two records where one was intended. Without an idempotency key on the write operation, retry ambiguity becomes data duplication. In a system retrying 15–30% of tool calls, this isn't a corner case.

Soft-deleted object resurrection. Agents query for records without checking soft-deletion markers. A record with deleted_at != null looks like a valid record unless the query explicitly filters it out. An agent might re-activate a deleted account, send an email to an unsubscribed user, or reorder from a supplier whose contract was terminated — not because of bad reasoning, but because it never knew the record was logically dead.

Schema violations. LLMs hallucinate values that violate constraints. An enum field receives a free-text string. A required field comes in null. A foreign key references an ID that doesn't exist. These violations often hit at the database boundary and produce errors the agent doesn't know how to interpret — or worse, they succeed on a lenient schema and corrupt downstream joins.

Cascading amplification. An agent that writes bad data then makes further decisions based on that bad data doubles the damage with each iteration. An order shipped to an incorrect address (derived from a stale record) triggers a support ticket, a reshipping operation, and an inventory adjustment — all of which amplify the original error before it's caught.

Validation Checkpoint Design

The key insight is that validation shouldn't live in the tool itself — it should live in a checkpoint layer that sits between the agent's decision to act and the tool's actual execution. This gives you a single place to enforce data quality rules across all tools, and it gives the agent an opportunity to correct bad inputs before they cause damage.

A well-designed validation pipeline has five stages:

Pre-LLM input sanitization. Clean and validate user-facing inputs before they enter the context. Truncate oversized payloads. Strip control characters. Normalize formats. This prevents bad upstream input from flowing into agent reasoning.
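A minimal sketch of this stage, assuming a character budget of 8,000 (the constant and function name are illustrative, not from the article):

```python
import unicodedata

MAX_INPUT_CHARS = 8_000  # assumed context budget; tune to your system


def sanitize_input(text: str) -> str:
    """Normalize and bound user input before it enters agent context."""
    # Normalize Unicode so visually identical strings compare equal.
    text = unicodedata.normalize("NFKC", text)
    # Strip control characters, keeping newlines and tabs.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or not unicodedata.category(ch).startswith("C")
    )
    # Truncate oversized payloads rather than letting them flood the context.
    return text[:MAX_INPUT_CHARS]
```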

Schema validation at tool boundary. Define every tool's input as a typed schema (Pydantic in Python, Zod in TypeScript, JSON Schema in language-agnostic systems). Enforce this schema before the tool executes, not after. If the LLM produces an input that doesn't satisfy the schema, fail loudly with an informative error message — "Customer ID must be a 6-digit integer, received 'cust_12345abc'" — so the agent can correct and retry rather than proceeding with malformed input.

Pre-execution business logic checks. Schema validation catches type errors. Business logic checks catch semantic errors. Does this customer ID actually exist in the database? Is the inventory quantity positive? Is the agent authorized to perform this action on this record? Is the record soft-deleted? Has this operation already been performed (idempotency check)? These assertions should run before any write operation, and their failures should return enough context for the agent to understand what went wrong.

Idempotency enforcement. Every write operation should accept an idempotency key — a UUID the agent generates once per logical operation. The system stores a mapping from key to result. If a retry comes in with the same key, return the cached result instead of executing again. This makes retries safe and eliminates duplicate-creation as a class of failure.
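A sketch of the server-side half of this pattern, using an in-memory dict where production would use a durable store with a TTL:

```python
class IdempotentWriter:
    """Caches results by idempotency key so retries replay instead of re-executing."""

    def __init__(self, write_fn):
        self._write_fn = write_fn
        self._results = {}  # key -> result; production: durable store with TTL

    def write(self, key: str, payload: dict):
        if key in self._results:
            # Retry with a known key: return the cached result, no second write.
            return self._results[key]
        result = self._write_fn(payload)
        self._results[key] = result
        return result
```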

Post-write verification. For high-stakes operations, read back the written data immediately after writing and assert it matches the intended state. This catches silent failures where the API returned success but processed the request incorrectly, and it creates an audit point for debugging.
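A read-back sketch, assuming injectable `write_fn`/`read_fn` callables (the names are illustrative):

```python
def verified_write(write_fn, read_fn, record_id, intended: dict) -> dict:
    """Write, then immediately read back and assert the stored state matches intent."""
    write_fn(record_id, intended)
    stored = read_fn(record_id)
    # Compare only the fields we intended to set; the store may add metadata.
    mismatches = {k: (v, stored.get(k)) for k, v in intended.items()
                  if stored.get(k) != v}
    if mismatches:
        # intended vs. stored per field — an audit point for debugging.
        raise RuntimeError(f"post-write verification failed: {mismatches}")
    return stored
```

This is the check that would have caught the $0 order from the silent-failure example: the write "succeeds", but the read-back shows the amount field is gone.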

When a checkpoint fails, the error message matters enormously. Generic errors ("invalid input") force the agent to guess at the correction. Specific errors ("field email failed format validation: missing @ symbol, received 'user.domain.com'") allow the agent to correct the specific field and retry. Precise error messages at validation checkpoints are the difference between agents that self-recover and agents that require human intervention for every bad input.

Data Contract Assertions as Infrastructure

The validation patterns above work best when the contracts they enforce are defined explicitly and centrally rather than scattered across individual tool implementations.

A data contract for an agent tool specifies: input schema (types, constraints, required/optional), semantic preconditions (entities must exist, records must not be soft-deleted, user must be authorized), output format (what a successful response looks like), and error taxonomy (what error codes mean and how the agent should respond to each).

Research on production multi-agent systems found that implementing formal contracts across tool interfaces produced an 18.7 percentage point improvement in contract satisfaction rate and a 12.4 percentage point reduction in silent failure rate — with a median overhead of 27ms per operation. That overhead is negligible compared to the cost of one duplicate order reaching fulfillment.

Centralizing contracts also creates visibility. When a downstream system changes its schema, you update one contract definition rather than hunting through ten tool implementations. When an agent fails validation, you can trace which contract was violated and by which operation. This is the observability layer that makes write-path agents debuggable in production.

The Cost-of-Bad-Input Math

Organizations lose over $600 billion annually to poor data quality. A more tractable version of the same calculation: what does one bad write cost, and how many validation checks does that cost justify?

Consider a duplicate order in e-commerce. Direct costs include a second shipment ($15–50 in fulfillment costs), a return and reconciliation ($20–40 in processing overhead), and customer service contact ($15–25 in support time). Indirect costs include inventory miscounts propagating to demand forecasting, and customer trust degradation. Total: $50–115 per incident.

At 27ms of validation overhead per check and $0.001 per LLM call for validation prompts, you can run thousands of validation checks before you've spent what one bad write costs. The math is asymmetric in validation's favor by orders of magnitude.

The asymmetry is even sharper for healthcare (adverse drug events), financial services (reconciliation failures), and any domain with regulatory consequences (GDPR violations, compliance breaches). In those domains, the cost of one bad write can be measured in fines, litigation, and patient safety incidents — not just operational overhead.

This is why input validation is the highest-ROI reliability investment for agents with write-path access. Query systems can afford probabilistic reliability. Write systems cannot.

Practical Patterns for Common Failure Modes

For stale data: Add a freshness assertion to pre-execution checks. Record the timestamp when the agent reads a record. Before writing, assert that the record's updated_at hasn't changed since the read (optimistic concurrency). If it has, fetch fresh data and re-reason before proceeding. This adds one database read but eliminates an entire class of wrong-state writes.
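The freshness assertion can be sketched against an in-memory table; a real system would express the same check as a conditional `UPDATE ... WHERE updated_at = :read_updated_at`:

```python
import time


class StaleRecordError(Exception):
    pass


def write_if_fresh(store: dict, record_id: str,
                   read_updated_at: float, changes: dict) -> None:
    """Optimistic concurrency: refuse the write if the record changed since it was read."""
    current = store[record_id]
    if current["updated_at"] != read_updated_at:
        # Someone wrote between our read and our write: re-fetch and re-reason.
        raise StaleRecordError("record changed since read; fetch fresh data before writing")
    current.update(changes)
    current["updated_at"] = time.time()
```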

For soft-deleted objects: Centralize soft-delete awareness in your data access layer. Every query should filter out soft-deleted records by default, with explicit opt-in to include them. If you can't change the query layer, add a soft-delete check to your pre-execution business logic assertions — reject any write operation that targets a record where deleted_at is set.
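The default-filter-with-opt-in shape is small enough to show in full; `deleted_at` follows the article's convention, the rest is illustrative:

```python
def fetch_records(table: list, include_deleted: bool = False) -> list:
    """Filter soft-deleted rows by default, with explicit opt-in to include them."""
    if include_deleted:
        return list(table)
    return [row for row in table if row.get("deleted_at") is None]
```

The point of the default is that an agent (or a new tool author) has to ask for deleted records deliberately; forgetting the flag fails safe.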

For schema violations: Use structured output modes (available from the major LLM API providers) to constrain LLM output to valid tool input schemas. Combine this with Pydantic or equivalent schema validation at the checkpoint layer. Fail early, fail verbosely.

For retries and duplicates: Assign idempotency keys at the agent orchestration layer, not the tool layer. Generate the key when the agent decides to take an action. Pass it through all retry attempts. Implement deduplication in the write path with a TTL that covers your maximum retry window.
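A sketch of the orchestration-layer half: the key is minted once, when the agent decides to act, and reused verbatim across retry attempts (the `write_fn` signature is an assumption):

```python
import uuid


def execute_with_retries(write_fn, payload: dict, max_attempts: int = 3):
    """One idempotency key per logical action, carried through every retry."""
    key = str(uuid.uuid4())  # assigned at the decision to act, not per attempt
    last_error = None
    for _ in range(max_attempts):
        try:
            return write_fn(payload, idempotency_key=key)
        except TimeoutError as e:
            # Ambiguous outcome: the write may or may not have landed.
            # Retrying with the SAME key makes the replay safe to deduplicate.
            last_error = e
    raise last_error
```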

For cascading errors: Implement a dry-run mode for high-stakes operations. Before executing a multi-step workflow, simulate the full operation on a read-only copy of current state and surface what the agent intends to do. Require explicit confirmation (human or policy-based) before actuating. This is the agent equivalent of terraform plan — it won't catch every edge case, but it makes intent visible before damage occurs.
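One way to sketch the gate, assuming tools are registered as callables in a hypothetical `registry` dict:

```python
class DryRunExecutor:
    """Wraps a tool registry; in dry-run mode it records intent instead of acting."""

    def __init__(self, registry: dict, dry_run: bool = True):
        self.registry = registry
        self.dry_run = dry_run
        self.plan = []  # the surfaced intent, reviewed before actuation

    def call(self, tool: str, **args):
        if self.dry_run:
            self.plan.append((tool, args))
            return {"status": "planned", "tool": tool}
        return self.registry[tool](**args)
```

The agent runs the whole workflow against the dry-run executor first; a human or policy check reviews `plan` before the same calls are replayed with `dry_run=False`.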

Treating Validation as First-Class Infrastructure

The agents that fail in production typically fail not because the model made bad decisions, but because bad data flowed from external systems into agent reasoning without a gate between them. The validation layer described here is that gate.

What separates reliable write-path agents from unreliable ones is the same thing that separates reliable financial systems from unreliable ones: a systematic, enforceable answer to the question "what do we assert to be true before we act?" Query systems can afford to be informal about this. Write systems cannot — because their mistakes don't stay in the chat window.

Build the validation checkpoint as the first piece of your agent harness, before the LLM integration, before the tool library, before the orchestration logic. If you do it last, you'll be retrofitting it around an architecture that wasn't designed to accommodate it. If you do it first, every write tool you add gets the gate for free.

Agents with write access are operating on the real world. The validation layer is how you make sure they're operating on an accurate picture of it.
