
Schema Entropy: Why Your Tool Definitions Are Rotting in Production

· 10 min read
Tian Pan
Software Engineer

Your agent was working fine in January. By March, it started failing on 15% of tool calls. By May, it was silently producing wrong outputs on another 20%. Nothing in your deployment logs changed. No one touched the agent code. The tool definitions look exactly like they did six months ago — and that's the problem.

Tool schemas don't have to be edited to become wrong. The services they describe change underneath them. Enum values get added. Required fields become optional in a backend refactor. A parameter that used to accept any string now requires an ISO 8601 timestamp. The schema document stays frozen while the underlying API keeps moving, and your agent keeps calling it confidently, with no idea the contract has shifted.

This is schema entropy: the gradual divergence between the tool definitions your agent is given and the behavior your production services actually exhibit. It is one of the most underappreciated reliability problems in production AI systems, and research suggests tool versioning issues account for roughly 60% of production agent failures.

What Schema Entropy Actually Looks Like

Schema entropy isn't a single failure mode — it's a category of failures with a shared root cause.

The most visible form is a hard break: you rename a required parameter, and agents immediately start generating calls that fail with 400 errors. These are actually the easy cases. You see the failure, you find the mismatch, you fix it.

The dangerous form is soft rot. Consider these scenarios:

  • You add a new enum value PENDING_REVIEW to a status field. Your agent's tool description still only lists the four values it knew about at launch. When the API starts returning PENDING_REVIEW, the agent tries to interpret it with its existing mental model — sometimes correctly by inference, sometimes not.
  • You make a parameter optional that used to be required. Calls that omit it now succeed at the API layer but trigger different behavior in the backend. Your agent doesn't know this branch exists.
  • You change a field from accepting a bare integer to requiring a string-formatted integer ("42" vs 42). The API silently coerces it for a while, then a framework upgrade stops the coercion. Agent calls start failing with cryptic type errors weeks after the backend change.
  • You add a more specific sibling tool (search_position_budgets) alongside an existing one (search_positions). Both look similar in your tool manifest. The agent starts mixing them up, routing 30% of budget queries to the wrong endpoint.

Research on agent tool testing found exactly this last pattern: improving tool descriptions to better distinguish similar tools had a larger impact on agent accuracy than improving the agent's own logic. The description was the bug.
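One way to surface the enum form of soft rot is to compare the values that live responses actually contain against the values the tool definition lists. The sketch below is a minimal illustration, not a prescribed method: the four declared status values, the record shape, and the function name `find_undeclared_enum_values` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical: the four status values the tool description listed at launch.
DECLARED_STATUS_VALUES = {"OPEN", "CLOSED", "APPROVED", "REJECTED"}


def find_undeclared_enum_values(records, field, declared):
    """Count values seen in live responses that the tool definition never listed."""
    seen = Counter(r[field] for r in records if field in r)
    return {value: count for value, count in seen.items() if value not in declared}


# Hypothetical sample of recent API responses.
responses = [
    {"id": 1, "status": "OPEN"},
    {"id": 2, "status": "PENDING_REVIEW"},
    {"id": 3, "status": "PENDING_REVIEW"},
]

drift = find_undeclared_enum_values(responses, "status", DECLARED_STATUS_VALUES)
print(drift)  # {'PENDING_REVIEW': 2}
```

Run against sampled production responses, a non-empty result is a signal that the description has fallen behind the service, before the agent's guesses start to matter.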

Why Agents Make Bad Schema Consumers

When a human developer calls an API with a stale schema, the workflow includes reading error messages, consulting updated docs, and adjusting. Agents don't have that recovery loop by default. They receive a tool definition at the start of a run and treat it as ground truth for the duration.

More critically, agents fail in ways that are structurally different from code failures. A controlled study on schema-first tool APIs found it useful to distinguish three failure categories:

  1. Interface misuse: Structurally malformed calls — wrong types, missing required fields, hallucinated parameter names. These are the failures formal schemas prevent.
  2. Execution failures: Calls that are well-formed but trigger runtime preconditions the agent didn't know about.
  3. Semantic misuse: Schema-valid calls that are logically wrong for the task. The agent called the right tool with the right structure, just with semantically incorrect values.

Schema entropy primarily causes failure types 2 and 3. A tool definition can be syntactically correct and still point your agent toward the wrong behavior because the meaning of the schema has drifted from the behavior of the service.
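The three categories can be made concrete with a small triage function. This is a sketch under stated assumptions: it checks only a subset of JSON Schema (`properties`, `required`, and a few `type` names), and the `call_succeeded` and `output_correct` flags stand in for signals you would have to obtain from the service and from a task-level evaluation.

```python
from enum import Enum


class Failure(Enum):
    NONE = "none"
    INTERFACE_MISUSE = "interface_misuse"
    EXECUTION_FAILURE = "execution_failure"
    SEMANTIC_MISUSE = "semantic_misuse"


# Subset of JSON Schema type names mapped to Python types (illustrative only).
_TYPES = {"string": str, "integer": int, "boolean": bool, "object": dict, "array": list}


def structural_errors(schema, args):
    """Return structural problems: missing, unknown, or mistyped parameters."""
    props = schema.get("properties", {})
    errors = [f"missing required: {n}" for n in schema.get("required", []) if n not in args]
    for name, value in args.items():
        if name not in props:
            errors.append(f"unknown parameter: {name}")
            continue
        expected = _TYPES.get(props[name].get("type"))
        if expected is None:
            continue  # type not declared or not covered by this sketch
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        is_bool_as_int = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_int:
            errors.append(f"wrong type for {name}: expected {props[name]['type']}")
    return errors


def classify(schema, args, call_succeeded, output_correct):
    """Assign a tool call to one of the three failure categories, or NONE."""
    if structural_errors(schema, args):
        return Failure.INTERFACE_MISUSE
    if not call_succeeded:
        return Failure.EXECUTION_FAILURE
    if not output_correct:
        return Failure.SEMANTIC_MISUSE
    return Failure.NONE


# Hypothetical parameter schema for a tool.
schema = {
    "type": "object",
    "properties": {"position_id": {"type": "integer"}, "note": {"type": "string"}},
    "required": ["position_id"],
}
print(classify(schema, {"position_id": "42"}, True, True))  # Failure.INTERFACE_MISUSE
```

Only the first branch can be decided from the schema alone. The other two need information from outside it, which is why schema validation by itself does not catch drift.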

This is compounded by a pernicious observability problem: many APIs return HTTP 200 even when an operation failed. The StackOne research on agent testing found that rate-limit errors buried in response bodies — returned as {"status": 200, "data": null} — were being interpreted by agents as "no records exist" rather than "request failed." When HTTP success doesn't mean business success, your agent has no signal that the schema has led it astray.
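A thin wrapper between the HTTP client and the agent can refuse to pass ambiguous responses through as data. The sketch below assumes the response shape from the example above (a `data` key) plus a hypothetical `error` key; real services signal in-body failures in different ways, so the checks would need adapting.

```python
class ToolCallFailed(Exception):
    """Raised when a response is HTTP-successful but the operation did not succeed."""


def unwrap_tool_response(http_status, body):
    """Return the payload only when both the transport and the operation succeeded."""
    if http_status >= 400:
        raise ToolCallFailed(f"HTTP {http_status}")
    if not isinstance(body, dict):
        raise ToolCallFailed("response body is not a JSON object")
    # Hypothetical convention: failures are reported under an "error" key.
    if body.get("error") is not None:
        raise ToolCallFailed(f"error in body: {body['error']}")
    # A null payload is ambiguous, so do not let it pass as "no records exist".
    if body.get("data") is None:
        raise ToolCallFailed("data is null; cannot tell an empty result from a failure")
    return body["data"]


# An explicit empty list is a real "no records" answer and passes through.
print(unwrap_tool_response(200, {"status": 200, "data": []}))  # []

try:
    unwrap_tool_response(200, {"status": 200, "data": None})
except ToolCallFailed as exc:
    print(f"surfaced to the agent as a failure: {exc}")
```

The design choice is to turn silence into an explicit error the agent can see, retry, or report, rather than letting it reason from a null.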

The Three Places Schema Rot Starts

Understanding where entropy enters the system tells you where to put your defenses.

External services. You don't control when a third-party API changes its schema. Payment processors add new charge states. CRMs evolve their contact models. Every external tool dependency is a potential schema rot vector, and you won't get a webhook when their API changes.

Internal services crossing team boundaries. A backend team refactors a service endpoint. They update their API documentation. They do not update the LLM tool definition their colleagues wrote six months ago in a different repo. This is the most common scenario in practice — tool definitions live near the agent code while the services they describe live elsewhere.

Model-side changes. Provider updates can change how models interpret schemas. Strict mode enforcement, changes in how optional fields are treated, behavioral differences in how models handle ambiguous enum values — these shift the effective contract even when your JSON schema bytes haven't changed.
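For the two service-side sources, one possible defense is to pin a fingerprint of the service's published schema next to the tool definition and compare it in CI or on a schedule. This is a sketch, assuming the service exposes a JSON-serializable schema you can fetch; the function names and sample schemas are hypothetical, and the check does nothing for model-side changes.

```python
import hashlib
import json


def schema_fingerprint(schema):
    """Stable hash of a JSON-serializable schema, independent of key order."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_drifted(pinned_fingerprint, live_schema):
    """True when the live schema no longer matches the one the tool was written against."""
    return schema_fingerprint(live_schema) != pinned_fingerprint


# Hypothetical schema captured when the tool definition was written.
schema_at_authoring = {
    "properties": {"status": {"type": "string", "enum": ["OPEN", "CLOSED"]}},
    "required": ["status"],
}
pinned = schema_fingerprint(schema_at_authoring)

# Hypothetical schema fetched from the service today: one enum value was added.
live_schema = {
    "required": ["status"],
    "properties": {"status": {"type": "string", "enum": ["OPEN", "CLOSED", "PENDING_REVIEW"]}},
}
print(has_drifted(pinned, live_schema))  # True
```

A fingerprint mismatch only says that something changed, not whether the change matters, so it works best as a trigger for a human or a diff tool to review the tool definition.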

Backward Compatibility Rules for Tool Schemas

The core principle is asymmetric: adding something optional is generally safe, removing something never is, and changing something is safe only in one direction, toward a looser contract.

Safe changes:

  • Adding new optional parameters with documented defaults
  • Adding new enum values if the agent can fail gracefully on unknown values
  • Adding entirely new tools to the manifest
  • Making previously required parameters optional (if the default behavior is correct for existing call patterns)

Breaking changes:

  • Removing or renaming any parameter
  • Making an optional parameter required
  • Changing the type of an existing parameter (even if technically compatible at the wire level)
  • Removing enum values that agents may currently be sending
  • Changing the semantic meaning of a parameter without changing its name
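Most of the breaking changes listed above can be detected mechanically by diffing two versions of a tool's parameter schema. The following is a minimal sketch, assuming JSON Schema style dictionaries with `properties`, `required`, `type`, and `enum`; the function name `breaking_changes` and the sample schemas are hypothetical.

```python
def breaking_changes(old, new):
    """List changes from old to new parameter schema that break existing callers."""
    problems = []
    old_props, new_props = old.get("properties", {}), new.get("properties", {})
    old_required = set(old.get("required", []))
    new_required = set(new.get("required", []))

    for name, old_spec in old_props.items():
        if name not in new_props:
            problems.append(f"removed or renamed parameter: {name}")
            continue
        new_spec = new_props[name]
        if old_spec.get("type") != new_spec.get("type"):
            problems.append(f"type changed: {name}")
        # Dropping the enum constraint entirely widens the contract, so it is not flagged.
        old_enum = set(old_spec.get("enum", []))
        new_enum = set(new_spec.get("enum", old_enum))
        removed = old_enum - new_enum
        if removed:
            problems.append(f"enum values removed from {name}: {sorted(removed)}")

    # Covers both optional-to-required and brand-new required parameters.
    for name in sorted(new_required - old_required):
        problems.append(f"parameter is now required: {name}")
    return problems


old = {
    "properties": {
        "status": {"type": "string", "enum": ["OPEN", "CLOSED"]},
        "limit": {"type": "integer"},
    },
    "required": ["status"],
}
new = {
    "properties": {
        "status": {"type": "string", "enum": ["OPEN"]},
        "limit": {"type": "string"},
        "cursor": {"type": "string"},
    },
    "required": ["status", "limit"],
}
for problem in breaking_changes(old, new):
    print(problem)
```

The last item in the list, a change in meaning without a change in name, leaves no trace in the schema, so a diff like this cannot catch it.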