Skip to main content

2 posts tagged with "schema-drift"

View all tags

The Tool Description That Drifted Out of Sync With the Tool It Described

· 12 min read
Tian Pan
Software Engineer

A backend engineer renames a parameter from user_id to account_id because the two stopped being the same thing six months ago, and a support ticket finally made the ambiguity intolerable. The JSON schema for the tool gets updated in the pull request that ships the rename. The tool's prose description — the one paragraph the model actually reads to decide whether to call the tool and how — lives in a different repository, owned by a different team, updated through a ticket queue, and still reads "pass the user_id to look up the account." Nobody flags it. The model dutifully calls the tool with the right schema, fills the right field, and gets the right answer on every single happy-path query. The bug is invisible until the day a user types something where their authenticated user_id and the account_id they were asking about are two different entities, and the agent confidently returns somebody else's data.

The Tool Version Bump Your Agent Quietly Adapted To

· 10 min read
Tian Pan
Software Engineer

A downstream search service ships v2.3.2 on a Tuesday afternoon. The release notes mention a renamed status field, a new nullable confidence value, and a reordered array in the result envelope. Nothing in the CHANGELOG is marked breaking. The provider's own client libraries absorb the change in a point release. Your team's HTTP integrations would have logged a deserialization error inside an hour. Your agent — the one routing customer questions through that search tool — does not. It keeps answering. The questions still resolve. The dashboards stay green.

Six weeks later, someone notices that "out of stock" replies have crept up from two percent of queries to eleven. The root cause is the v2.3.2 bump. The renamed status string changed from in_stock to available, and the agent — being a flexible reasoner over text rather than a schema-strict client — interpreted the absence of the old token as "not available," then phrased that finding into helpful, confident, wrong customer messages. The contract regression was absorbed on the consumer side, where no test suite was watching.

This is the failure mode that conventional API hygiene was never designed to catch. Strict clients break loudly. Agents break quietly. And the longer you treat your agent like a normal HTTP consumer, the longer this class of bug hides inside metrics that look fine.