
Stale Tool Descriptions Are Your Agent's Biggest Silent Failure

Tian Pan
Software Engineer

You ship a tool that lets your agent fetch user profiles. The description reads: "Retrieves user information by user ID." Six weeks later, the backend team renames user_id to customer_uuid and adds a required tenant_id field. Nobody updates the tool schema. Your agent keeps calling the old signature, gets back a 400, treats the failed lookup as "no user found," and helpfully creates a duplicate record.

No error in the logs. No alert fired. The agent was confident the whole time.

This is the tool documentation problem: schema drift that turns stale descriptions into silent failure vectors. It is probably the most underappreciated reliability hazard in production AI systems today, and it gets worse the longer your agent lives.

Why Tool Descriptions Are Not Just Documentation

Most engineers treat tool descriptions like comments — optional, helpful if present, but not load-bearing. This is a mistake.

When an LLM invokes a tool, it has exactly two signals to work from: the schema (parameter names and types) and the description (natural language explaining what the tool does and how to call it). The model has no access to the implementation. It cannot check whether user_id was recently renamed. It cannot know that a new required field was added last Tuesday. It is reasoning entirely from the schema and description you gave it.
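
To make the "two signals" point concrete, here is a minimal Python sketch of what the model actually receives. It assumes a generic tool-definition shape with a `name`, a `description`, and a JSON Schema under `input_schema`. Field names vary by provider, so treat this layout as illustrative. `BACKEND_REQUIRED` and `missing_fields` are hypothetical stand-ins for the backend's current contract, which the model never sees.

```python
# A minimal sketch, assuming a generic tool-definition shape. Field names such
# as "input_schema" vary by provider; this layout is illustrative only.
STALE_TOOL = {
    "name": "get_user",
    "description": "Retrieves user information by user ID.",
    "input_schema": {
        "type": "object",
        "properties": {"user_id": {"type": "integer"}},
        "required": ["user_id"],
    },
}

# Hypothetical: what the backend requires after the rename. The model never sees this.
BACKEND_REQUIRED = {"customer_uuid", "tenant_id"}


def missing_fields(call_args: dict, required: set) -> set:
    """Return the backend-required fields that a tool call did not supply."""
    return required - set(call_args)


# The model can only build arguments from STALE_TOOL's schema, so it sends user_id.
model_call = {"user_id": 42}
print(sorted(missing_fields(model_call, BACKEND_REQUIRED)))
# prints ['customer_uuid', 'tenant_id']
```

Nothing in `STALE_TOOL` hints that `customer_uuid` or `tenant_id` exist, so no amount of model capability can produce a correct call.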

This means tool descriptions are API contracts in the precise sense: they are the specification the model relies on to decide whether, when, and how to call a tool. A 2025 analysis found 97% of tool descriptions contain at least one quality issue, such as vague purpose statements, missing parameter formats, or ambiguous enum values. When those descriptions also drift from what the implementation actually expects, you get a double failure: the description was never precise enough to begin with, and now it's wrong.
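
The quality issues named above can be checked mechanically. Below is a minimal linter sketch over the same assumed `name` / `description` / `input_schema` shape. The specific checks and the eight-word threshold are illustrative heuristics chosen for this example. They are not the criteria used by the analysis cited above.

```python
def lint_tool(tool: dict, min_desc_words: int = 8) -> list:
    """Flag common description-quality gaps in a tool definition.

    The checks are illustrative heuristics, not a standard rule set.
    """
    name = tool["name"]
    issues = []
    if len(tool.get("description", "").split()) < min_desc_words:
        issues.append(f"{name}: description has fewer than {min_desc_words} words")
    properties = tool.get("input_schema", {}).get("properties", {})
    for param, spec in properties.items():
        if not spec.get("description"):
            issues.append(f"{name}.{param}: parameter has no description")
        # Strings with no declared format are where drift (int -> UUID) hides.
        if spec.get("type") == "string" and not any(
            key in spec for key in ("format", "pattern", "enum")
        ):
            issues.append(f"{name}.{param}: string has no format, pattern, or enum")
        # Enum values the description never mentions are ambiguous to the model.
        for value in spec.get("enum", []):
            if str(value) not in spec.get("description", ""):
                issues.append(f"{name}.{param}: enum value {value!r} is undocumented")
    return issues


vague = {
    "name": "get_user",
    "description": "Retrieves user information by user ID.",
    "input_schema": {"type": "object", "properties": {"customer_uuid": {"type": "string"}}},
}
for issue in lint_tool(vague):
    print(issue)
```

A linter like this catches imprecision, not drift. It can tell you a description is vague, but not that it no longer matches the backend.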

The danger isn't that the agent throws an error. The danger is that the agent doesn't. It continues operating, returning plausible results, and the problem only surfaces when a human finally notices something is off — often weeks later, after the damage compounds.
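
One partial defense is to make sure a rejected call never reaches the model looking like an empty result. The sketch below maps HTTP status codes to a tool result. It separates a genuine miss (404) from a rejected request (400/422, a likely sign of drift), logs the rejection, and flags it as an error. `ToolResult` and `to_tool_result` are hypothetical names. Map `is_error` onto whatever error flag your provider's tool-result format offers, if it has one.

```python
import json
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("tool_wrapper")


@dataclass
class ToolResult:
    """Hypothetical result shape handed back to the model."""
    content: str
    is_error: bool = False


def to_tool_result(status: int, body: Optional[dict]) -> ToolResult:
    """Map an HTTP response to a tool result the model cannot mistake for 'no data'."""
    if status == 404:
        # A genuine miss: the request was well-formed and nothing matched.
        return ToolResult("No matching record exists.")
    if status in (400, 422):
        # The backend rejected the arguments: a likely sign of schema drift.
        logger.warning("tool call rejected with %s: %s", status, body)
        return ToolResult(
            f"SCHEMA ERROR ({status}): the arguments were rejected and the tool "
            f"definition may be stale. Do not treat this as 'not found'. Details: {body}",
            is_error=True,
        )
    if status >= 400:
        logger.warning("tool call failed with %s: %s", status, body)
        return ToolResult(f"REQUEST FAILED ({status}): {body}", is_error=True)
    return ToolResult(json.dumps(body))
```

With this mapping, the opening example's 400 produces a logged warning and an explicit error message instead of a silent "no user found".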

How Schema Drift Happens in Practice

The mechanics are mundane, which is partly why the problem goes unaddressed.

A backend engineer adds a required field to improve query correctness. The field is documented in the API changelog. Nobody has tool description updates on their checklist. The agent SDK is deployed separately. The description in the model's prompt still describes the old signature.
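
Because the changelog and the tool schema live in different places, a CI step can compare them directly. This is a minimal sketch that assumes the backend publishes an OpenAPI 3 document. It compares only the operation's `parameters` list (name and `required`); request bodies would need the same treatment. The `OPERATION` fragment and the helper names are hypothetical.

```python
def drift_report(tool: dict, operation: dict) -> dict:
    """Compare a tool's JSON Schema with an OpenAPI operation's parameter list."""
    schema = tool["input_schema"]
    tool_props = set(schema.get("properties", {}))
    tool_required = set(schema.get("required", []))
    params = operation.get("parameters", [])
    api_params = {p["name"] for p in params}
    api_required = {p["name"] for p in params if p.get("required")}
    return {
        # Arguments the model may send that the backend no longer recognizes.
        "unknown_to_api": tool_props - api_params,
        # Required backend fields the model has no way to supply.
        "missing_from_tool": api_required - tool_props,
        # Fields the model may omit that the backend insists on.
        "optional_in_tool_only": (api_required & tool_props) - tool_required,
    }


def has_drift(report: dict) -> bool:
    return any(report.values())


STALE_TOOL = {
    "name": "get_user",
    "input_schema": {"properties": {"user_id": {"type": "integer"}}, "required": ["user_id"]},
}

# Hypothetical OpenAPI 3 operation object for GET /customers/{customer_uuid}.
OPERATION = {
    "parameters": [
        {"name": "customer_uuid", "in": "path", "required": True,
         "schema": {"type": "string", "format": "uuid"}},
        {"name": "tenant_id", "in": "query", "required": True, "schema": {"type": "string"}},
    ]
}

report = drift_report(STALE_TOOL, OPERATION)
print(report)
```

In CI, a non-empty report would fail the build. That turns the new required field into a blocked merge instead of a silent mismatch.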

Or a deprecation cycle plays out: the old parameter name continues to work for a while, so integration tests pass. By the time the old name stops being accepted, the agent's failure mode has shifted from "incorrect results" to "hard errors" — which at least is visible, but the silent period before that shift did real damage.
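
The deprecation window is visible in the API spec even when tests pass. OpenAPI 3 Parameter Objects carry a `deprecated` boolean. The sketch below flags any tool parameter that the spec marks as deprecated, so the window shows up in CI rather than in production. The spec fragment and `deprecated_in_use` are hypothetical.

```python
def deprecated_in_use(tool: dict, operation: dict) -> set:
    """Names the tool still exposes that the API spec marks as deprecated."""
    tool_props = set(tool["input_schema"].get("properties", {}))
    deprecated = {p["name"] for p in operation.get("parameters", []) if p.get("deprecated")}
    return tool_props & deprecated


STALE_TOOL = {
    "name": "get_user",
    "input_schema": {"properties": {"user_id": {"type": "integer"}}},
}

# Hypothetical operation mid-deprecation: the old name is still accepted.
OPERATION = {
    "parameters": [
        {"name": "user_id", "in": "query", "deprecated": True, "schema": {"type": "integer"}},
        {"name": "customer_uuid", "in": "query", "schema": {"type": "string", "format": "uuid"}},
    ]
}

print(deprecated_in_use(STALE_TOOL, OPERATION))  # prints {'user_id'}
```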

A related failure is a rename that preserves semantics but changes format: user_id (an integer) becomes customer_uuid (a UUID string). The agent still sends a value for what it understands to be the user identifier, but it sends an integer. The backend returns results for the wrong record, or no results at all, and the agent reasons around the empty response rather than flagging it as a schema error.
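
A format mismatch can be caught before the request leaves your process. This minimal sketch validates a UUID argument with the standard library's `uuid.UUID` and returns an error message to feed back to the model. `check_customer_uuid` is a hypothetical helper. In practice you might declare `"format": "uuid"` in the schema instead. Be aware that some JSON Schema validators treat `format` as an annotation and do not enforce it unless configured to.

```python
import uuid
from typing import Optional


def check_customer_uuid(value: object) -> Optional[str]:
    """Return an error message if value is not a UUID string, else None."""
    if not isinstance(value, str):
        return f"customer_uuid must be a UUID string, got {type(value).__name__}"
    try:
        uuid.UUID(value)
    except ValueError:
        return f"customer_uuid is not a valid UUID: {value!r}"
    return None


print(check_customer_uuid(42))    # prints customer_uuid must be a UUID string, got int
print(check_customer_uuid("42"))  # prints customer_uuid is not a valid UUID: '42'
print(check_customer_uuid("12345678-1234-5678-1234-567812345678"))  # prints None
```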

The invisibility is the point. Unlike a missing import or a bad network call, schema drift produces responses that look successful enough that the agent never stops; it just proceeds on false premises with high confidence.

The Documentation-as-Contract Discipline

The fix is not to write better descriptions. It is to treat description changes as breaking API changes — full stop.

This means several concrete things for your engineering process:

Description changes go through the same review gates as interface changes. A PR that modifies "Fetches user profile by ID" to "Returns legacy customer record" requires the same scrutiny as a signature modification. The change alters the semantic contract the model uses to decide when and how to call the tool.
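
One way to force description edits through the same review gate as signature edits is a checked-in lockfile of contract fingerprints. The sketch below hashes each tool's name, description, and schema. A CI step (assumed, not shown) would fail whenever a fingerprint changes without the lockfile being updated in the same PR, which puts the description diff in front of a reviewer. `contract_fingerprint` and `changed_contracts` are illustrative names.

```python
import hashlib
import json


def contract_fingerprint(tool: dict) -> str:
    """Hash the parts of a tool definition the model reasons from."""
    contract = {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["input_schema"],
    }
    # sort_keys plus compact separators make the serialization canonical,
    # so reordering keys in source does not change the hash.
    canonical = json.dumps(contract, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_contracts(tools: list, lockfile: dict) -> list:
    """Return names of tools whose contract differs from (or is absent in) the lockfile."""
    return [t["name"] for t in tools if lockfile.get(t["name"]) != contract_fingerprint(t)]


TOOL = {
    "name": "get_user",
    "description": "Fetches user profile by ID.",
    "input_schema": {"type": "object", "properties": {"user_id": {"type": "integer"}}},
}
LOCKFILE = {"get_user": contract_fingerprint(TOOL)}  # normally loaded from a committed file

edited = dict(TOOL, description="Returns legacy customer record.")
print(changed_contracts([TOOL], LOCKFILE))    # prints []
print(changed_contracts([edited], LOCKFILE))  # prints ['get_user']
```

The hash covers the description on purpose. A wording-only change trips the gate exactly as a parameter change would.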

Versioning is not optional. When you need to change a tool's behavior or schema in a way that could break existing callers (including your agent), the correct move is to ship get_user_v2 alongside get_user_v1. The old version stays alive through the rollout period. Agents cached against the old schema keep working while you migrate.
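
A minimal sketch of running both versions side by side, assuming a simple in-process registry. `ToolVersion`, `REGISTRY`, and both handlers are hypothetical. The handlers only echo what they would query. The v1 description says it is deprecated, which nudges the model toward v2, while `dispatch` still honors v1 calls from agents or cached prompts that have not migrated.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolVersion:
    name: str
    description: str
    input_schema: dict
    handler: Callable[[dict], dict]
    deprecated: bool = False


# Hypothetical handlers; real ones would call the legacy and current endpoints.
def get_user_v1(args: dict) -> dict:
    return {"endpoint": "legacy", "user_id": args["user_id"]}


def get_user_v2(args: dict) -> dict:
    return {"endpoint": "current", "customer_uuid": args["customer_uuid"],
            "tenant_id": args["tenant_id"]}


REGISTRY = {
    "get_user_v1": ToolVersion(
        name="get_user_v1",
        description="Deprecated: use get_user_v2. Retrieves a user by integer user_id.",
        input_schema={"type": "object", "properties": {"user_id": {"type": "integer"}},
                      "required": ["user_id"]},
        handler=get_user_v1,
        deprecated=True,
    ),
    "get_user_v2": ToolVersion(
        name="get_user_v2",
        description="Retrieves a customer profile by customer_uuid within tenant_id.",
        input_schema={"type": "object",
                      "properties": {"customer_uuid": {"type": "string", "format": "uuid"},
                                     "tenant_id": {"type": "string"}},
                      "required": ["customer_uuid", "tenant_id"]},
        handler=get_user_v2,
    ),
}


def tool_definitions(registry: dict, include_deprecated: bool = True) -> list:
    """Definitions to send to the model; keep deprecated versions until migration ends."""
    return [
        {"name": t.name, "description": t.description, "input_schema": t.input_schema}
        for t in registry.values()
        if include_deprecated or not t.deprecated
    ]


def dispatch(registry: dict, name: str, args: dict) -> dict:
    """Route a tool call to the handler for exactly the version the model named."""
    return registry[name].handler(args)
```

Once traffic to get_user_v1 drops to zero, calling `tool_definitions(REGISTRY, include_deprecated=False)` stops advertising it, and the registry entry can be removed in a later release.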
