Tool Schema Deprecation: Why You Can't Just Rename a Parameter
You renamed query to search_query on a tool schema. The changelog says "non-breaking: clearer naming." The PR passed review. Three days later, your support queue fills up with reports that the assistant is "searching for blank results." What happened is not what anyone on the thread would tell you. The agents did not fail. They submitted the old field name, your tool server ignored the unknown key, defaulted search_query to the empty string, and returned zero hits. The model, seeing a legitimate-looking empty response, confidently explained to the user why their query returned nothing relevant.
This is the part of agent engineering that does not fit the mental model borrowed from REST API versioning. A REST client that sends a renamed field gets a 400 and a clear error — the field either exists in the validator or it doesn't. An agent that sends a renamed field gets a silent acceptance, a nonsense result, and a hallucinated rationalization. The failure is not at the wire; it is in the joint between the runtime schema and the model's in-context mental model of what the tool looks like.
Tool schemas live in two places. The first is the runtime spec — the JSON schema you publish to the MCP server or the function-calling registry. The second is the model's in-context representation of that spec, reinforced every turn by few-shot examples in your system prompt, by the serialized tool history the agent sees on multi-turn tasks, and by whatever the model already absorbed about your API during pretraining. You can atomically update the first. You cannot atomically update the second. That asymmetry is the whole problem, and it is why "additive only, reserve forever" — the discipline that protobuf and GraphQL operators internalized a decade ago — needs to migrate to the tool-schema layer now.
The two places your schema lives
When engineers first ship an agent to production, they tend to think of the tool schema the way they think of an OpenAPI spec: one source of truth, published once, consumed by clients. That intuition is wrong in a specific way.
The runtime spec is what the server validates against. If you tighten it — rename a field, narrow a type, make an optional parameter required — you have broken it on the wire. That part matches the REST intuition.
The in-context spec is what the model has internalized about your tool. It is assembled fresh every call from (1) the tool-definition block injected into the system prompt, (2) any few-shot examples you include that demonstrate tool use, (3) the serialized transcript of the agent's own prior tool calls in the current session, and (4) whatever the pretrained weights already encoded about the general shape of tools in your domain. Only the first of those is fully under your control. The third and fourth are not.
The consequence: an agent that called your tool successfully last turn, using the old field name, will very likely try to call it the same way this turn — even if you atomically deployed the renamed schema between the two turns. The prompt history is a few-shot example the model wrote itself, and it has more weight than you would like.
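To make the asymmetry concrete, here is a sketch of the context an agent carries into the turn after your deploy, written as an OpenAI-style message list; the tool name, arguments, and transcript contents are hypothetical:

```python
# What the model actually sees on turn N+1. The runtime spec was updated
# between turns; the transcript was not.
history = [
    # (1) The tool-definition block: this one is fully under your control
    # and now says search_query.
    {"role": "system", "content": "...tool definitions, using search_query..."},
    # (3) The agent's own successful call from last turn, with the old name.
    {
        "role": "assistant",
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "search_kb",
                         "arguments": '{"query": "refund policy"}'},
        }],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": '{"hits": ["..."]}'},
    {"role": "user", "content": "Now check the exchange policy too."},
]
# The updated schema says search_query. The transcript says query, and shows
# it succeeding. Both are in context; the model samples from both.
```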
Why "silent acceptance" is worse than a 400
A REST API that receives an unknown field usually rejects it. A modern function-calling pipeline frequently does not. The framework accepts the model's JSON, strips unknown keys (or passes them through), and hands a dict to your handler. The handler sees the absence of a parameter, substitutes a default — often an empty string, null, or 0 — and runs. The user gets a plausible-looking empty result.
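A minimal sketch of that path, with hypothetical handler and field names; the only load-bearing line is the lookup with a default:

```python
def run_search(q: str) -> dict:
    # Stand-in for the real search backend: empty query, empty result.
    return {"hits": []} if not q else {"hits": ["..."]}

def handle_search(args: dict) -> dict:
    # The model sent {"query": "refund policy"}. The unknown "query" key is
    # ignored, "search_query" is absent, and the default kicks in.
    # No exception, no non-2xx, no log line.
    search_query = args.get("search_query", "")  # silently "" after the rename
    return run_search(search_query)              # zero hits, reported as success

print(handle_search({"query": "refund policy"}))  # -> {'hits': []}
```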
Three properties make this failure mode especially corrosive:
- It is invisible. No stack trace, no alert, no non-2xx response. Traditional observability (error rates, latency, 5xxes) will show nothing.
- The model covers for it. LLMs are trained to produce fluent explanations. Given an empty result, the model will generate a reason it is empty that sounds correct. "I searched and found no matching records" is indistinguishable from success when there really are no matching records.
- It propagates. If the broken tool feeds a downstream tool — search results into a summarizer, a retrieved doc into a reasoner — every subsequent step runs on corrupted input. One renamed parameter on one tool can corrupt an entire pipeline.
The first time this bites a team, they usually discover it through a user complaint, not through monitoring. That is the signal that your tool-schema change management is not yet mature.
Additive-only is the default discipline
The rules protobuf and GraphQL operators landed on, adapted for agent tool schemas, are roughly:
- Never rename a parameter. Add a new one. Keep the old one.
- Never narrow a type. Widen it or add a new field with the stricter type.
- Never make an optional parameter required. Add the requirement on a new parameter instead, and default the old path.
- Never remove a parameter until telemetry proves nothing is using it. Then reserve the name so it cannot come back with different semantics.
- Never reuse a name. A field name is a permanent identifier. If query was a string last year, it cannot be an object this year, even after a "clean" migration.
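Applied to the rename from the opening example, the first rule yields a schema along these lines; this is a JSON Schema fragment written as a Python dict, and the descriptions and <date> placeholder are illustrative:

```python
# Additive evolution of the search tool's input schema. "query" is kept,
# never renamed; the server routes it to the new behavior internally.
SCHEMA_V_NEXT = {
    "type": "object",
    "properties": {
        "query": {
            "type": "string",
            "description": "Deprecated: use search_query instead. "
                           "Will be removed on or after <date>.",
        },
        "search_query": {
            # Deliberately does not mention the old name, so the model is not
            # primed to emit the deprecated field by reading the new one.
            "type": "string",
            "description": "Full-text search string to run against the knowledge base.",
        },
    },
    "required": [],  # neither field is required; the server resolves precedence
}
```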
These rules feel pedantic until you experience the alternative. In a traditional API, violating them breaks a known set of identifiable clients who will eventually notice and file tickets. In an agent system, violating them breaks an unknowable set of in-flight sessions whose broken behavior will be rationalized into plausibility by the model itself.
The additive discipline maps cleanly onto protobuf's [deprecated = true] plus reserved field numbers. It maps onto GraphQL's @deprecated(reason: "...") directive. What it does not map onto yet — and this is the production foot-gun — is MCP, whose spec leaves tool evolution almost entirely up to the server implementer.
The MCP gap
MCP has protocol-level versioning (the handshake declares a protocol version). What it lacks, as of the current spec, is a standardized tool-level versioning layer. Tools are identified by name. Schemas are the JSON schema object the server publishes during tools/list. There is no canonical way to say "this parameter is deprecated," no equivalent of @deprecated, no convention for retaining an old name while routing to new behavior, no registry-level contract check that flags a breaking change in CI.
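Concretely, a tools/list entry gives a server three fields to work with, none of which is a lifecycle marker. The sketch below uses a hypothetical tool; any deprecation signal has to be smuggled into the description text, which is exactly the surface the model reads as prompt:

```python
# The shape of a tools/list result: a name, a description, and a JSON schema.
# Note what is missing: no version field, no deprecation marker, no
# per-parameter lifecycle metadata.
tools_list_result = {
    "tools": [{
        "name": "search_kb",
        "description": "Searches the knowledge base.",
        "inputSchema": {
            "type": "object",
            "properties": {"search_query": {"type": "string"}},
        },
    }]
}
```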
This is unremarkable when one team owns the server and one team owns the agent. It becomes dangerous the moment your MCP server has multiple consumers. Each consumer's agent has its own prompt history, its own few-shot examples, its own tolerance for weird results. A change that is "harmless" for the fastest-moving consumer — the one whose prompt got refreshed yesterday — is catastrophic for the consumer whose saved conversation threads include last month's field names.
Even changes that look cosmetic become behavioral. Tweaking a tool description from "searches the knowledge base" to "searches the corpus" can shift which cases the model routes to that tool. Reordering parameters in the schema object can change the model's prior over default values. The surface area of "breaking change" in an agent tool schema is wider than the surface area of "breaking change" in a REST endpoint.
A deprecation playbook for agent tools
The pattern that works borrows from the shadow-table migration playbook, adapted for schemas that are also prompts.
Stage 1: Additive shadow. Introduce the new parameter alongside the old one. Accept both. If both are present, the new one wins; if only the old is present, route it to the new behavior internally. Emit a structured log line tagged with deprecated_field_hit: <name> every time the old path fires. The runtime schema now carries both fields; the description for the old field reads "Deprecated: use <new_name> instead. Will be removed on or after <date>." The description for the new field is unambiguous and, critically, does not reference the old name — the model should not be primed to emit the deprecated field by reading the new one.
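A sketch of the Stage 1 resolution logic in a Python handler, assuming a structured JSON log sink; the client_id plumbing is an assumption:

```python
import datetime
import json
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")
log = logging.getLogger("tool.search_kb")

def resolve_search_query(args: dict, client_id: str) -> str:
    """Stage 1: accept both fields, let the new one win, count every old-path hit."""
    if "search_query" in args:
        return args["search_query"]   # new field present: it wins
    if "query" in args:
        # The old path still works; this log line is what Stage 2 aggregates.
        log.info(json.dumps({
            "deprecated_field_hit": "query",
            "client": client_id,
            "day": datetime.date.today().isoformat(),
        }))
        return args["query"]          # routed to the new behavior internally
    raise ValueError("missing required parameter: search_query")
```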
Stage 2: Behavioral dual-running. Leave the shadow in place long enough that the model's in-context history, few-shot examples, and saved transcripts have rolled over. "Long enough" is not a fixed interval — it is until your telemetry shows the deprecated-field-hit counter trending toward zero. The GraphQL community figured this out first: you do not know it is safe to remove a deprecated field; your observability does. Track hits per client, per agent, per day. If one agent stops calling the field but another has a weekly batch job that only runs on Sundays, you will see it.
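A sketch of the Stage 2 aggregation, assuming the Stage 1 log lines above land in a JSONL file; the path and event shape are carried over from that sketch:

```python
from collections import Counter
import json

def deprecated_hits(log_path: str) -> Counter:
    """Roll Stage 1 log lines up into (client, field, day) hit counts."""
    hits: Counter = Counter()
    with open(log_path) as f:
        for line in f:
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # lines from other loggers are not ours
            if "deprecated_field_hit" in event:
                key = (event.get("client", "unknown"),
                       event["deprecated_field_hit"],
                       event.get("day", "?"))
                hits[key] += 1
    return hits

# Removal is safe only once every bucket has been zero for longer than your
# slowest consumer's cadence: the Sunday batch job from the paragraph above.
```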
Stage 3: Communicate the cutoff. Before removal, change the tool description to state the removal date explicitly — in the description text the model actually reads. Something like: "The query parameter is deprecated and will be rejected after <date>. Use search_query instead." This pushes the information into the model's context window where the in-context spec lives, not just into a changelog that no one's agent is reading.
Stage 4: Harden, then remove. Before removing the field, temporarily flip the server's tolerance. Instead of silently accepting the old name and substituting the default, reject the call with a structured error that explains the migration. The model's self-healing loop will retry with the new field name — this is a feature, not a bug, because the error path now feeds back into the context and updates the in-context spec the same way a successful call would. Once the error counter is also zero, you can remove the field.
Stage 5: Reserve the name. After removal, add the old name to a reserved list on your server. If an agent ever re-emerges with an ancient prompt containing the old name, you want a loud, specific error — not the silent default that started this whole mess.
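Stages 4 and 5 share a mechanism: the guard that rejects the old name during hardening becomes permanent once the name is reserved. A minimal sketch, reusing the hypothetical field names from above; the error text is written for the model, since the retry turn puts it back into context:

```python
RESERVED_FIELDS = {
    # Stage 5: names that once meant something never come back.
    "query": "The 'query' parameter was removed. Use 'search_query' instead.",
}

def reject_reserved(args: dict) -> None:
    """Stage 4/5 guard: fail loudly, with an error the model can act on."""
    for name, hint in RESERVED_FIELDS.items():
        if name in args:
            # On the retry turn this text lands in the model's context, which
            # is exactly where the stale in-context spec needs correcting.
            raise ValueError(
                f"Rejected parameter '{name}'. {hint} "
                "Retry the call with the corrected parameter name."
            )
```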
What changes in agents that does not change in REST
Two properties of the agent runtime make these patterns mandatory rather than optional:
The model is a non-deterministic client. A REST client either works or is broken; you can ship a fix once and be done. A model, given a prompt history that contains a deprecated field, will sometimes follow the new schema and sometimes fall back to what it saw work before. You cannot test your way out of this with a single passing integration check; you need telemetry on both paths for weeks.
The context window is a cache you do not own. Every long-running session carries a frozen view of your schema from whenever the session started. Until those sessions expire — and in assistants with persistent memory, they may never expire — the old schema is live in production regardless of what your server publishes today. Reserve your names accordingly.
The review question that changes things
When a PR lands that edits a tool schema, the useful review question is not "is this backwards-compatible on the wire?" It is "would an agent that remembers the old shape of this tool still get correct results on this turn?" That framing forces the reviewer to think about both the runtime spec and the in-context spec simultaneously. Most "backwards-compatible" schema changes fail that test.
The teams that are doing this well have landed on something like a contract snapshot — a versioned file checked into the repo that captures the full tool surface at a point in time, including names, types, descriptions, and required/optional flags. The snapshot is regenerated in CI from the live server; any diff against the committed version blocks merge unless a migration plan is attached. This is the same idea as protobuf's breaking-change detector or GraphQL's schema check, pulled into the MCP / function-calling world where it is urgently overdue.
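A sketch of what that CI gate can look like, assuming a helper script that speaks MCP and dumps the live tools/list result as JSON; the snapshot path and script name are hypothetical:

```python
import json
import subprocess
import sys

SNAPSHOT_PATH = "contracts/tools.snapshot.json"  # versioned, checked into the repo

def current_surface() -> dict:
    """Capture the full tool surface (names, types, descriptions, flags)
    from the live server. Swap in your own MCP client here."""
    raw = subprocess.run(
        ["python", "scripts/dump_tools_list.py"],  # hypothetical helper
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(raw)

def check_contract() -> int:
    with open(SNAPSHOT_PATH) as f:
        committed = json.load(f)
    # Any diff blocks merge, additive or not, unless a migration plan
    # accompanies a regenerated snapshot.
    if current_surface() != committed:
        print("tool surface drifted from committed snapshot; "
              "attach a migration plan or regenerate the snapshot")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(check_contract())
```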
The thing to internalize: your tool schema is not just a contract with software clients. It is a contract with every prompt history, every saved transcript, and every few-shot example you have ever shown the model. The wire protocol can be updated atomically. The model's mental model cannot. Plan deprecations accordingly, or plan to explain silent failures to users for weeks.
Sources
- https://dev.to/nesquikm/my-mcp-tools-broke-silently-schema-drift-is-the-new-dependency-hell-5c49
- https://medium.com/@binarEx/your-mcp-servers-tool-descriptions-changed-last-night-nobody-noticed-e3ad93cf6bc7
- https://nordicapis.com/the-weak-point-in-mcp-nobodys-talking-about-api-versioning/
- https://medium.com/@kumaran.isk/evolvable-mcp-a-guide-to-mcp-tool-versioning-ae9a612f7710
- https://medium.com/google-cloud/the-silent-breakage-a-versioning-strategy-for-production-ready-mcp-tools-fbb998e3f71f
- https://scottefein.github.io/mcp-versioning/
- https://modelcontextprotocol.io/specification/versioning
- https://protobuf.dev/best-practices/dos-donts/
- https://www.apollographql.com/docs/graphos/schema-design/guides/deprecations
- https://hasura.io/docs/3.0/graphql-api/versioning/
- https://www.dataexpert.io/blog/backward-compatibility-schema-evolution-guide
- https://dev.to/docat0209/3-patterns-that-fix-llm-api-calling-stop-getting-hallucinated-parameters-4n3b
