
Retiring an Agent Tool the Planner Learned to Depend On

· 10 min read
Tian Pan
Software Engineer

You unregister lookup_account_v1 from the tool catalog, swap in lookup_account_v2, and edit one paragraph of the system prompt to point at the new name. Tests pass. Three days later, support tickets start mentioning that the assistant "keeps trying to call something that doesn't exist," or — more disturbingly — that it answers customer questions with confident, plausible numbers and never hits the database at all. The deprecation didn't fail at the wire. It failed in the planner.

This is the gap between treating a tool deprecation as a syntactic change and treating it as a behavioral migration. The agent didn't just have your function in a registry; it had months of plans, multi-step recipes, and few-shot examples that routed through that function as a checkpoint. Pulling it out is closer to retiring an internal API your downstream services have informally hardcoded — except the downstream service is a model whose habits you cannot grep, and whose fallback when its preferred tool disappears is to invent one.

Tool retirement is the part of agent engineering where the API-versioning playbook breaks down most expensively. A REST consumer that calls a removed endpoint gets a 404 and a stack trace. An agent that "calls" a removed function might emit a phantom tool call against the old name, get rejected by the runtime, retry with a hallucinated argument shape, and then — having burned through its retry budget — produce a fluent answer that satisfies the user-visible turn while quietly skipping the lookup it was supposed to perform. The wire-level failure is the easy one to catch. The behavioral failure is the one that ships.

What the agent actually internalized about your tool

When engineers think "the agent uses the search_orders tool," they tend to picture a clean dependency: prompt → planner → tool spec → tool call. The reality is messier. The model has at least four overlapping representations of the tool, each of which has to be updated when you retire it.

The first is the schema in the runtime registry. That is the only representation you actually own and can atomically change. The second is the in-context tool list the model sees this turn — typically rendered from the registry, but cached in your prompt template, sometimes pinned to a specific version for prompt-cache stability, occasionally still containing a copy of the old description nobody noticed. The third is the few-shot examples in your system prompt that demonstrate the canonical shape of a successful call: "to find an order by SKU, call search_orders with {sku, region}." Those examples were tuned six months ago against the old tool, and they are now lying to the planner about what's available.

The fourth, and most uncomfortable, is what the model absorbed during pretraining or fine-tuning about API patterns that look like yours. If your tool was named generically (search, get_user, lookup), there is a measurable chance the model will produce calls in that shape regardless of what your registry says, because that is a high-probability completion in many similar contexts. Updating the registry doesn't dent that prior. The remediation is structural: distinctive names, distinctive argument shapes, and aggressive eval coverage on prompts known to trigger the prior.
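One way to turn "eval coverage" into something checkable is to classify each recorded turn for the two failure shapes described earlier: a call to a name that is not registered, and a fluent answer that never performed the lookup. The sketch below assumes a hypothetical `TurnTrace` record captured by your eval harness; the field names and the `classify_turn` helper are illustrative, not part of any particular framework.

```python
from dataclasses import dataclass, field


@dataclass
class TurnTrace:
    """Hypothetical record of one agent turn, as an eval harness might capture it."""
    tool_calls: list[str] = field(default_factory=list)  # tool names the model emitted
    final_answer: str = ""


def classify_turn(trace: TurnTrace, registered: set[str], required: set[str]) -> list[str]:
    """Return the problems found in a turn; an empty list means it looks clean."""
    problems = []
    # Calls against names the runtime does not know about (retired or invented).
    phantom = sorted({name for name in trace.tool_calls if name not in registered})
    if phantom:
        problems.append(f"phantom_call:{','.join(phantom)}")
    # An answer produced without any of the lookups this prompt is known to need.
    if trace.final_answer and not (set(trace.tool_calls) & required):
        problems.append("answered_without_lookup")
    return problems
```

Running this over prompts known to trigger the generic-name prior gives a count per failure shape, which is easier to track across a migration than a single pass/fail bit.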

The first two you can update with a deploy. The third requires you to find every system prompt and few-shot example in your codebase that referenced the old tool, and it is where most teams forget to look. The fourth requires you to assume hostile prior beliefs and design around them.
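Finding the third representation is mostly a search problem. A minimal sketch, assuming your prompts can be loaded into a dict of name to text (the `find_stale_references` helper and the prompt names in the usage note are hypothetical):

```python
import re


def find_stale_references(prompts: dict[str, str], retired: set[str]) -> list[tuple[str, int, str]]:
    """Return (prompt_name, line_number, tool_name) for every mention of a retired tool."""
    hits = []
    for prompt_name, text in prompts.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for tool in sorted(retired):
                # Word boundaries keep lookup_account_v1 from matching lookup_account_v10.
                if re.search(rf"\b{re.escape(tool)}\b", line):
                    hits.append((prompt_name, lineno, tool))
    return hits
```

Called with a prompt whose second line says "call lookup_account_v1 with {account_id}", this returns that prompt's name, the line number 2, and the tool name. It only catches literal mentions; paraphrased examples still need a human read.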

The staged retirement pattern

The discipline that ships clean retirements borrows from how protobuf operators handled field deprecations a decade ago: never break, always overlap, observe behavior, then remove.

Mark deprecated, do not remove. The first move is to update the tool's description in the registry to declare the deprecation in plain language the model can read. "DEPRECATED — use lookup_account_v2. This tool will be removed on YYYY-MM-DD." The model reads tool descriptions every turn. That sentence is your cheapest behavioral nudge, and it costs you only a handful of tokens. Crucially, the tool still functions during this window — calling it returns the same data. You are not breaking anything; you are advertising the change.
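As a sketch of this step, assuming a registry that stores each tool as a small record (the `Tool` dataclass and `mark_deprecated` helper are hypothetical, not a specific framework's API), the change touches only the description and leaves the handler alone:

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Tool:
    """Hypothetical registry entry: what the model reads plus what the runtime calls."""
    name: str
    description: str
    handler: Callable[..., dict]


def mark_deprecated(tool: Tool, replacement: str, removal_date: str) -> Tool:
    """Prepend a plain-language deprecation notice; the handler is left untouched."""
    notice = f"DEPRECATED: use {replacement}. This tool will be removed on {removal_date}."
    return replace(tool, description=f"{notice} {tool.description}")
```

Putting the notice first matters if your prompt template truncates long descriptions: the part the model must see survives the cut.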

Dual-register the new tool with a different name. Resist the temptation to release lookup_account_v2 as a drop-in replacement under the same name. Distinct names give the planner a freshness signal it can act on, and they let you build evals that diff the two. The new tool's description should explicitly reference the old: "Replaces lookup_account_v1. Identical contract except region is now required." That cross-reference is a migration hint the planner can pick up on its own, without you having to rewrite every system prompt.
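Because both names are live at once, a diffing eval can be as simple as calling each tool on the same inputs and comparing outputs. The sketch below uses stand-in handlers and a hypothetical `CATALOG` dict; real handlers would hit your backend, and the assumption that both return the same shape is exactly what the diff is there to check.

```python
def lookup_account_v1(account_id: str) -> dict:
    """Stand-in for the deprecated tool."""
    return {"account_id": account_id, "status": "active"}


def lookup_account_v2(account_id: str, region: str) -> dict:
    """Stand-in for the replacement; region is required under the new contract."""
    return {"account_id": account_id, "status": "active"}


# Both tools are registered under distinct names for the length of the migration.
CATALOG = {
    "lookup_account_v1": {
        "description": "DEPRECATED: use lookup_account_v2.",
        "handler": lookup_account_v1,
    },
    "lookup_account_v2": {
        "description": "Replaces lookup_account_v1. Identical contract except region is now required.",
        "handler": lookup_account_v2,
    },
}


def diff_tools(cases: list[dict]) -> list[dict]:
    """Call both tools on each case and return the cases where their outputs differ."""
    mismatches = []
    for case in cases:
        old = CATALOG["lookup_account_v1"]["handler"](case["account_id"])
        new = CATALOG["lookup_account_v2"]["handler"](case["account_id"], case["region"])
        if old != new:
            mismatches.append({"case": case, "old": old, "new": new})
    return mismatches
```

An empty result over a representative set of cases is the evidence that "identical contract" is true in practice and not just in the description.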

Soft-removal window with telemetry. During the migration, every call to the deprecated tool succeeds — and emits a deprecation event your platform tracks. You want a dashboard that answers two questions: which sessions still route through the old tool, and which prompts produce that routing? The first tells you when it's safe to remove. The second tells you which few-shot examples and which downstream parsers in your codebase still need updating. The deprecation event is metadata, not a failure; it is the signal you use to schedule the actual removal.
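A minimal sketch of that telemetry, assuming the runtime can pass a session ID and a prompt ID alongside each call (both identifiers and the `DeprecationLog` class are hypothetical, and the in-memory list stands in for a real event pipeline):

```python
from collections import Counter
from typing import Callable


class DeprecationLog:
    """In-memory stand-in for whatever event pipeline the platform actually uses."""

    def __init__(self) -> None:
        self.events: list[dict] = []

    def wrap(self, tool_name: str, handler: Callable[..., dict]) -> Callable[..., dict]:
        """Return a handler that records a deprecation event, then calls through unchanged."""
        def wrapped(*, session_id: str, prompt_id: str, **arguments) -> dict:
            self.events.append(
                {"tool": tool_name, "session_id": session_id, "prompt_id": prompt_id}
            )
            return handler(**arguments)
        return wrapped

    def sessions_on_old_tool(self) -> set[str]:
        """Question one: which sessions still route through the old tool?"""
        return {event["session_id"] for event in self.events}

    def calls_by_prompt(self) -> Counter:
        """Question two: which prompts produce that routing, and how often?"""
        return Counter(event["prompt_id"] for event in self.events)
```

The wrapper never raises and never alters the result, which keeps the event a piece of metadata. The removal date can then be tied to `sessions_on_old_tool()` staying empty for a window you choose.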

Hard removal with a structured failure mode. When the soft window ends, do not silently 404 calls to the old tool. Return a structured error the model can recover from: "Tool lookup_account_v1 was retired on YYYY-MM-DD. Use lookup_account_v2 instead. Same contract, except region is now required." The model can read that error and re-plan; a generic exception gives it nothing to re-plan from. Teams that skip this step discover that their agents respond to the removal by hallucinating arguments to the dead function, falling into a retry loop, and finally producing an answer with no tool call at all.
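A sketch of what that failure mode could look like in a dispatcher, assuming tool results are returned to the model as plain dicts (the `RETIRED_TOOLS` table, the `dispatch` function, and the error field names are all hypothetical; the date stays a placeholder):

```python
from typing import Callable

# Hypothetical retirement table, kept after the tool itself is gone from the registry.
RETIRED_TOOLS = {
    "lookup_account_v1": {
        "retired_on": "YYYY-MM-DD",
        "replacement": "lookup_account_v2",
        "note": "Identical contract except region is now required.",
    }
}


def dispatch(name: str, arguments: dict, registry: dict[str, Callable[..., dict]]) -> dict:
    """Run a tool call, returning either a result or an error payload the model can read."""
    if name in registry:
        return {"ok": True, "result": registry[name](**arguments)}
    if name in RETIRED_TOOLS:
        info = RETIRED_TOOLS[name]
        return {
            "ok": False,
            "error": {
                "type": "tool_retired",
                "replacement": info["replacement"],
                "message": (
                    f"Tool {name} was retired on {info['retired_on']}. "
                    f"Use {info['replacement']} instead. {info['note']}"
                ),
            },
        }
    # Names that were never registered get a distinct error type.
    return {
        "ok": False,
        "error": {"type": "unknown_tool", "message": f"No tool named {name} is registered."},
    }
```

Separating `tool_retired` from `unknown_tool` also helps on the telemetry side: the first is a migration straggler, the second is the model inventing a tool.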
