
Tool Schemas Are Prompts, Not API Contracts

· 11 min read
Tian Pan
Software Engineer

The most expensive line in your agent codebase is the one that auto-generates tool schemas from your existing OpenAPI spec. It looks like a clean engineering choice — single source of truth, no duplication, auto-sync on every API change. It is also why your agent picks searchUsersV2 when it should have picked searchUsersV3, fills limit=20 because your spec's example said so, and silently drops the tenant_id because it was buried in the seventh parameter slot.

Nothing about this shows up in unit tests. The schema validates. The endpoint exists. The agent's call is well-formed JSON. And yet the model keeps using the tool wrong, in ways your QA pipeline never sees, because the pipeline tests the API, not the agent's reading of the API.

The bug is conceptual. OpenAPI was designed to describe APIs to code generators and to the humans who write client code; a tool schema is read by an LLM on every single call, as a piece of the prompt. Treating the two as one artifact is a category mistake, like auto-generating user-facing copy from your database column names.

OpenAPI describes; tool schemas instruct

An OpenAPI spec is a contract. Its job is to let a code generator emit a typed client and let a developer reading the rendered Swagger UI understand what to send. The description prose is documentation — humans skim it, then write code that obeys the types. Parameter ordering is cosmetic; the SDK exposes named arguments and the keys go into a JSON body where order is irrelevant.

A tool schema is none of these things. It is a chunk of text that gets concatenated into the model's context every time the agent considers calling the tool. The model has no SDK to fall back on, no autocomplete to remind it which fields exist, no compile error to catch a missing required field before runtime. Everything it knows about your tool comes from the description, the parameter names, the parameter descriptions, the type annotations, and the defaults. That artifact is a prompt.
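To make "that artifact is a prompt" concrete, here is a minimal sketch of what the model's side of the exchange might look like. The tool definition, the name searchUsers, and the render_tools_for_context helper are all hypothetical, and real providers serialize tools in their own formats; the only point is that the schema ends up as text, in the order you wrote it.

```python
import json

# Hypothetical tool definition in the common JSON-Schema-style shape.
# The field names and values are illustrative, not taken from a real API.
search_users_tool = {
    "name": "searchUsers",
    "description": "Search users.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Name, email, or employee ID.",
            },
            "limit": {"type": "integer", "default": 20},
        },
        "required": ["query"],
    },
}


def render_tools_for_context(tools):
    """Serialize tool definitions into one block of text.

    Assumption: this stands in for whatever serialization a provider
    applies before placing tools in the model's context.
    """
    # json.dumps keeps dict insertion order unless sort_keys=True is passed,
    # so the order of properties in the dict is the order in the text.
    return "\n\n".join(json.dumps(tool, indent=2) for tool in tools)


prompt_fragment = render_tools_for_context([search_users_tool])
print(prompt_fragment)
```

Reading the printed output as if it were a paragraph in a system prompt is a quick way to see what the model is working from: a two-word description and a default of 20 with no explanation.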

Once you internalize that framing, several "bad" model behaviors stop looking like model failures and start looking like prompt failures. The model didn't fill the wrong field; the field name was ambiguous and the description didn't disambiguate. The model didn't pick the wrong tool; the two tools had near-identical descriptions auto-translated from API summary fields written for human skim-readers. The model didn't drop the optional security parameter; it appeared near the end of a fifteen-parameter list with a description that read, in full, "Optional. See docs."

Your OpenAPI spec is fine. It just isn't a prompt.
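If these are prompt failures, some of them can be caught the way you would review a prompt. The following is a rough sketch of a lint pass that flags parameters whose description is missing or too short to disambiguate anything. The function name, the sample tool, and the six-word threshold are assumptions chosen for illustration, not a recommended standard.

```python
def find_weak_parameter_descriptions(tool, min_words=6):
    """Return names of parameters whose description is missing or very short.

    `min_words` is an arbitrary illustrative threshold; tune it for your tools.
    """
    properties = tool.get("parameters", {}).get("properties", {})
    weak = []
    for name, spec in properties.items():
        description = spec.get("description", "")
        if len(description.split()) < min_words:
            weak.append(name)
    return weak


# Hypothetical tool as it might come out of an auto-generator.
generated_tool = {
    "name": "searchUsers",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Name, email, or employee ID to match against.",
            },
            "limit": {"type": "integer"},
            "tenant_id": {"type": "string", "description": "Optional. See docs."},
        },
    },
}

weak = find_weak_parameter_descriptions(generated_tool)
print(weak)  # ['limit', 'tenant_id']
```

A word count is a crude proxy. It catches "Optional. See docs." but says nothing about whether a longer description is actually useful, so treat the result as a review queue rather than a verdict.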

The description is the contract

In an OpenAPI spec, the description field is documentation. In a tool schema, it is the first thing the model reads — and often the only thing it reads carefully — when deciding whether to call the tool, which tool to call among similar ones, and what to put in each argument.

A good tool description tells the model four things: what the tool does, when to use it (and when not to), what it returns, and what the caller must know that isn't already implied by parameter types. None of these are reliably present in an OpenAPI summary auto-translated into a description field. OpenAPI summaries are written for engineers who already know what the endpoint is for and just want a one-line reminder; LLM tool descriptions are written for an agent that may have eight similar tools available and needs to disambiguate.

Compare two descriptions for the same endpoint. The auto-generated one says "Search users." That is a fine OpenAPI summary. The hand-tuned tool description says "Find users by name, email, or employee ID. Returns up to 50 matches sorted by relevance. Use this when you need to look up a person from partial information; do not use this to enumerate all users in a tenant — use listTenantUsers for that. Returns an empty array if no match; never errors on miss."

The second description does the work that the OpenAPI spec assumes a human will do by reading the surrounding documentation, the related endpoints, and the page header. The model gets none of that surrounding context. If the surrounding context isn't in the description, it doesn't exist.
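One way to keep the four parts from being skipped is to make them separate fields and assemble the description from them. This is only a sketch: the ToolDescription class and its field names are hypothetical, the tool name searchUsers is assumed, and the wording is adapted from the hand-tuned example above.

```python
from dataclasses import dataclass


@dataclass
class ToolDescription:
    """The four things a tool description should tell the model."""

    what: str          # what the tool does
    returns: str       # what it returns
    when_to_use: str   # when to use it, and when not to
    caveats: str = ""  # anything not implied by the parameter types

    def render(self) -> str:
        parts = [self.what, self.returns, self.when_to_use, self.caveats]
        return " ".join(part.strip() for part in parts if part.strip())


# What an OpenAPI summary tends to produce when copied over verbatim.
auto_generated = {"name": "searchUsers", "description": "Search users."}

# The same tool with a description assembled from the four parts.
hand_tuned = {
    "name": "searchUsers",
    "description": ToolDescription(
        what="Find users by name, email, or employee ID.",
        returns="Returns up to 50 matches sorted by relevance.",
        when_to_use=(
            "Use this when you need to look up a person from partial "
            "information. Do not use this to enumerate all users in a "
            "tenant; use listTenantUsers for that."
        ),
        caveats="Returns an empty array if no match; never errors on miss.",
    ).render(),
}

print(auto_generated["description"])
print(hand_tuned["description"])
```

Leaving when_to_use as a required field is the design choice that matters here: an author who cannot fill it in has usually found two tools that the model will not be able to tell apart either.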

Parameter ordering is a priority signal

In a typed SDK, the order of parameters in a function signature is a usability concern: required first, optional last, related parameters grouped. At the call site, named arguments make that order irrelevant. In a JSON request body, order is gone before the body even leaves the client.

LLMs read tool schemas top to bottom, as a sequence of tokens. Parameters that appear earlier tend to get more attention than those that appear later. This positional bias has been reported in agentic failure analyses, and practitioners notice it the moment they reorder a schema and watch the agent's behavior change. If your most important disambiguating parameter is the seventh one in the list, the model is likely to fill the first six on autopilot and treat the seventh as something it can probably skip.

This is not a bug to be patched with better prompting. It is the predictable behavior of a system that consumes its instructions as a sequence of tokens. The fix is to design the parameter list as a priority-ordered prompt: the parameters the model must reason about go first, the parameters that have safe defaults go last, and the parameters that exist solely for backward compatibility don't go in the schema at all.

This conflicts directly with OpenAPI conventions. An OpenAPI spec lists parameters in a sensible documentation order — path params, then query params, then body fields, often grouped by resource subobject. Auto-generation preserves that order. The result is a schema where the parameters the model needs to think about are scattered across whatever taxonomy the API designer used in 2021, with no relation to which ones the agent should reason about first.
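A small post-processing step between the generator and the agent is one way to reconcile the two orderings. The sketch below assumes a JSON-Schema-style parameters object; the prioritize_parameters helper, the parameter names, and the choice of which ones count as high priority are all hypothetical.

```python
import copy


def prioritize_parameters(parameters, priority, drop=()):
    """Return a reordered copy of a JSON-Schema-style parameters object.

    Names in `priority` come first, in that order. Remaining properties keep
    their original relative order. Names in `drop` are removed entirely.
    The input is not modified.
    """
    result = copy.deepcopy(parameters)
    original = result.get("properties", {})
    dropped = set(drop)

    ordered = [n for n in priority if n in original and n not in dropped]
    ordered += [n for n in original if n not in ordered and n not in dropped]

    # Python dicts preserve insertion order, so rebuilding the dict reorders it.
    result["properties"] = {name: original[name] for name in ordered}
    if "required" in result:
        result["required"] = [n for n in result["required"] if n not in dropped]
    return result


# Hypothetical generator output, in documentation order.
generated = {
    "type": "object",
    "properties": {
        "page": {"type": "integer"},
        "limit": {"type": "integer"},
        "sort": {"type": "string"},
        "legacy_format": {"type": "boolean"},
        "tenant_id": {"type": "string"},
        "query": {"type": "string"},
    },
    "required": ["tenant_id", "query"],
}

tuned = prioritize_parameters(
    generated,
    priority=["tenant_id", "query"],
    drop=["legacy_format"],
)
print(list(tuned["properties"]))
# ['tenant_id', 'query', 'page', 'limit', 'sort']
```

Two caveats apply. Whether the reordering survives depends on how your provider or framework serializes the schema, so it is worth checking the text the model actually receives. And if a dropped parameter is still required by the underlying API, the layer that executes the tool call has to supply it.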
