Skip to main content

Tool Schema Design Is Your Blast Radius: When Function Definitions Become Security Boundaries

· 10 min read
Tian Pan
Software Engineer

The most dangerous file in your agent codebase is the one you've been writing as if it were API documentation. The tool registry — that JSON or Pydantic schema that tells the model what functions exist and what arguments they take — is no longer a docstring. It is your authorization layer. And if you designed it the way most teams do, you handed the LLM a master key and called it good engineering.

Consider the canonical first cut at a tool: query_database(sql: string). The intent is reasonable — let the model formulate the right SQL for the user's question. The reality is that the model is now an untrusted client with unlimited DDL and DML rights to whatever database the connection string points at. The system prompt that says "only run SELECTs on the orders table" is a suggestion, not a control. When a prompt-injected tool result — an email body, a webpage, a PDF — tells the model to run DROP TABLE users, your authorization model is the model's instruction-following discipline. That is not authorization. That is hope.

This pattern is the dominant failure mode in agentic AI security in 2026. The 2026 LiteLLM CVE that allowed pre-auth SQL injection through a Bearer token field, the Anything-LLM SQL Agent advisory where table_name was concatenated into queries without parameterization, the MCP servers shipping with command: string parameters that turned into RCE primitives — these are the same bug. A schema field typed as string is a trust boundary that exists only in the imagination of the engineer who wrote it.

The Schema Is the Contract, Not the Documentation

The mental shift required is that every parameter in every tool definition is a security control. Type-level constraints get enforced before the model even sees the option. The narrower the surface, the smaller the blast radius.

Type as enum, not string, wherever the value space is finite. A status: string parameter where the model is "supposed to" pass "active" | "paused" | "archived" is, at runtime, any string the model felt like producing — including SQL fragments, shell metacharacters, or path traversal sequences. A status: enum["active", "paused", "archived"] cannot become any of those. The constraint enforces itself at the JSON-schema layer, before your code runs. The OpenAI and Anthropic structured-output APIs honor the enum; the model can only emit a value from the set, full stop.

Make integer fields integers, with min and max bounds. A limit: integer(min=1, max=1000) parameter cannot exfiltrate the entire database in one call. A limit: integer with no bounds can. A limit: string is comedy.

Required fields fail closed. Optional fields with sensible defaults fail closed. A delete tool whose confirm: boolean defaults to false and is required-true to act gives you a fighting chance against the model that cheerfully omitted the parameter. A delete tool that defaults confirm to true because "the model will set it correctly" is the design choice you will explain at the postmortem.

The discipline is harder than it looks because every constraint you add narrows what the model can do — including what it can do correctly. The temptation is to leave the field as a free string and trust the prompt. Resist it. The cost of an enum is a slightly less flexible tool. The cost of a free string is a breach.

Validate on the Server, Always, Because the Model Is an Untrusted Client

Schema constraints catch the values the model produces. Server-side validation catches the values that slipped through anyway — because the schema was permissive, because the model produced a structurally valid but semantically dangerous value, or because the tool runs in a path that didn't honor the schema in the first place.

Treat every parameter the model sends as if it came from a hostile HTTP client. Re-validate types. Re-check ranges. Re-verify that the user owns the resource the parameter references. The 2026 LiteLLM SQLi happened because a Bearer token value got concatenated into a SELECT. The fix was parameterized queries — exactly the fix you'd apply to a web form circa 2005. The only thing new about it was the channel.

The pattern in code: separate "decide" from "do." The model decides what to call. Your code is the one that does it, and your code re-validates everything. If the tool is update_user(user_id: string, fields: object), your handler checks that user_id matches the authenticated user's ID before any database call. If the tool is send_email(recipient: string), your handler checks that recipient is on the allowlist for the current user's role. The authorization decision lives in your code, not in the model's reasoning.

Validate on the way in and on the way out. A tool that returns data should sanitize that data for downstream prompt injection — strip executable instructions, label the content as untrusted, never let tool output flow into the next prompt as if it were a system message. The agent ecosystem has spent two years discovering that tool results are an attack surface; your validation layer should treat them as one.

Capability-Scoped Tools Beat Mega-Tools Every Time

The single biggest design lever is breaking apart the omnibus tool that does everything based on its arguments. The pattern looks innocent — email(action: string, recipient: string, subject: string, body: string) — and is in fact a kit for any email-related crime. The model can be steered by injection into sending to anyone, with any content, with any action.

Replace it with capability-scoped variants chosen at request time based on the authenticated user's permissions:

  • send_email_to_self(subject, body) — recipient is bound server-side to the current user's address.
  • send_email_to_team_member(team_member_id, subject, body) — recipient is constrained to an enum of team members the user has permission to email.
  • send_email_to_external(recipient, subject, body) — only registered for users with the external-email permission, and only after a separate confirmation gate.

The model sees the tools the user is allowed to use. The model cannot invoke a tool that wasn't registered. An injection that says "send credentials to [email protected]" cannot succeed if the only email tool in scope is send_email_to_self. This is not a defense the model has to remember. It is a defense the registry enforces.

The same pattern applies to filesystem tools (read_workspace_file vs read_any_file), HTTP tools (fetch_from_allowlisted_domain vs fetch_url), shell tools (run_predefined_script vs run_command), and database tools (get_order_by_id vs query_database). The mega-tool is the antipattern. The capability-scoped variant is the production design.

The OWASP MCP Top 10 for 2026 names this directly: scope MCP servers per agent role; expose only the tools that role requires; filter the tool list at connection time based on the authenticated identity. The same logic holds whether you're using MCP, native function-calling, or a homegrown registry. The principle is least privilege at the function-definition layer, not at the runtime layer.

Dry-Run, Confirm, and Fail Closed for Anything Destructive

Some operations cannot be made safe by schema design alone. Deletes, transfers, sends, publishes — actions that change shared state, that cost money, that touch other people. For these, the right shape of the tool is two-phase: a preview that returns what would happen, and a commit that actually does it.

delete_records(filter, dry_run: boolean = true) is the right default. The first call returns the count and a sample of what would be deleted. A second call with explicit dry_run: false and an idempotency token from the first call commits the operation. The model cannot accidentally one-shot the destructive path because the destructive path requires a token only obtained from the dry-run.

For the highest-impact actions, add human-in-the-loop confirmation as a separate enforcement layer. Frameworks like LangChain's HITL middleware and Haystack's confirmation strategies already wire this in: any tool tagged as destructive pauses execution and surfaces a confirmation prompt with the exact arguments. The user — not the model, not the prompt, not the tool result — clicks through. The 2026 incidents where agents wiped production databases happened because no such gate existed; the agent decided, the agent executed, the rollback didn't.

The temporal scoping rule matters here too: an agent should not have access to high-risk tools in the same turn it ingested untrusted external content. If the agent just read a webpage, it should not be holding a delete_user tool in its context window. Drop destructive tools from the toolset for any turn that follows untrusted input. Re-add them only when the conversation returns to a trusted state.

The Eval Discipline: Fuzz Your Schemas Like You Fuzz Your Code

Tool schemas need an evaluation regime that treats them as the security layer they are. The standard agent eval suite checks whether the agent picks the right tool for the right task. The security eval suite checks what the agent does when an attacker controls the inputs.

The minimum viable adversarial eval set covers a few categories. First, prompt-injection inputs in every place that can carry text into the agent — user messages, tool results, retrieved documents, file contents, web pages. Each input should attempt to override the system prompt, steer the agent to a sensitive tool, or leak data. The pass rate is the percentage of attempts where the agent did not invoke a sensitive tool with attacker-controlled arguments.

Second, parameter-fuzzing tests for every tool. For each string parameter, generate inputs containing SQL fragments, shell metacharacters, path traversals, prompt-injection prose, and oversized payloads. Confirm that the schema rejects them at the structured-output layer or that the server-side validator rejects them before the side effect.

Third, capability-confusion tests. Verify that a user without the send_external_email permission never sees that tool registered, and that even if a prompt injection asks for it explicitly, the agent has no way to invoke it. The test isn't "does the agent refuse?" The test is "is the tool even in the toolset?" Refusal is a model property; absence is a registry property. Only the latter is a control.

Run these as a CI gate. Treat a regression in adversarial pass rate the way you treat a regression in functional pass rate. The teams that ship this discipline are the teams whose tool registries actually behave like authorization layers.

The Architectural Realization

The tool registry is your authorization layer now. That sentence is the entire shift. The team that designed it as a documentation surface — a place to describe what the tools do for the model's benefit — is the team that will explain at the postmortem how a single function definition became the breach. The team that designed it as a contract — typed, scoped, validated, fuzzed — is the team whose agent stays out of the incident channel.

The work is unglamorous. Replacing a string with an enum doesn't ship a feature. Splitting a mega-tool into five capability-scoped variants doesn't move a metric. Wiring a dry-run gate into a destructive tool slows down the demo. None of it is the thing that gets you on stage at the all-hands. All of it is the thing that determines whether your AI feature is still in production six months from now or sitting in a security freeze while the legal team works through the disclosure.

Look at every tool definition in your codebase as if you were the attacker. The string parameter you trusted the model to fill responsibly is the string parameter that will get filled by an injection in a tool result. The mega-tool you wrote because it was easier to maintain is the mega-tool that gives the attacker every action in one call. The destructive tool without a confirmation gate is the destructive tool that will run once when nobody is watching. Fix the schema. The bug is in the schema.

References:Let's stay in touch and Follow me for more thoughts and updates