
The Tool You Added For One Agent Is Now In Every Agent's Hand

Tian Pan
Software Engineer

Six months ago, somebody on the customer-support team wired a send_email tool for their agent. It worked. The platform team noticed it in the shared tool registry, gave a thumbs-up emoji on the PR, and moved on. This week, a security engineer ran an audit and discovered that send_email is in the action surface of the meeting-notes summarizer, the data-quality bot, an analytics assistant nobody officially owns, and a half-built prototype that hasn't been touched since January. None of these agents need to send email. None of them have ever been reviewed for whether they should be allowed to. The PRD for the meeting-notes summarizer is two sentences long and the words "outbound communication" do not appear in it.

This is the default state of every shared tool registry I have ever audited. The act of registering a tool — pushing a JSON schema and a handler into a central catalog — is treated as a developer convenience, like adding a utility function to a shared library. But once the registry is sourced into every agent's prompt, registering a tool is not a library change. It is a deployment to every agent in the company simultaneously, with no review of whether each of them should have received it.

The conceptual mistake is treating the registry as a catalog — a passive thing agents browse — when it actually functions as a capability grant. The moment a tool is in the registry and the registry is loaded into the agent's context, the agent has been granted the capability. Whether the agent uses it on this turn or not is incidental. The grant has been made.

The Default Of A Global Catalog Is "Yes To Everyone"

Most teams arrive at a shared registry by accident. The first agent is built; it has three tools. The second team builds an agent; they copy the first agent's harness because the wiring is fiddly and the first one works. Now there's a tools.json file shared between them, because keeping two copies in sync is annoying. The third agent comes along, and by then there's a tools-registry package on the internal npm or PyPI mirror, and importing it gives you the whole pile.

Nobody decided that every agent should have every tool. The decision was never made. The shape of the codebase made it the path of least resistance, and the absence of a per-agent allow-list meant that the registry became the allow-list by default. This is the same dynamic by which a database role with SELECT on every table gets handed out because someone needed one query to work and nobody felt like defining a tighter role.

The problem is that a shared registry grows by addition and never shrinks. Tools get added when there is a clear local need. Tools almost never get removed, because removal requires proving nothing depends on them, and nobody owns the registry as a whole. So the catalog grows monotonically, and every new agent inherits the accumulated debt of every previous agent's needs.

Why "Permissions" Means "Per-Agent", Not "Per-Server"

The MCP ecosystem has done a fine job of teaching teams to think about authentication — does the agent prove who it is before calling a tool? — and a much worse job of teaching them to think about authorization at the right granularity. The default unit of authorization in most MCP setups is the server: a tool server publishes a set of tools, and an agent either has a token that grants access to that server or it doesn't.

This is the wrong granularity in two directions. First, it is too coarse: an agent gets the whole server's tool surface or none of it, even if it only needs one tool. Second, it is too narrow: the same agent typically connects to several servers, and the permissions question — should this agent have this capability — cuts across all of them. A meeting-notes summarizer that talks to your calendar server and your email server doesn't need write access on either, but a server-scoped token is a binary that doesn't express "read calendar events, read email subjects, send nothing."

The unit of authorization has to be the pair (agent, tool), not the server. That is the level at which the question "should this capability exist" can actually be answered, because that is the level at which someone can say "the meeting-notes summarizer's job is to summarize meetings, so it does not need to send email, full stop." Server-level grants flatten this distinction and force it back to the binary that gave you the problem in the first place.
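A minimal sketch of what authorizing on the (agent, tool) pair could look like, assuming a default-deny grant table checked before every dispatch. The names `GRANTS`, `is_allowed`, and `authorize_call` are hypothetical, as are the tool names `email.read_subjects` and `email.send_email`; nothing here is an MCP API.

```python
from typing import FrozenSet, Tuple

# Hypothetical grant table: each entry is one reviewed (agent, tool) pair.
# The server a tool lives on plays no part in the decision.
GRANTS: FrozenSet[Tuple[str, str]] = frozenset({
    ("meeting-notes-summarizer", "calendar.read_events"),
    ("meeting-notes-summarizer", "email.read_subjects"),
    ("customer-support-agent", "email.send_email"),
})


def is_allowed(agent: str, tool: str) -> bool:
    """Default deny: a capability exists only if the exact pair was granted."""
    return (agent, tool) in GRANTS


def authorize_call(agent: str, tool: str) -> None:
    """Raise before dispatch if the pair was never granted."""
    if not is_allowed(agent, tool):
        raise PermissionError(f"{agent} has no grant for {tool}")
```

Under this sketch the summarizer and the support agent can both talk to the email server, yet only one of them can send, which is the distinction a server-scoped token cannot express.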

The Audit Where You Find Out

The audit that surfaces this problem usually goes like this. Someone — a security engineer, a new platform lead, a curious intern with too much free time — writes a script that joins the agent registry against the tool registry and produces a matrix: which agents have which tools in their action surface. The matrix is dense in a way nobody expected.

The meeting-notes summarizer has send_email, create_calendar_event, delete_document, update_user_role, and execute_sql. The data-quality bot has send_slack_message, wire_transfer, restart_production_service, and escalate_to_oncall. Every agent has every tool the platform team has ever shipped, because nobody set up a filter on the way in. The discovery is not that agents are being asked to do harmful things — for the most part, the LLM doesn't choose to wire-transfer money during a meeting summary. The discovery is that nothing prevents it from doing so, and the only safety net is the model's good judgment, which is not a safety net.
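The audit script can be very small. The sketch below assumes you can already export two things per agent: the set of tools in its action surface and the set of tools it has actually invoked (for example, from call logs). The function names and the sample data are hypothetical.

```python
from typing import Dict, List, Set


def capability_matrix(exposed: Dict[str, Set[str]]) -> Dict[str, Dict[str, bool]]:
    """Build the agent x tool matrix: True where the tool is in the agent's surface."""
    all_tools = sorted(set().union(*exposed.values()))
    return {
        agent: {tool: tool in tools for tool in all_tools}
        for agent, tools in exposed.items()
    }


def unused_grants(
    exposed: Dict[str, Set[str]], invoked: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """For each agent, list the tools it can call but has never called."""
    return {
        agent: sorted(tools - invoked.get(agent, set()))
        for agent, tools in exposed.items()
    }


# Illustrative data only, using tool names from the audit described above.
EXPOSED = {
    "meeting-notes-summarizer": {"send_email", "execute_sql", "calendar.read_events"},
    "data-quality-bot": {"send_email", "execute_sql", "calendar.read_events"},
}
INVOKED = {
    "meeting-notes-summarizer": {"calendar.read_events"},
    "data-quality-bot": {"execute_sql"},
}
```

The output of `unused_grants(EXPOSED, INVOKED)` is the starting list for review: every entry is a capability an agent holds without ever having needed it.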

The recent literature on tool-calling least privilege puts numbers on what this audit feels like. The MiniScope paper from late 2025 found that most agents in their study had access to between three and ten times the tools they ever actually invoked, and the gap was wider for older agents — because tools accumulate, and dead capabilities never get removed. Vercel's engineering team reported that pruning 80% of an agent's tools improved task completion, which is another way of saying that the 80% that came along for the ride was actively harmful, even before the security question.

The Cost Of A Tool You Never Call

The audit also surfaces a cost most teams don't price in: every tool definition in the action surface costs context tokens, every turn, whether the tool is used or not. Standard MCP setups have been measured consuming up to 72% of an agent's context window on tool definitions before the user has typed anything. The Berkeley Function Calling Leaderboard found tool-selection accuracy dropping from 43% to under 14% as the number of available tools went from four to fifty-one. The agent gets slower, dumber, and more expensive — three taxes you pay on every request — because somebody added a tool six months ago for a workflow that doesn't exist anymore.
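To put a rough number on your own registry, something like the following sketch can help. It assumes tool definitions are JSON-serializable schemas and uses a crude four-characters-per-token heuristic instead of a real tokenizer, so treat the result as an order-of-magnitude estimate. All names are hypothetical.

```python
import json
from typing import Dict, List

# Assumption: a rough characters-per-token heuristic, not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_definition_tokens(tool_schemas: List[Dict]) -> int:
    """Estimate the tokens spent on tool definitions on every single turn."""
    serialized = json.dumps(tool_schemas, separators=(",", ":"))
    return len(serialized) // CHARS_PER_TOKEN


def overhead_fraction(tool_schemas: List[Dict], context_window_tokens: int) -> float:
    """Share of the context window consumed before the user types anything."""
    return estimate_definition_tokens(tool_schemas) / context_window_tokens


def make_schema(name: str) -> Dict:
    """Build a small illustrative schema; real definitions are usually larger."""
    return {
        "name": name,
        "description": f"Hypothetical tool named {name}.",
        "parameters": {"type": "object", "properties": {"id": {"type": "string"}}},
    }
```

Comparing `overhead_fraction` for the full registry against the same figure for a single agent's allow-list gives a quick estimate of what the unused definitions cost per request.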

So even setting aside the security argument, the engineering argument for per-agent allow-lists is strong. Smaller, focused tool surfaces produce better agents. The shared registry is, among its other sins, a quality regression.

What A Per-Agent Allow-List Actually Looks Like

The fix is conceptually simple and operationally annoying, which is why most teams haven't done it. Each agent gets a manifest — a small, version-controlled file — that explicitly lists the tools it is allowed to see. The registry no longer defines what's available to the agent; the manifest does. The registry just defines what's available to the manifest.

A manifest looks like:

agent: meeting-notes-summarizer
tools:
- calendar.read_events
- documents.read
- documents.write_to_owner_folder

That's three lines that have to be reviewed by someone who can answer "does this agent need this capability for its job?" The review is the same kind of review you'd do for adding a database role or an IAM policy. The artifact is small. The diff is readable. The blast radius of "I added a tool to the registry" shrinks from "every agent" to "the manifest I just updated."
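On the loading side, the change amounts to filtering the registry through the manifest before anything reaches the agent's context. The sketch below assumes the manifest has already been parsed into a small object and that the registry is a mapping from tool name to handler; `Manifest`, `tools_for_agent`, and the placeholder handlers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Registry = Dict[str, Callable[..., object]]


@dataclass(frozen=True)
class Manifest:
    agent: str
    tools: List[str]


def tools_for_agent(manifest: Manifest, registry: Registry) -> Registry:
    """Return only the tools the manifest names; everything else stays invisible."""
    missing = [name for name in manifest.tools if name not in registry]
    if missing:
        # Fail loudly on typos or removed tools instead of silently dropping them.
        raise KeyError(f"manifest for {manifest.agent} lists unknown tools: {missing}")
    return {name: registry[name] for name in manifest.tools}


# Placeholder handlers standing in for real tool implementations.
REGISTRY: Registry = {
    "calendar.read_events": lambda: [],
    "documents.read": lambda: "",
    "documents.write_to_owner_folder": lambda: None,
    "send_email": lambda: None,
}
MANIFEST = Manifest(
    agent="meeting-notes-summarizer",
    tools=["calendar.read_events", "documents.read", "documents.write_to_owner_folder"],
)
```

With this in place, `tools_for_agent(MANIFEST, REGISTRY)` never contains `send_email`, however many tools the registry accumulates.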

Adding the manifest layer also creates a place to hang other policy. You can declare rate limits per agent per tool. You can declare data scopes ("this agent can read from this folder but not others"). You can declare which user identities the agent is allowed to act on behalf of. None of this is possible when the registry is the de facto allow-list, because the registry doesn't know which agent is asking — it just hands out the same catalog to everyone.
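As one possible shape for that extra policy, the sketch below attaches a rate limit and a path-prefix data scope to each (agent, tool) pair and checks both before a call is dispatched. Every name is hypothetical, the sliding window is deliberately naive, and a real prefix check would also need to normalize paths first.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (agent, tool)


@dataclass
class ToolPolicy:
    max_calls_per_minute: int
    # Empty tuple means the tool has no path-based data scope.
    allowed_path_prefixes: Tuple[str, ...] = ()


@dataclass
class PolicyEnforcer:
    policies: Dict[Pair, ToolPolicy]
    _calls: Dict[Pair, List[float]] = field(default_factory=dict)

    def check(self, agent: str, tool: str, path: str, now: float) -> bool:
        """Return True and record the call only if every declared policy allows it."""
        policy = self.policies.get((agent, tool))
        if policy is None:
            return False  # default deny: no policy means no grant
        prefixes = policy.allowed_path_prefixes
        if prefixes and not path.startswith(prefixes):
            return False  # outside the declared data scope
        # Keep only timestamps from the last 60 seconds.
        recent = [t for t in self._calls.get((agent, tool), []) if now - t < 60.0]
        allowed = len(recent) < policy.max_calls_per_minute
        if allowed:
            recent.append(now)
        self._calls[(agent, tool)] = recent
        return allowed
```

The `now` argument is passed in explicitly so the limiter can be exercised deterministically; a caller would typically supply `time.monotonic()`.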

Tool-Set Composition As The Primitive

The deeper shift is that the unit of design stops being "the tool" and starts being "the agent's tool set." A tool in isolation is fine; what matters is the composition. The meeting-notes summarizer should have a small, internally coherent set — read calendar, read documents, write summaries to a constrained location. That set should be designed and reviewed as a unit, the way you'd design an API surface or a microservice's permission boundary.

Tool-set composition makes the registry's role explicit: the registry is a library of capabilities, not a default action surface. Composing a tool set is a deliberate act, just like composing a SQL role or composing an IAM policy. The fact that we have spent twenty years getting these other compositions right and then handed agents the equivalent of "GRANT ALL" should make us suspicious, not comfortable.

This also reframes what "adding a tool" means. Adding a tool to the registry adds it to the library — fine, low-risk, no change in agent behavior. Adding a tool to a specific agent's manifest is the real deployment. It's the change that grants capability. It should require the same review that adding a tool to a single agent's hard-coded list would have required before the registry existed. The shared registry didn't eliminate that review; it just moved it somewhere harder to see.
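One way to make that review hard to skip is to have CI compare the manifests before and after a change and surface every newly granted capability. The sketch below assumes manifests have been loaded into a mapping from agent name to tool set; `newly_granted` is a hypothetical helper, not part of any existing tool.

```python
from typing import Dict, List, Set


def newly_granted(
    old: Dict[str, Set[str]], new: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """List, per agent, the tools present in the new manifests but not the old."""
    grants: Dict[str, List[str]] = {}
    for agent, tools in new.items():
        added = tools - old.get(agent, set())
        if added:
            grants[agent] = sorted(added)
    return grants


# Illustrative before/after state for one pull request.
BEFORE = {"meeting-notes-summarizer": {"calendar.read_events", "documents.read"}}
AFTER = {
    "meeting-notes-summarizer": {"calendar.read_events", "documents.read", "send_email"},
    "data-quality-bot": {"execute_sql"},
}
```

A non-empty result is the list of grants that needs a named reviewer. A change that touches only the registry produces an empty result, which matches the claim that adding to the library alone changes no agent's behavior.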

The Principle Worth Carrying

Every shared resource trends toward "everyone can use it" unless someone designs an explicit boundary. A shared database with no row-level security ends up with every service reading every row. A shared S3 bucket with no per-prefix policy ends up with everyone reading everyone's data. A shared tool registry with no per-agent manifest ends up with every agent having every capability. None of these states were ever decided. They were inherited from the absence of a decision.

The principle worth carrying into agent platforms: a capability needs an explicit grant, not a default-yes. Registering a tool should add it only to the library; the manifest is what grants the agent the capability, and the manifest is what gets reviewed. Treat adding a tool to a specific agent's manifest with the same scrutiny you'd apply to granting that agent a database role. If you wouldn't be comfortable granting send_email to every service in your backend by adding a single line to a shared config, don't be comfortable doing the same thing to every agent in your company.

The audit always finds the same thing. The tools that came along for the ride. The agents that quietly acquired powers nobody asked them to have. The PRDs that don't mention the capabilities the agents wield. The principle of least privilege has been a foundational idea in security for roughly fifty years, and we are watching a generation of agent platforms rediscover it the hard way, one over-privileged meeting-notes summarizer at a time.
