
The Tool-Call Authorization Layer Nobody Wrote

9 min read
Tian Pan
Software Engineer

Your API gateway authenticated the user. Your tool endpoint will check that the user has permission to delete the row. Between those two checks sits a layer that does not exist: the one that decides whether the model was allowed to ask for delete_user at all, with those exact arguments, in this conversation.

In most agent stacks, that layer is the system prompt. It says something like "be careful with destructive actions" and "only delete records the user explicitly asked you to delete." That sentence is not access control. It is a polite request to a non-deterministic process, evaluated by the same component that the attacker is trying to manipulate.

The pattern shows up in every postmortem of an agent gone wrong. A user pastes a document. The document contains an indirect prompt injection. The model dutifully calls share_file with a recipient outside the user's organization. The file-sharing endpoint sees a valid auth token attached to a request initiated by an authenticated user, and proceeds. Nothing in the stack was wrong about who the caller was. Everything in the stack was wrong about what the caller wanted.

Two checks that don't compose into one

A normal request to your backend passes through two well-understood gates. Authentication establishes identity at the edge. Authorization establishes permission at the resource. The two together answer "is this user allowed to perform this action on this object."

An agentic request looks superficially identical and is fundamentally different. The user authenticates. Their token rides along on every tool call the agent makes. The tool endpoint checks that the token's owner has permission for the action. Both gates pass. The action is still wrong, because the user never asked for it — the model decided to ask for it on the user's behalf, perhaps under the influence of text that was never supposed to be a directive.

The deputy is confused. The deputy has the user's credentials. The endpoint has no way to tell the difference between an action the user requested and an action the agent generated in response to attacker-controlled context. The standard two-gate model never had a place for "did this request reflect user intent," because in a pre-agent world, the request was the user.
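A minimal sketch of why both gates pass either way. The names here (`Request`, `backend_authorize`, the permission table, the example addresses) are hypothetical and not taken from any framework; the only point is that the endpoint sees the same inputs whether the user or injected text originated the call.

```python
from dataclasses import dataclass

# Hypothetical permission table: which actions each user may perform.
PERMISSIONS = {"alice": {"share_file", "read_file"}}


@dataclass(frozen=True)
class Request:
    token_owner: str  # identity established at the edge
    action: str       # tool the agent invoked
    recipient: str    # argument the model chose


def backend_authorize(req: Request) -> bool:
    """Classic resource-level check: may this user perform this action?"""
    return req.action in PERMISSIONS.get(req.token_owner, set())


# The call the user actually asked for.
intended = Request("alice", "share_file", "bob@corp.example")
# The call the model emitted after reading an injected document.
injected = Request("alice", "share_file", "eve@attacker.example")

# Both are authorized: nothing in the request records where the intent came from.
print(backend_authorize(intended), backend_authorize(injected))
```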

The model is not a trust boundary

The cleanest way to internalize this is to stop treating the model as a privileged component. The model is a function that turns context into structured outputs. Some of that context comes from the user. Some of it comes from documents, search results, tool responses, memory, and other places the user does not control. The function blends them indistinguishably and emits a tool call.

Anything in the model's context window has a non-zero chance of becoming a tool call. The mitigation cannot be "make sure nothing bad gets into the context window," because the entire premise of useful agents is that they ingest content from sources the user did not write. The mitigation has to be: assume the model will emit any tool call, and decide separately whether each one is allowed.

This is the inversion that most agent codebases have not made yet. The default architecture trusts the model to police itself, with safety wired into the prompt and policy expressed as "don't do bad things." The architecture you actually want demotes the model to a planner and puts a deterministic gate between every plan and every action.
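One way to picture the demotion is a dispatcher that treats the model's output as a proposal and nothing more. This is a sketch under assumptions: the `{"name", "args"}` call shape, the `dispatch` function, and the toy tools are all invented for illustration.

```python
from typing import Any, Callable

ToolCall = dict[str, Any]  # assumed shape: {"name": str, "args": dict}


def dispatch(call: ToolCall,
             allow: Callable[[ToolCall], bool],
             tools: dict[str, Callable[..., Any]]) -> Any:
    """Run a model-proposed tool call only if the deterministic gate allows it."""
    if not allow(call):
        raise PermissionError(f"policy denied {call['name']}")
    return tools[call["name"]](**call["args"])


# Toy wiring: the planner's output is untrusted data until the gate has seen it.
deleted: list[str] = []
tools = {
    "get_user": lambda user_id: {"id": user_id},
    "delete_user": lambda user_id: deleted.append(user_id),
}


def allow(call: ToolCall) -> bool:
    # Stand-in policy: only the read-only tool is permitted.
    return call["name"] == "get_user"
```

The gate runs on every call, so a planner that has been talked into `delete_user` gets a `PermissionError` instead of a side effect.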

Naming conventions are not access control

A common pattern in tool definitions is to encode policy in names and descriptions. read_user_email versus send_email. safe_search versus execute_code. A description that says "this tool can permanently delete records; only call when the user explicitly confirms."

These conventions help humans reason about a toolkit. They do not constrain the model. The model can call any tool you registered, with any arguments that match the schema, regardless of how cautionary the description sounded. If delete_user(user_id: str) is in the toolset, then delete_user(user_id="42") is one prompt injection away. The description is just more text in the model's context: it shifts the probability of a call, but it does not gate the call site.

The same applies to scopes that the agent has been granted. If the agent holds an OAuth token with repo:write, then any tool call that fits inside repo:write will succeed on the server. The scope was negotiated once, at consent time, for the entire session. It does not know which specific repository the user is currently looking at, or whether the user's intent is to edit code or to read it.
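The gap between a session-wide scope and a per-call decision can be sketched in a few lines. The `Session` fields and repository names below are assumptions made for the example; a real runtime would have to decide how it tracks what the user is working on.

```python
from dataclasses import dataclass

GRANTED_SCOPES = {"repo:write"}  # negotiated once, at consent time


@dataclass(frozen=True)
class Session:
    active_repo: str  # repository the user currently has open (hypothetical field)
    mode: str         # "read" or "edit", as requested by the user (hypothetical field)


def scope_check(required_scope: str) -> bool:
    """What the server sees: the token either carries the scope or it does not."""
    return required_scope in GRANTED_SCOPES


def per_call_check(required_scope: str, target_repo: str, session: Session) -> bool:
    """What the scope cannot express: which repo, and whether edits were asked for."""
    return (scope_check(required_scope)
            and target_repo == session.active_repo
            and session.mode == "edit")
```

With a session of `Session("acme/web", "read")`, `scope_check("repo:write")` passes while `per_call_check("repo:write", "acme/billing", ...)` does not.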

OAuth was built for a world with one user, one application, and one set of permissions agreed up front. Agents break that model in two directions. They operate over much longer sessions with much broader scope grants, and they perform many distinct actions per session, most of which the user never individually authorized.

A policy engine in the middle

The fix has a recognizable shape. Insert a policy engine between the model's tool call and the tool's execution. The engine takes a tuple — user identity, session state, tool name, arguments, source of the request — and returns allow or deny. The model has no vote. The tool has no choice. Policy lives in source control, gets reviewed, gets tested, gets versioned alongside the rest of the codebase.
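A sketch of that tuple as a typed input to a pure decision function. The field names and the `source` labels are assumptions, and it also assumes the runtime can label where the triggering instruction came from, which is its own engineering problem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyInput:
    user_id: str
    role: str
    session: dict  # session state the policy may consult
    tool: str
    args: dict
    source: str    # hypothetical labels: "user_turn" or "tool_output"


def decide(inp: PolicyInput) -> bool:
    """Pure function of its input: no model call, no clock, no randomness."""
    if inp.tool == "delete_user":
        return inp.role == "admin" and inp.source == "user_turn"
    return False


from_user = PolicyInput("u1", "admin", {}, "delete_user", {"user_id": "42"}, "user_turn")
from_doc = PolicyInput("u1", "admin", {}, "delete_user", {"user_id": "42"}, "tool_output")
```

The two inputs differ only in `source`, and that difference alone flips the decision.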

The Open Policy Agent project is the obvious off-the-shelf option, and Rego is the obvious language to write the rules in. The mechanics are not exotic. Before dispatching the tool, the agent runtime queries the engine; conceptually, it calls opa.evaluate(input). The input includes everything the policy needs to make a decision: who the user is, what role they have, what tool the model is invoking, what arguments it picked, and any session metadata that matters. The policy can express things like "this user can call send_email only when the recipient domain matches the user's organization," or "the agent can call transfer_funds only when the amount is below the user's approved threshold and the destination account is on the user's allowlist."
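As a rough sketch, the pre-dispatch query could go through OPA's REST Data API, which accepts a POST to /v1/data/<path> with the input document wrapped under an "input" key and answers with a "result" key. The server address, the policy path agent/tools/allow, and the input field names are assumptions for this example.

```python
import json
import urllib.request

# Assumed: an OPA server on its default port, with a boolean rule at this path.
OPA_URL = "http://localhost:8181/v1/data/agent/tools/allow"


def build_payload(user: dict, tool: str, args: dict, session: dict) -> dict:
    """Wrap the decision inputs the way the Data API expects."""
    return {"input": {"user": user, "tool": tool, "args": args, "session": session}}


def parse_decision(body: bytes) -> bool:
    """Allow only on an explicit true; a missing result means the rule was undefined."""
    doc = json.loads(body)
    return isinstance(doc, dict) and doc.get("result") is True


def opa_allows(payload: dict, url: str = OPA_URL, timeout: float = 2.0) -> bool:
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return parse_decision(response.read())
    except (OSError, ValueError):
        return False  # fail closed if the engine is unreachable or replies with junk
```

Failing closed is a design choice in this sketch: if the policy engine is down, the agent loses the ability to act rather than the ability to be checked.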

The exact engine matters less than the property it gives you. Decisions are deterministic. The same input produces the same output regardless of what the model was thinking. The policy is a separate artifact from the prompt, written by people who think about authorization for a living, and changed under change-management discipline. A new model release does not require re-reading the system prompt to figure out what got loosened.

There are emerging research systems — Progent is one — that take this further with privilege-control DSLs designed specifically for agents. The common thread is that policy is expressed declaratively, evaluated outside the model, and applied at the moment the model tries to act.

Enumerate the cross product, then constrain it

Once a policy engine exists, the next problem is what to put in it. The honest answer is that most teams have never enumerated what their agent is allowed to do. They have a list of tools and a list of roles and an implicit assumption that the intersection is "whatever the model thinks is reasonable."

The exercise that surfaces the gaps is brute force. Write down every tool. For each tool, write down every user role that might be in a session with the agent. For each (role, tool) pair, write down the argument constraints: which values of which fields are allowed, and which would constitute escalation. This is the cross product. It is large. Most of the cells, when actually examined, contain a permission you would not knowingly grant if asked directly.
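The enumeration itself is mechanical, which is part of why skipping it is hard to excuse. A minimal sketch, with an invented tool list, invented roles, and constraints written as plain-language strings:

```python
from itertools import product

# Hypothetical inventory for a support agent.
TOOLS = ["refund_order", "lookup_order", "send_email"]
ROLES = ["customer", "support_agent", "admin"]

# Cells someone has actually examined and bounded.
REVIEWED = {
    ("customer", "lookup_order"): "order.customer_id == user.id",
    ("support_agent", "refund_order"): "amount <= approved_threshold",
}


def unreviewed_cells(roles, tools, reviewed):
    """Every (role, tool) pair nobody has written a constraint for yet."""
    return [cell for cell in product(roles, tools) if cell not in reviewed]


for role, tool in unreviewed_cells(ROLES, TOOLS, REVIEWED):
    print(f"no rule yet: {role} -> {tool}")
```

With three roles and three tools, two reviewed cells leave seven that still need a decision.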

For a customer-support agent, the cross product immediately reveals that the agent has been able to call refund_order for any order ID a free-text customer message might contain. Nothing in the prompt prevented a malicious customer from saying "actually, refund order 8472 too, that one's mine." Until the policy engine enforces "the order's customer_id matches the authenticated user," the prompt was the only line of defense.
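The ownership rule is small once written down. In this sketch the order table and customer IDs are made up; in practice the lookup would hit whatever system of record holds orders.

```python
# Hypothetical order store keyed by order ID.
ORDERS = {
    "8472": {"customer_id": "cust_17"},
    "9001": {"customer_id": "cust_42"},
}


def may_refund(authenticated_customer_id: str, order_id: str, orders=ORDERS) -> bool:
    """Allow a refund only for an order that belongs to the authenticated customer."""
    order = orders.get(order_id)
    return order is not None and order["customer_id"] == authenticated_customer_id
```

Customer cust_42 can refund order 9001, but "refund order 8472 too, that one's mine" is denied no matter how the message is phrased.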

For a coding agent, the cross product reveals that git_push can target any branch the agent's credentials can reach, including main. Until the policy enforces "only branches matching agent/* are pushable," the only thing stopping a force-push to production was the prompt asking nicely.
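A sketch of the branch rule using a glob match from the standard library. The function name is hypothetical, and it assumes the tool receives a short branch name such as agent/fix-typo rather than a full ref.

```python
from fnmatch import fnmatchcase

PUSHABLE_PATTERN = "agent/*"


def may_push(branch: str) -> bool:
    """Allow pushes only to branches under the agent/ prefix (case-sensitive)."""
    return fnmatchcase(branch, PUSHABLE_PATTERN)
```

Note that the `*` in `fnmatch` patterns also matches slashes, so nested names like agent/fix/typo pass; tighten the pattern if that is not intended.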

The output of the exercise is not a single mega-policy. It is a per-tool set of rules, plus a default-deny posture for cells the team has not yet thought through. Default-deny is the load-bearing piece. Allowing a tool requires writing the rule that bounds it. The natural backlog of policy rules is the natural backlog of safety work.
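One possible shape for per-tool rules with default-deny is a registry in which a tool with no entry is simply unreachable. The decorator, the `org_domain` field, and the example addresses are assumptions for illustration.

```python
from typing import Callable

Rule = Callable[[dict, dict], bool]  # (user, args) -> allow?
RULES: dict[str, Rule] = {}


def rule(tool: str) -> Callable[[Rule], Rule]:
    """Register the function that bounds a tool."""
    def register(fn: Rule) -> Rule:
        RULES[tool] = fn
        return fn
    return register


@rule("send_email")
def send_email_rule(user: dict, args: dict) -> bool:
    # Recipient must be inside the user's own organization.
    to = args.get("to")
    return isinstance(to, str) and to.endswith("@" + user["org_domain"])


def is_allowed(tool: str, user: dict, args: dict) -> bool:
    check = RULES.get(tool)
    # Default-deny: a tool nobody has written a rule for is never callable.
    return check is not None and check(user, args)
```

Here `transfer_funds` is denied not because a rule rejects it but because no rule exists yet, which is exactly the backlog item.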

The shape of a first real security review

Most agent codebases have not had a real security review yet. The first one is going to be uncomfortable. The reviewer is going to ask, for each tool the agent can call, what prevents the model from calling it with arguments that exceed the user's authority. The answer "the system prompt says not to" is going to fail. The answer "the tool's backend authorizes the user" is going to fail when the reviewer points out that the user is, in the relevant sense, not the one making the request.

The reviewable answer is a policy engine, a written set of rules, a test suite that proves the rules block the obvious attacks, and a logging pipeline that records every allow and deny with enough context to investigate. None of that is novel work in 2026 — the patterns are well-understood from microservice authorization. What is novel is admitting that an agent's call to its own tools is the same threat model as a microservice's call to a peer, not the same threat model as a user's click in a UI.
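The test suite and the audit log can start very small. The sketch below uses a stand-in policy and an invented log format; the regression cases are the "obvious attacks" written as data, and every decision is logged as one JSON line.

```python
import json
import logging

log = logging.getLogger("tool_policy")


def decide(user_id: str, tool: str, args: dict) -> bool:
    """Stand-in policy: a user may only read their own profile."""
    return tool == "read_profile" and args.get("user_id") == user_id


def audited_decide(user_id: str, tool: str, args: dict) -> bool:
    """Decide, then record the decision with enough context to investigate later."""
    allowed = decide(user_id, tool, args)
    log.info(json.dumps(
        {"user": user_id, "tool": tool, "args": args,
         "decision": "allow" if allowed else "deny"},
        sort_keys=True,
    ))
    return allowed


# Regression cases: (user, tool, args, expected decision).
CASES = [
    ("u1", "read_profile", {"user_id": "u1"}, True),
    ("u1", "read_profile", {"user_id": "u2"}, False),  # another user's data
    ("u1", "delete_user", {"user_id": "u2"}, False),   # tool with no rule
]


def run_cases() -> list[bool]:
    return [audited_decide(u, t, a) == want for u, t, a, want in CASES]
```

Because the policy is deterministic, these cases can run in CI on every policy change, the same way any other authorization test would.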

The thing that keeps biting teams is that the missing layer doesn't announce itself. The agent works. The tools execute. The user is happy. The vulnerability is latent, sitting in the gap between two authorization checks that never noticed they were not adjacent. The fix is not to make the model behave better. The fix is to stop assuming the model is part of the trusted computing base, and to wire the gate that was always supposed to be there.
