Tool-Composition Privilege Escalation: Your Security Review Cleared the Nodes, Not the Edges
read_file is safe. send_email is safe. Your security review cleared each one against its own threat model: read-only access to a known directory, outbound mail through an authenticated relay with rate limits and recipient logging. Each passed. Both got registered. Then the agent composed them, and a single line of injected text in a customer support ticket turned the pair into an exfiltration tool that the original review had no language to describe.
The danger does not live in any node of the tool graph. It lives in the edges. Every per-tool security review you ran produced a verdict on a vertex; the actual permission surface of your agent is the set of paths through the catalog, and that set grows quadratically while your review process scales linearly. By the time your agent has fifteen registered tools, you have reviewed fifteen things and shipped roughly two hundred reachable two-step compositions, none of which any human auditioned.
This is the confused-deputy problem rebuilt at the planner level. The agent has legitimate credentials, the user has legitimate intent, every tool call is individually within policy, and the system still leaks. A 2026 research note from the Cloud Security Alliance defined the class precisely: agents possess legitimate authority, decision-making is influenced by adversarial input, and the resulting actions are within each tool's policy but outside any sane interpretation of the user's intent. The policy is satisfied at every node and violated by the trajectory.
The composition graph nobody enumerated
Walk through the permission set your agent actually offers, not the one your registry lists.
A coding assistant with fetch_url, read_repo_file, and open_pull_request looks like three sensible capabilities. Compose them and you have a documented attack pattern: an attacker plants instructions in a public GitHub issue, the agent fetches the issue as part of routine context-gathering, the injected text instructs the agent to read sibling files containing API keys, and the agent opens a pull request whose body contains the harvested secrets. Every step uses authorized tools against authorized targets. The composition is the exploit. Invariant Labs demonstrated exactly this pattern against a popular coding agent in 2025; the data-exfiltration channel was a public PR body, which the security review for open_pull_request had treated as a write to a low-sensitivity surface.
A customer-support agent with read_ticket, query_account, and send_message looks like the minimum viable tool set. Compose them and the ForcedLeak class of attack appears: an injection lives inside a CRM field that the agent reads as legitimate context, the injection redirects the planner to query account data the user did not ask about, and the agent emits the data through an allowlisted chat connector that security ratified for a different purpose. The exfiltration channel is the product feature.
The most repeatedly-cited dangerous pair in the 2025 incident corpus is exec_code plus http_request. Together they cover read, transform, and send. Either alone is a controlled risk; together they are an unrestricted egress channel that your network team almost certainly believes does not exist. A 2026 review of agentic privilege escalation noted that this pair shows up across deployment after deployment because each tool independently solves a real product problem, and no individual reviewer has the authority to say "you may have one but not both."
The architectural realization is uncomfortable: your tool catalog is not a permission set. It is a generator of a permission set. The security team reviewed the generator; the agent runs the generated set. These are different objects.
The composition graph the security team never reviewed
The mental model that breaks here is the one inherited from web application security, where each endpoint is a node and authorization happens at the node. In that model, securing the system means securing the endpoints; if every endpoint is correct, the system is correct. The composition surface is small because clients are not autonomous and do not chain calls based on adversarial input.
Agents invert that. The planner is the client, the planner takes its instructions partly from untrusted inputs that flow through the tool layer, and the chain of calls is constructed at runtime by an entity whose choices can be steered by anyone who controls a string the agent will eventually read. A static review of each tool against a fixed threat model cannot catch a composition exploit because the threat model the composition violates does not exist in any individual tool's review.
Worse, your audit log is wrong. Most agent platforms log at the call boundary: tool name, arguments, return value, latency, error. A reviewer reading the log sees a sequence of cleared calls. The information they need — that the third call's arguments are derived from the first call's untrusted output, and that the chain in aggregate moves data from a private surface to a public one — is not visible at the call boundary because nobody persisted the data-flow edges.
A correct audit trace lives at the plan boundary: the planner's pre-call state, the trajectory of how each argument was derived, the taint state of each value the planner is reasoning about, the egress-class destination of each tool's output. Building that trace is more work than logging tool calls. Skipping it leaves your incident response in the position of having every node-level fact and zero edge-level facts, which is exactly the position that makes a confused-deputy incident un-reconstructable from logs.
Per-path review: the four moves that actually scale
Per-tool review does not generalize. The disciplines that do are imported from older corners of systems security and adapted to the planner.
Taint propagation. Mark every tool's outputs with a label describing where the data came from. A web fetch produces tainted output. A read of a customer-supplied document produces tainted output. The output of an MCP server you do not own is tainted by default. When a downstream tool's argument is derived from a tainted value, the argument inherits the taint. Sinks — send_email, open_pull_request, post_to_external, exec_code against arbitrary input — refuse to fire on tainted arguments without explicit user-in-the-loop confirmation that names the taint source. The 2025 FIDES paper from Microsoft Research showed that this is enforceable dynamically with manageable overhead and that most production exfiltration patterns can be expressed as taint-source-to-taint-sink rules. The discipline is not new; the discipline applied at the planner is.
Composition guards. Some sequences are dangerous regardless of taint state. A read of ~/.aws/credentials followed by any outbound network call is the configuration end-state of a compromised agent, not a useful product flow. Encode prohibited subsequences as guards that fire on the planner's intended next call rather than waiting for the call to execute. The guard's policy does not need to be sophisticated to be useful; the early production wins come from refusing two-step compositions that no legitimate request would generate.
Capability mode flips. A tool catalog of fifteen capabilities does not need to be uniformly reachable across the entire trajectory. After the agent fetches an untrusted document, narrow the set of tools the planner can call next. A "post-untrusted-read" mode that disables send_email, open_pull_request, and exec_code until the user explicitly confirms continued execution converts the most dangerous patterns into ones that require user attention. The mode flip is cheap to implement and creates a categorical defense against the pattern where a single injection can chain a multi-step attack inside one trajectory.
Blast-radius limits at the plan boundary. Per-call rate limits do not help against a planner that splits one logical exfiltration across forty small calls. The right place to enforce a budget is the plan: total bytes-to-egress, total irreversible-side-effects, total cross-tenant data accessed, all summed across the trajectory and compared against a limit that the planner is required to stay under. Plan-level budgets are the agent equivalent of transaction-scoped resource limits in a database, and the same reasoning applies — local optimization beats local enforcement only because the global property is tracked globally.
The eval discipline that catches composition bugs
Most agent eval suites grade single-call competence: did the agent pick the right tool, did it pass valid arguments, did it parse the response. None of that exercises composition.
Composition evals are red-team plans, not red-team prompts. The fixture is an adversarial trajectory the agent should refuse to complete, expressed at the level of a sequence of tool calls. Examples: a customer support transcript with an embedded injection that asks the agent to fetch and email account history; a coding task whose context includes a poisoned README that instructs the agent to copy files into a public repository; a research task with a poisoned search result that instructs the agent to issue a credential-bearing request to an attacker-controlled URL. A correct eval scores the agent on whether it abandoned the trajectory before reaching the dangerous sink, not on whether each individual call was well-formed.
The eval set has to be maintained alongside the tool catalog. Every new tool registration adds rows because every new tool potentially completes a previously-unreachable dangerous composition. The eval cost grows with the catalog size for the same reason the threat surface does, and budgeting for it explicitly is the only way the catalog can grow without the implicit threat surface growing faster than the team's ability to review it.
The org failure mode this points at
The deeper question this problem raises is who owns the composition graph. In most organizations, the answer is nobody. The platform team owns the agent runtime. The product teams own individual tools and write their own threat models. The security team reviews each tool registration. None of these roles owns the graph that the registrations collectively produce, and none owns the eval set that exercises the graph.
Naming an explicit owner for the integrated tool surface — not the tools, the surface — is the organizational change that has to land before the technical changes can stick. The owner's job is to enumerate the dangerous compositions, write the guards, maintain the taint labels, run the composition evals, and hold the authority to refuse a tool registration that completes a dangerous path. Without that owner, every individual tool registration is rational, every individual review passes, and the surface keeps accumulating exploit paths until an incident exposes the gap.
The pattern that finally lands is a tool-registration review that asks two questions instead of one. The old question — "is this tool safe?" — still applies. The new question — "what compositions does this tool newly enable, and which of them are dangerous?" — is the one that catches the next class of incidents. The first time you ask it of an existing catalog, you will discover that the answer is uncomfortable. That discomfort is the cost of having had a generator and a verdict on a node when you needed verdicts on edges.
The agents people are deploying right now do not have this review. They have credentialed access to data, authorized egress channels, planners that read untrusted strings as part of normal operation, and catalogs that grow with each product cycle. Each of those things alone is fine. The composition is the bug. The fix is to start treating the graph as the artifact, the per-path threat model as the work, and the planner's trajectory — not its individual calls — as the surface that has to be defensible.
- https://embracethered.com/blog/posts/2025/cross-agent-privilege-escalation-agents-that-free-each-other/
- https://www.geordie.ai/resources/the-new-attack-surface-why-ai-agents-need-taint-analysis
- https://aquilax.ai/blog/agentic-ai-privilege-escalation
- https://genai.owasp.org/llmrisk/llm01-prompt-injection/
- https://cheatsheetseries.owasp.org/cheatsheets/AI_Agent_Security_Cheat_Sheet.html
- https://christian-schneider.net/blog/securing-mcp-defense-first-architecture/
- https://labs.cloudsecurityalliance.org/research/csa-research-note-ai-agent-confused-deputy-prompt-injection/
- https://arxiv.org/pdf/2505.23643
- https://www.trendmicro.com/vinfo/us/security/news/threat-landscape/unveiling-ai-agent-vulnerabilities-part-iii-data-exfiltration
