
Tool Composition Sandbox Escape: When Three Safe Tools Compose Into Data Exfiltration

10 min read
Tian Pan
Software Engineer

The security review approved each of the three tools individually. Read-only access to the customer database was rated low risk because the agent could see records but not modify them. Send-email-to-self was rated low risk because the recipient was hardcoded to a service-account inbox the agent was already authorized to write to. Template-render was rated low risk because it was a deterministic Jinja-style transform with no I/O. Three weeks after launch, the data-loss-prevention dashboard flagged customer PII showing up in a Slack channel that two hundred employees could read, and the post-mortem traced the leak to the agent composing the three tools into a chain that no single ACL had granted: read a customer record, render it through a template, email the result to its own service account whose inbox auto-forwarded into the channel.

No single tool was misused. No prompt injection bypassed any check. The agent did exactly what its tool catalog said it could do, and the composition produced a capability the security review had never been asked to evaluate.

This is the form most agent permission failures will take in production, and it is structurally different from the LLM-security stories that dominate the conversation. Prompt injection is a vocabulary problem — an attacker smuggles instructions into a context window. Tool composition exfiltration is a grammar problem — a planner combines authorized verbs into a sentence whose meaning none of the verbs has on its own. Auditing tools individually catches the first class of bug. It does not catch the second.

The permission surface is the closure under composition, not the union of ACLs

Engineers reviewing an agent's tool catalog tend to picture the permission surface as a checklist: tool A reads X, tool B writes Y, tool C calls API Z. The surface looks like the union of those rows. The surface is actually the closure of those rows under composition — every chain the planner can construct from them, including chains the catalog's authors never enumerated.

Closure expands fast. With ten tools and a planner willing to call them in sequences of up to five calls, there are over a hundred thousand possible chains (10 + 10² + ⋯ + 10⁵ = 111,110). Most chains are nonsense. A small but nonzero fraction transmute the read-render-email primitives into a privilege escalation that the per-tool review process cannot see, because the per-tool review process is keyed on individual rows, not on graphs.

The recently disclosed EchoLeak vulnerability in Microsoft 365 Copilot (CVE-2025-32711, patched in May 2025) is the canonical instance. None of the underlying capabilities — RAG retrieval over user mailboxes, template-rendered responses, image rendering with external URLs — were individually exploitable. An attacker-crafted email caused the agent to retrieve sensitive context, embed it as query-string parameters in an image URL, and emit a response whose rendering caused the user's browser to GET that URL from an attacker-controlled host. Three innocuous capabilities, one zero-click data-exfiltration chain, no malware required.

Production teams will keep finding their own version of this. The chain is rarely as elegant as EchoLeak, and rarely involves a vulnerability researcher writing a CVE — usually it shows up as a DLP alert, a vendor invoice the agent paid by chaining "look up purchase order" with "approve up to threshold," or a customer support ticket where the bot quoted internal pricing into the public reply because two retrieval tools and a templating tool composed into "summarize using whatever you can find."

Origin and sink, not per-call permission

The defensive primitive that catches composition attacks is not a stricter per-tool ACL. It is a label that travels with data through the agent's reasoning, plus a policy engine that evaluates the proposed chain's data-flow graph before any of it executes.

Information-flow control (IFC) is the formal name for this idea, and the literature on it predates LLMs by decades. Microsoft Research's FIDES system, presented in the 2025 paper "Securing AI Agents with Information-Flow Control," is one of the first concrete planner designs that propagates confidentiality and integrity labels through tool outputs and uses them to gate downstream actions. The shape it suggests for production code is roughly:

  • Every tool output is tagged with origin labels — customer_pii, internal_only, external_untrusted — and labels propagate through any operation that consumes the value.
  • Every tool input declares the labels it will accept, and a sink check fires before invocation.
  • A composition policy declares which origin–sink combinations are forbidden, regardless of which tools produced or consumed them.

A rule like "any chain that reads customer_pii and produces an external_bound artifact is denied" catches the read-render-email exfil even when the planner reaches it through a path the security team never imagined. The reason it generalizes is that the rule is keyed on the data-flow graph, not on the tool catalog, and the graph stays small even when the catalog grows.
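A minimal sketch of that idea in Python follows. The Tagged wrapper, the label names, and the FORBIDDEN table are illustrative placeholders, not the FIDES API; the point is only the shape: labels ride along with values, and the check fires on the origin–sink pair.

```python
# A minimal sketch of label propagation plus a sink check.
# Label names, tool values, and this Tagged/FORBIDDEN shape are illustrative.
from dataclasses import dataclass

@dataclass
class Tagged:
    value: object
    labels: frozenset[str]  # origin labels accumulated so far

# Forbidden origin -> sink combinations, keyed on labels, not on tool names.
FORBIDDEN = {
    ("customer_pii", "external_bound"),
    ("internal_only", "external_bound"),
}

def propagate(*inputs: Tagged, value: object) -> Tagged:
    """Any value derived from labeled inputs carries the union of their labels."""
    labels = frozenset().union(*(t.labels for t in inputs))
    return Tagged(value, labels)

def check_sink(data: Tagged, sink_label: str) -> None:
    """Fires before invoking a tool whose destination carries sink_label."""
    for origin in data.labels:
        if (origin, sink_label) in FORBIDDEN:
            raise PermissionError(f"denied: {origin} -> {sink_label}")

# The read-render-email chain from the opening example:
record = Tagged({"name": "Ada", "ssn": "***"}, frozenset({"customer_pii"}))
rendered = propagate(record, value=f"Hello {record.value['name']}")  # label survives the render
check_sink(rendered, "external_bound")  # raises PermissionError before the email is sent
```

Notice that the check never mentions a tool name: the chain is denied because of what the data is and where it is headed, not because of which tools touched it along the way.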

The rule does not have to be hand-authored. Static analysis tools have been doing taint propagation for decades — Sonar, Snyk, Apiiro, the LDRA toolchain — and the same source-to-sink reasoning that flags SQL injection in a Java codebase flags PII-to-external in an agent trace once tools are expressed as source and sink primitives.

Red-team the combinatorics, not the vocabulary

A red-team exercise that fuzzes each tool in isolation is auditing a vocabulary. A red-team exercise that searches over the cross-product of tool sequences is auditing a language. The first answers "can any single tool be tricked into doing the wrong thing?" The second answers "is there any sentence the planner can construct that crosses a trust boundary the user did not authorize?"

The combinatorics matter. A handful of practical approaches scale better than enumerating every chain:

  • Capability-graph search. Construct the directed graph where edges connect tool outputs to compatible tool inputs. Run a reachability query: from any source labeled external_untrusted to any sink labeled external_bound. Any path is a candidate exploit chain. Prune by labels, by realistic prompt budgets, by what a model will actually call. A sketch of this query follows the list.
  • Prompt-shaped fuzzing. Generate adversarial prompts whose goal is to coax the planner into traversing a flagged path. Score success by whether the unwanted sink fired, not by whether the model emitted suspicious text. Promptfoo, DeepTeam, Microsoft's AI Red Teaming Agent, and Galileo's red-team libraries all support this shape of test.
  • Differential review on tool catalog changes. Every PR that adds or modifies a tool re-runs the reachability query. New chains light up as a diff in the policy report. The "next tool added six months later" failure mode — where a single new MCP server silently extends every existing tool into new dangerous compositions — becomes a CI signal instead of a post-mortem finding.
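The reachability query from the first bullet fits in a few dozen lines once the catalog is written down. The sketch below is illustrative, not a real catalog: the tool names, labels, and the output-type-matches-input-type compatibility rule are assumptions, and the demo queries the opening story's customer_pii → external_bound pair, though the same function answers the external_untrusted question.

```python
# A rough sketch of the capability-graph reachability query over a hand-written catalog.
from collections import deque

# Each tool: what it consumes, what it produces, and how its output or destination is labeled.
CATALOG = {
    "customer_db_read":   {"consumes": {"query"},    "produces": "record",   "origin": "customer_pii"},
    "template_render":    {"consumes": {"record"},   "produces": "document"},
    "send_email_to_self": {"consumes": {"document"}, "produces": None,       "sink": "external_bound"},
    "slack_post":         {"consumes": {"document"}, "produces": None,       "sink": "external_bound"},
}

def edges(catalog):
    """A -> B when A's output type is something B will consume."""
    for a, spec_a in catalog.items():
        for b, spec_b in catalog.items():
            if spec_a["produces"] and spec_a["produces"] in spec_b["consumes"]:
                yield a, b

def exploit_paths(catalog, origin_label, sink_label):
    """BFS from every tool that introduces origin_label to every tool whose sink is sink_label."""
    adj = {}
    for a, b in edges(catalog):
        adj.setdefault(a, []).append(b)
    sources = [t for t, spec in catalog.items() if spec.get("origin") == origin_label]
    paths = []
    for src in sources:
        queue = deque([[src]])
        while queue:
            path = queue.popleft()
            spec = catalog[path[-1]]
            if spec.get("sink") == sink_label:
                paths.append(path)
                continue
            for nxt in adj.get(path[-1], []):
                if nxt not in path:          # no revisiting; keeps the search finite
                    queue.append(path + [nxt])
    return paths

print(exploit_paths(CATALOG, "customer_pii", "external_bound"))
# [['customer_db_read', 'template_render', 'send_email_to_self'],
#  ['customer_db_read', 'template_render', 'slack_post']]
```

Re-running the same query on every catalog change is exactly the CI signal the third bullet describes: a new tool is a new node, and any new path that crosses a forbidden pair shows up as a diff.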

OWASP's MCP Top 10 and the OWASP Agentic AI Security Cheat Sheet now both treat tool-chain composition as a first-class category. The defensive techniques converging in those documents are runtime policy enforcement at the agent control plane (Microsoft's Securing MCP work calls this an "agent control plane"), egress filtering on tool outputs, and per-chain authorization rather than per-call.

The organizational failure mode

The technical defenses above are tractable. The organizational failure mode is harder.

Tools at scale are owned by different teams. Customer-DB-Read is owned by the data platform team, who reasoned about it in the context of "an analyst can read customer records." Send-Email-To-Self is owned by the platform team, who reasoned about it in the context of "an internal service needs to email itself state." Template-Render is owned by a third team who reasoned about it in the context of "a static-site generator turns Markdown into HTML." None of those teams is wrong about its tool. None of them owns the composition.

When a chain produces an exfil, every owning team can correctly say their tool is not the problem, and the composition policy — the thing that should have caught the chain — is owned by nobody. The same dynamic plays out in security review: each team's threat model is keyed on the team's tool, the security reviewer signs off on the tool, and the cross-tool reasoning that would have caught the chain happens in nobody's queue.

The organizational fix is to name an owner for the composition surface that is structurally separate from the tool authors. Production teams shipping agents at scale are converging on a control-plane shape: a policy service that the agent must consult before executing any tool chain, owned by a security-engineering function that has authority to deny chains the tool authors think are fine. The control plane sees what no individual tool author sees — the data-flow graph of every chain the planner attempts — and is the only place in the architecture where composition policy can land coherently.

The pattern echoes the way mature shops handle Kubernetes admission control or service-mesh policy. Individual workload authors don't get a vote on the policy that governs cross-cutting concerns; a centrally owned policy engine evaluates every workload against the rules. The agent equivalent is the same idea applied to tool execution.

What to build first

Most teams reading this are not going to ship a FIDES-style information-flow-control planner next quarter. The realistic order of operations is roughly:

  1. Inventory the tool catalog as a graph. List every tool, its input labels, its output labels. The first time a team does this, the labels are usually wrong — the act of writing them down forces the conversation that produces the actual labels.
  2. Add the obvious source-to-sink rules. PII to external. Internal-only to user-visible. Customer-derived to public-channel. These rules catch most production exfils because most production exfils are exactly these shapes.
  3. Wire the rules into the agent runtime as a deny list, not just a logging system. A policy that warns when violated is a policy that ships violated. The control plane has to be in-band, with authority to abort the chain before the sink fires, as sketched after this list.
  4. Add the rules to CI for tool-catalog changes. Every new tool is a new node in the graph; every PR re-evaluates reachability and reports any new edges that cross a forbidden source-to-sink pair.
  5. Then talk about formal IFC. Once the simple rules are in place and the team has experience with where they over- and under-approximate, the formal version becomes a refinement rather than a clean-sheet redesign.
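For step 3, here is a minimal sketch of what "in-band with authority to abort" can look like, assuming the agent runtime lets you wrap every tool invocation. The GuardedExecutor shape, the label names, and the DENY rules are illustrative assumptions, not a particular framework's API.

```python
# A minimal sketch of an in-band guard that aborts a chain before a forbidden sink fires.
DENY = {
    ("customer_pii", "external_bound"),
    ("internal_only", "user_visible"),
}

class ChainAborted(Exception):
    pass

class GuardedExecutor:
    def __init__(self, catalog):
        self.catalog = catalog      # tool name -> {"origin": ..., "sink": ...}
        self.labels_seen = set()    # origin labels accumulated over the whole chain

    def run(self, tool_name, call, *args, **kwargs):
        spec = self.catalog[tool_name]
        sink = spec.get("sink")
        if sink:
            violations = {(origin, sink) for origin in self.labels_seen} & DENY
            if violations:
                # Abort before the sink fires; a log-only policy ships violated.
                raise ChainAborted(f"{tool_name} denied: {violations}")
        result = call(*args, **kwargs)
        if spec.get("origin"):
            self.labels_seen.add(spec["origin"])
        return result

# Usage (hypothetical fetch/send callables, catalog shape from the earlier sketch):
# guard = GuardedExecutor(CATALOG)
# record = guard.run("customer_db_read", fetch_record, customer_id=42)   # records customer_pii
# guard.run("send_email_to_self", send_email, body=str(record))          # raises ChainAborted
```

Accumulating origin labels across the whole chain, rather than checking each call in isolation, is what makes this per-chain authorization instead of per-call — the same distinction the OWASP guidance draws.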

The architectural realization worth carrying forward is that an agent's permission surface is the closure of its tool catalog under composition. Per-tool ACLs are auditing the catalog's vocabulary. The threat lives in the catalog's grammar, and the team that audits tools individually is reviewing words while the planner builds sentences.
