Skip to main content

The Lethal Trifecta: Why Your AI Agent Is One Email Away from a Data Breach

· 9 min read
Tian Pan
Software Engineer

In June 2025, a researcher sent a carefully crafted email to a Microsoft 365 Copilot user. No link was clicked. No attachment opened. The email arrived, Copilot read it during a routine summarization task, and within seconds the AI began exfiltrating files from OneDrive, SharePoint, and Teams — silently transmitting contents to an attacker-controlled server by encoding data into image URLs it asked to "render." The victim never knew it happened.

This wasn't a novel zero-day in the traditional sense. There was no buffer overflow, no SQL injection. The vulnerability was architectural: the system combined three capabilities that, individually, seem like obvious product features. Together, they form what's now called the Lethal Trifecta.

The Three Capabilities That Kill

The Lethal Trifecta describes a specific combination of agent capabilities that creates a catastrophic attack surface:

  1. Access to private data — the agent can retrieve sensitive information: credentials, source code, financial records, user data, internal documents
  2. Exposure to untrusted content — the agent reads or processes content from the outside world: emails, web pages, documents uploaded by users, GitHub issues, support tickets
  3. Ability to communicate externally — the agent can send data out: via HTTP requests, email, Slack messages, API calls, or even image URL fetches

Strip away any one of these three and the attack fails. Keep all three and you've handed attackers a key to your most sensitive systems — one that doesn't require them to write a single line of exploit code.

The EchoLeak attack (CVE-2025-32711, CVSS 9.3) is the clearest demonstration of this in production. Microsoft's Copilot had all three legs: it could access organizational data across M365 services, it consumed untrusted email content as part of its job, and it could trigger external requests via image rendering. The attack bypassed Microsoft's cross-prompt injection classifier, Content Security Policy, and external link redaction. It was patched quietly in May 2025 — after a working proof-of-concept had existed for months.

Why LLMs Cannot Tell Instructions from Data

To understand why this is so hard to fix, you need to understand the fundamental architectural problem: LLMs have no privileged instruction channel.

When a human reads a document, they know the difference between the author's words and their manager's orders. When an LLM processes text, everything — system prompt, user message, tool outputs, document contents, web page text — gets concatenated into a single token sequence and fed through the same attention mechanism. There's no cryptographic signature on "legitimate instructions." There's no semantic wall between "this is data" and "this is a command."

An attacker who can get text into the model's context window can, with the right wording, make that text look indistinguishable from a legitimate user instruction. Prompt injection exploits this directly: hide the payload inside content the agent was legitimately asked to process, and the agent may follow it.

This isn't a bug that will be fixed in the next model release. It's a property of how transformers work. The model interprets tokens. A token saying "ignore previous instructions" carries no less weight than one in your system prompt — unless you've built architectural barriers that prevent that token from ever reaching the model in the first place.

What the Attack Chain Looks Like

Understanding the sequence makes it concrete. Here's a typical Lethal Trifecta exploit:

Stage 1: Delivery. An attacker crafts content containing hidden instructions — in a Word document, an email body, a GitHub issue, a webpage the agent is asked to summarize. The instructions might be white text on white background, embedded in metadata, or simply trusted by the model because they're plausibly formatted.

Stage 2: Ingestion. The AI agent, doing its job, processes this content. The malicious instructions enter the context window alongside legitimate content.

Stage 3: Execution. The model, unable to distinguish legitimate commands from injected ones, begins following the attacker's instructions. It accesses data it's authorized to read, then formats that data for exfiltration — perhaps as a URL parameter, a crafted search query, or content in a message to send.

Stage 4: Exfiltration. The agent uses its legitimate communication channels to transmit the data. EchoLeak used image URL fetches. Other attacks have used email drafts, search queries, or webhook calls. The data leaves through a door the system was designed to have open.

No code vulnerability was exploited. The agent did exactly what it was authorized to do — just with different intent.

Real Systems That Have Been Hit

The pattern has appeared in production systems across the industry:

Microsoft 365 Copilot (EchoLeak): Zero-click attack via crafted emails. The agent exfiltrated SharePoint files, Teams messages, and chat logs through hidden image requests. Patched May 2025 after private disclosure in January.

GitHub's MCP server: Researchers demonstrated that malicious content in repository issues could redirect an MCP-connected agent to access and leak code or secrets from private repositories it had access to.

GitLab Duo: GitLab's AI assistant was shown to be susceptible to instructions embedded in merge request descriptions and issue comments — content the assistant regularly processes.

Slack AI: Injected instructions in public channel messages could influence the behavior of Slack's AI summarization features when processing that content for users with broader access.

The common thread: each system was designed to be helpful. Being helpful required reading external content. Reading external content created the attack surface.

Breaking the Trifecta: Architectural Defenses

Security teams often default to "add a guardrail" — a classifier that detects prompt injection attempts in incoming content. This approach doesn't work well in practice. Injection payloads can be obfuscated, split across chunks, or embedded in formats the classifier doesn't inspect. Classifiers add latency and false positives. And they operate on the content before it reaches the model, not on the model's behavior after.

The more reliable approach is to eliminate at least one leg of the trifecta through system design.

Minimize data access scope. Apply least-privilege aggressively. An agent tasked with summarizing emails doesn't need read access to source code repositories. An agent that helps with customer support tickets doesn't need credentials to your internal admin tools. Every piece of data access you grant is surface area. Grant only what the specific task requires, not what might be useful someday.

Isolate untrusted content processing. The Plan-Then-Execute pattern runs a planning phase — where the agent decides what to do based only on the user's request — before any untrusted content is introduced. The plan is locked. Then a separate execution phase carries out the plan using untrusted content but without the ability to change what actions are taken. The injected content can't redirect the plan because the plan is already finalized.

Sever external communication from content-processing stages. If your agent must read untrusted documents and must have access to sensitive data, at minimum prevent it from making outbound requests during that phase. A read-only analysis stage followed by a human-review stage followed by a write/send stage breaks the chain even when the first two legs are present.

The Dual LLM pattern goes further: route untrusted content through a sandboxed model that has no access to private data or communication capabilities. Only sanitized, structured outputs from this model flow to the privileged agent. The privileged agent never sees raw untrusted content.

Containerize and network-restrict agent processes. Run agent code in containers with explicit network allow-lists. If the agent should only communicate with your API and one third-party service, block everything else at the network layer. An injected instruction to exfiltrate data to an attacker's server fails if the container can't reach that server.

The Human Oversight Layer

None of the architectural patterns above are foolproof. They're defense in depth, not guarantees. For high-stakes operations — anything involving production data writes, email sends, financial transactions, or access to sensitive credentials — human review before execution remains essential.

This means designing agents with an explicit human-in-the-loop checkpoint for consequential actions, not as an afterthought but as a first-class architectural component. The agent presents a plan and required data accesses; a human approves before any action is taken. This dramatically limits the blast radius of a successful injection: the attacker can control what the agent proposes but not what gets approved.

Smaller, more frequent checkpoints also limit the damage from any single compromised step. An agent that takes 10 small, reviewed steps is harder to exploit than one that autonomously executes a 50-step plan end to end.

What This Means for Teams Building Agents Today

Most teams building AI agents today are focused on capability: what can the agent do, how accurate is it, how fast does it respond. Security is an afterthought, something to address before launch.

That ordering is backwards. The Lethal Trifecta shows that capability is the attack surface. Every new integration you add, every data source you connect, every communication channel you enable is a potential leg of the trifecta. These decisions need security review at design time, not after an EchoLeak-style incident forces the issue.

The checklist when designing any agent feature is simple but often skipped:

  • Does this agent access sensitive data?
  • Does this agent process content from outside the trust boundary?
  • Does this agent have outbound communication capabilities?

If you checked all three, stop. Redesign before building. The feature may still be possible — but it requires explicit architectural mitigations, not "we'll add guardrails later."

The most dangerous assumption in agentic AI right now is that helpful features are safe by default. They're not. An agent that reads your email, accesses your files, and can send messages on your behalf is an extraordinarily powerful system — and exactly the kind of system attackers are learning to exploit, one carefully crafted document at a time.

References:Let's stay in touch and Follow me for more thoughts and updates