The Three Attack Surfaces in Multi-Agent Communication
A recent study tested 17 frontier LLMs in multi-agent configurations and found that 82% of them would execute malicious commands when those commands arrived from a peer agent — even though the exact same commands were refused when issued directly by a user. That number should reset your threat model if you're shipping multi-agent systems. Your agents may be individually hardened. Together, they're not.
Multi-agent architectures introduce communication channels that most security thinking ignores. We harden the model, the system prompt, the API perimeter. We spend almost no time on what happens when Agent A sends a message to Agent B — who wrote that message, whether it was tampered with, whether the memory Agent B consulted was planted three sessions ago by an attacker who never touched Agent A at all.
There are three distinct attack surfaces in multi-agent communication. They operate differently, require different defenses, and fail in different ways. Getting fluent in all three is prerequisite work for anyone shipping agentic pipelines into production.
Attack Surface One: Prompt Injection Across Agent Boundaries
Prompt injection is well understood in the single-agent case: an attacker embeds instructions in content the agent processes (a document, a web page, a retrieved chunk), and those instructions override or augment the system prompt. In multi-agent systems the same attack becomes structurally harder to contain because agents pass output to other agents as trusted input.
Consider a pipeline where a research agent retrieves documents, summarizes them, and passes the summary to a writing agent. The research agent's output is treated by the writing agent as a task specification, not as untrusted external content. An attacker who can influence what the research agent retrieves — by poisoning a web page, a database record, or a search result — can inject instructions that propagate directly into the writing agent's context with no indication that they originated outside the system.
The AgentDojo benchmark, which evaluates agents against 629 injection test cases across 97 tasks, found that injections embedded at the end of tool responses achieve up to a 70% success rate against GPT-4o-based agents. In multi-step pipelines the success rate compounds: a partially successful injection in step two becomes input to step three, which may treat it as ground truth.
The architectural problem is that agents don't have a native concept of message provenance. An agent receiving a message from a peer has no out-of-the-box way to distinguish "this came from our orchestrator's trusted planning module" from "this message passed through a retrieval step that touched attacker-controlled content." The trust level defaults to whatever role the message occupies in the context — and orchestrator messages typically receive high trust.
The Devin AI vulnerability demonstrated this concretely: researchers demonstrated that the asynchronous coding agent could be manipulated through injected repository content to expose internal ports, leak access tokens, and install command-and-control malware. The injection vector was content the agent was supposed to read, not instructions it was supposed to follow. The agent made no distinction.
What this requires at the protocol level: Agents receiving output from other agents should treat that output as tainted unless it passes validation. This means tracking whether content originated from an external retrieval step versus direct orchestrator instruction, and applying different trust levels accordingly. Output from any agent that processed external data should be tagged with a provenance annotation that downstream agents can inspect before acting.
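The taint-propagation idea above can be sketched in a few lines. This is an illustrative design, not any framework's actual API — the `AgentMessage` schema, `Provenance` levels, and `accept_as_instruction` policy are all assumptions:

```python
# Sketch of provenance-tagged inter-agent messages. Taint is sticky: once a
# pipeline step touches external content, everything derived from it stays
# tainted, and only untainted orchestrator messages count as instructions.
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    ORCHESTRATOR = "orchestrator"   # direct instruction from the trusted planner
    INTERNAL = "internal"           # produced by a peer without external input
    EXTERNAL_DERIVED = "external"   # touched retrieval / attacker-reachable data


@dataclass
class AgentMessage:
    sender: str
    content: str
    provenance: Provenance
    tainted: bool = False


def derive(parent: AgentMessage, sender: str, content: str,
           touched_external: bool = False) -> AgentMessage:
    """Create a downstream message, propagating taint from its parent."""
    tainted = parent.tainted or touched_external
    prov = Provenance.EXTERNAL_DERIVED if tainted else parent.provenance
    return AgentMessage(sender, content, prov, tainted)


def accept_as_instruction(msg: AgentMessage) -> bool:
    """Everything except untainted orchestrator messages is data, not orders."""
    return msg.provenance is Provenance.ORCHESTRATOR and not msg.tainted


task = AgentMessage("orchestrator", "summarize quarterly filings",
                    Provenance.ORCHESTRATOR)
summary = derive(task, "research_agent", "summary of retrieved docs ...",
                 touched_external=True)  # the retrieval step touched the web

accept_as_instruction(task)     # the planner's task is an instruction
accept_as_instruction(summary)  # the summary is not, even though a peer sent it
```

The key property is that the writing agent in the earlier pipeline example would receive `summary` with `tainted=True` and could refuse to treat any imperative content inside it as an instruction.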
Attack Surface Two: Agent Spoofing and Impersonation
The second attack surface is identity. In most deployed multi-agent systems, agents identify each other by position in a conversation, by a name string in a message header, or by an API key that never rotates. None of these provide meaningful authentication.
Agent spoofing works by impersonating a trusted agent in the pipeline. An attacker who can inject a message into a communication channel only needs to mimic the format of a trusted peer to bypass most access controls. Studies testing multi-agent architectures report 40–70% success rates for agent-in-the-middle attacks where the attacker intercepts messages or synthesizes new ones that appear to come from legitimate agents.
The attack is particularly effective because multi-agent orchestration frameworks are built for usability, not for adversarial environments. A compromised agent that injects spoofed messages claiming to be the orchestrator will typically receive orchestrator-level trust. A single compromised node can therefore instruct peers to take privileged actions it couldn't take directly.
The more sophisticated variant is "agent session smuggling" — where a malicious remote agent injects hidden instructions between legitimate requests and responses during a handshake sequence. Unit 42 documented proof-of-concept attacks demonstrating unauthorized stock trades and information exfiltration using this technique. The barrier to entry is low: the attacker only needs to convince a victim agent to connect to a malicious peer once.

Agent supply chain compromise extends this further. In enterprise multi-agent deployments where agents are registered in a discovery registry, an attacker who can modify registry metadata can redirect legitimate agents to spoofed endpoints. Every agent that trusts the registry will then communicate with the attacker's infrastructure while believing it's talking to authorized internal systems.
What this requires at the protocol level: Agents should present cryptographically signed identity credentials — what some frameworks call AgentCards — before any substantive exchange. The receiving agent verifies the signature against the issuer's public key or a registry. The signed card should include declared capabilities and scope, so an agent claiming to be the financial reporting module can be verified as only having authority over finance operations. Message-level HMAC signatures that prevent tampering in transit are a separate but complementary requirement.
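A minimal sketch of card verification follows. It uses a shared issuer secret with HMAC from the standard library for brevity; a real deployment would use asymmetric signatures so verifiers hold only the issuer's public key, and the card/capability names here are illustrative:

```python
# Sketch of signed identity-card verification with declared capability scope.
# ISSUER_KEY stands in for a key provisioned out of band (an assumption);
# production systems would verify an asymmetric signature instead.
import hashlib
import hmac
import json

ISSUER_KEY = b"registry-signing-key"  # illustrative shared secret


def issue_card(agent_id: str, capabilities: list[str]) -> dict:
    """Registry signs the agent's identity and declared capability scope."""
    card = {"agent_id": agent_id, "capabilities": sorted(capabilities)}
    payload = json.dumps(card, sort_keys=True).encode()
    card["sig"] = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
    return card


def verify_card(card: dict, required_capability: str) -> bool:
    """Reject forged/tampered cards, then check the claimed scope."""
    unsigned = {k: v for k, v in card.items() if k != "sig"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, card.get("sig", "")):
        return False
    return required_capability in card["capabilities"]


finance = issue_card("finance_reporter", ["finance.read"])
verify_card(finance, "finance.read")   # in scope, signature valid
verify_card(finance, "hr.read")        # capability never declared

forged = dict(finance, agent_id="orchestrator")  # spoof attempt
verify_card(forged, "finance.read")    # signature no longer matches
```

Note that the forged card fails even though its signature field is a valid signature — just not over the claimed identity, which is exactly the property that defeats name-string spoofing.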
Attack Surface Three: Memory Poisoning
The third attack surface is the most underappreciated because it's asynchronous and persistent. Both of the previous attacks operate within a session. Memory poisoning operates across sessions, inserting malicious content into persistent stores that agents retrieve and act on long after the attacker has left.
Most production agent systems maintain some form of persistent memory: vector stores for semantic retrieval, episodic memory logs for conversational context, knowledge bases agents update as they work. These stores are written to by the agents themselves. An attacker who can influence what an agent writes — through a successful injection in a previous session, through crafted user inputs that the agent summarizes into memory, or through direct database access — creates a standing backdoor into every future session.
The MINJA attack (published at NeurIPS 2025) demonstrated that memory stores can be poisoned through query-only interaction, without direct write access to the database. By crafting queries that cause an agent to store malicious synthetic memories as part of normal operation, attackers can plant instructions that surface on future retrievals. Unlike a prompt injection that affects a single response, a poisoned memory record is retrieved and acted upon in every session that triggers the retrieval condition.
Episodic memory is particularly vulnerable because agents are designed to trust their own prior state. When an agent retrieves a memory record, the implicit assumption is "this is what I knew before." There's no native mechanism to ask "was this record written under adversarial conditions?"
Cross-agent memory sharing multiplies the blast radius. In architectures where a shared vector store is written to by a research agent and read by a reasoning agent, a single poisoning event in the research agent's output can corrupt the reasoning agent's behavior indefinitely. The reasoning agent has no visibility into the research agent's write history.
What this requires at the protocol level: Memory stores need write provenance tagging — every record should carry metadata indicating which agent wrote it, under which session context, and whether that session processed external input. Reads should trigger provenance checks: memory written during sessions that touched external content should be treated as lower-trust than memory written from internal orchestrator instructions. Architectural isolation is the stronger version: agents that process external retrieval should write to memory partitions separate from those used by agents that hold internal organizational state, with explicit promotion gates between them.
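Write-provenance tagging is straightforward to sketch. The record schema and the default-deny read policy below are assumptions for illustration, not a specific memory framework's API:

```python
# Sketch of a memory store whose records carry write provenance, and whose
# reads exclude externally influenced memory unless it is requested explicitly.
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    content: str
    written_by: str
    session_id: str
    session_touched_external: bool  # did the writing session process external input?


class MemoryStore:
    def __init__(self) -> None:
        self._records: list[MemoryRecord] = []

    def write(self, rec: MemoryRecord) -> None:
        self._records.append(rec)

    def read(self, include_low_trust: bool = False) -> list[MemoryRecord]:
        """Default reads return only records written from internal instruction;
        lower-trust memory must be opted into, never silently retrieved."""
        return [r for r in self._records
                if include_low_trust or not r.session_touched_external]


store = MemoryStore()
store.write(MemoryRecord("internal policy note", "orchestrator", "s1", False))
store.write(MemoryRecord("summary of a web page", "research_agent", "s2", True))

trusted = store.read()                            # internal record only
everything = store.read(include_low_trust=True)   # both, explicitly requested
```

A MINJA-style poisoned record written during an attacker-influenced session would land in the low-trust set here, so it never surfaces in a default retrieval.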
Why Securing Individual Agents Isn't Enough
The 82% peer-agent compliance finding isn't a model alignment failure. It's a protocol design failure. Models that correctly refuse user requests comply with the same requests when they arrive from peer agents because the trust model for peer communication is simply not the same as the trust model for user input. The model is behaving consistently with how the system was architected.
This is the core problem with perimeter-only security in multi-agent systems. Hardening each agent's system prompt against direct user attacks does nothing for messages arriving from compromised peers. Securing the API ingress doesn't help when the attack vector is a poisoned memory record written three sessions ago. Single-agent defenses compose poorly in multi-agent environments because the attack surfaces are different in kind, not just in scale.
The OWASP Top 10 for Agentic Applications (released December 2025) ranks insecure inter-agent communication and cascading failures in the top ten risks, ahead of most vulnerabilities that traditional security frameworks prioritize. Non-deterministic emergent failures — where multiple agents interacting with shared resources create failure modes that don't exist in any single agent — are harder to detect than traditional software failures because they don't produce consistent error signals. An agent behaving differently because of a poisoned memory record three layers upstream looks like model drift, not like an attack.
Building Communication Security Into the Pipeline
The defense-in-depth approach to multi-agent communication has three non-negotiable layers.
Cryptographic message authentication handles spoofing. Every inter-agent message should be signed with the sender's private key. The receiver verifies the signature before processing. This is standard in distributed systems security and trivially implementable via HMAC or asymmetric keys. The missing piece in most frameworks is enforcement: signing needs to be mandatory at the transport layer, not optional at the application layer.
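A transport wrapper that makes signing mandatory might look like the following sketch. The `SignedChannel` class and its shared pairwise key are assumptions; real deployments would use per-pair key management or asymmetric signatures:

```python
# Sketch of transport-layer message authentication: the channel signs on send
# and rejects anything unsigned or tampered before it reaches the agent.
import hashlib
import hmac
import json


class SignedChannel:
    def __init__(self, key: bytes) -> None:
        self._key = key  # illustrative shared pairwise key

    def send(self, sender: str, body: str) -> dict:
        msg = {"sender": sender, "body": body}
        payload = json.dumps(msg, sort_keys=True).encode()
        msg["sig"] = hmac.new(self._key, payload, hashlib.sha256).hexdigest()
        return msg

    def receive(self, msg: dict) -> dict:
        """Verification happens here, at the transport, so an agent can never
        accidentally process an unauthenticated message."""
        unsigned = {k: v for k, v in msg.items() if k != "sig"}
        payload = json.dumps(unsigned, sort_keys=True).encode()
        expected = hmac.new(self._key, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, msg.get("sig", "")):
            raise ValueError("rejected: bad or missing signature")
        return unsigned


chan = SignedChannel(b"pairwise-key")
wire = chan.send("research_agent", "summary ready")
chan.receive(wire)  # verifies, then hands the clean message to the agent

spoofed = dict(wire, sender="orchestrator")
# chan.receive(spoofed) raises ValueError: the signature covers the sender field
```

Because the sender field is inside the signed payload, the agent-in-the-middle rewrite described earlier fails verification even when the message body is untouched.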
Taint propagation and provenance tracking handles injection. Messages and memory records should carry annotations indicating whether they were derived from external content. Agents receiving tainted messages apply stricter validation — checking whether instructions in the message are within the sending agent's declared scope, whether they request actions inconsistent with the stated task, whether the instruction pattern matches the communication style of legitimate peers (behavioral fingerprinting for unusual command sequences).
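The scope check in that validation step can be reduced to a small policy function. The scope table and action strings below are illustrative assumptions:

```python
# Sketch of stricter validation for tainted messages: any requested action
# must fall inside the sending agent's declared scope, or it is flagged as a
# likely injection. The scope registry here is a stand-in.
DECLARED_SCOPE = {
    "research_agent": {"summarize", "cite"},
    "finance_agent": {"report.generate"},
}


def validate_tainted_action(sender: str, requested_action: str) -> bool:
    """A tainted message may only request actions the sender is declared to
    perform; out-of-scope requests are treated as injected instructions."""
    return requested_action in DECLARED_SCOPE.get(sender, set())


validate_tainted_action("research_agent", "summarize")       # in scope
validate_tainted_action("research_agent", "transfer_funds")  # flag as injection
validate_tainted_action("unknown_agent", "summarize")        # unregistered sender
```

A research agent asking a peer to transfer funds fails this check regardless of how persuasive the injected text is — the policy never inspects the prose, only the action and the declared scope.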
Memory namespace isolation handles poisoning. Persistent stores written to by agents that process external retrieval should be isolated from stores written by internal orchestration agents. Promotion between namespaces — moving a research finding from "externally derived" to "trusted internal knowledge" — should require explicit authorization and ideally a human review gate for high-stakes decisions.
These are not exotic requirements. They're the same principles that secure inter-service communication in backend systems: authenticate callers, validate inputs even from trusted sources, don't let one compromised service write freely to shared state. The reason they're not standard in multi-agent frameworks yet is that most frameworks were designed for capability first and security second. That order needs to reverse before autonomous agent systems handle sensitive operations at scale.
The 48% of security professionals who identify agentic AI as the top attack vector for 2026 are responding to exactly this gap. The vulnerability isn't in any single model. It's in the implicit trust that emerges when capable models start talking to each other without the authentication and isolation that inter-service communication has required for decades.
- https://www.mdpi.com/2078-2489/17/1/54
- https://genai.owasp.org/llmrisk/llm01-prompt-injection/
- https://arxiv.org/html/2506.23260v1
- https://allabouttesting.org/owasp-agentic-ai-threat-t9-identity-spoofing-impersonation-in-ai-systems/
- https://galileo.ai/blog/multi-agent-systems-exploits
- https://unit42.paloaltonetworks.com/agent-session-smuggling-in-agent2agent-systems/
- https://arxiv.org/html/2511.03841v1
- https://arxiv.org/html/2603.20357v1
- https://medium.com/@michael.hannecke/agent-memory-poisoning-the-attack-that-waits-9400f806fbd7
- https://www.lakera.ai/blog/agentic-ai-threats-p1
- https://unit42.paloaltonetworks.com/indirect-prompt-injection-poisons-ai-longterm-memory/
- https://agentmessaging.org/
- https://medium.com/@adnanmasood/security-in-agentic-communication-threats-controls-standards-and-implementation-patterns-for-bf1eadc94e95
- https://opensource.microsoft.com/blog/2026/04/02/introducing-the-agent-governance-toolkit-open-source-runtime-security-for-ai-agents/
- https://adversa.ai/blog/cascading-failures-in-agentic-ai-complete-owasp-asi08-security-guide-2026/
- https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/
- https://arxiv.org/html/2511.15759v1
- https://www.redhat.com/en/blog/model-context-protocol-mcp-understanding-security-risks-and-controls
- https://www.pillar.security/blog/the-security-risks-of-model-context-protocol-mcp/
- https://arxiv.org/html/2603.22489
