The Three Attack Surfaces in Multi-Agent Communication
A recent study tested 17 frontier LLMs in multi-agent configurations and found that 82% of them would execute malicious commands when those commands arrived from a peer agent — even though the exact same commands were refused when issued directly by a user. That number should reset your threat model if you're shipping multi-agent systems. Your agents may be individually hardened. Together, they're not.
Multi-agent architectures introduce communication channels that most security thinking ignores. We harden the model, the system prompt, the API perimeter. We spend almost no time on what happens when Agent A sends a message to Agent B — who wrote that message, whether it was tampered with, whether the memory Agent B consulted was planted three sessions ago by an attacker who never touched Agent A at all.
There are three distinct attack surfaces in multi-agent communication. They operate differently, require different defenses, and fail in different ways. Getting fluent in all three is prerequisite work for anyone shipping agentic pipelines into production.
Attack Surface One: Prompt Injection Across Agent Boundaries
Prompt injection is well understood in the single-agent case: an attacker embeds instructions in content the agent processes (a document, a web page, a retrieved chunk), and those instructions override or augment the system prompt. In multi-agent systems the same attack becomes structurally harder to contain because agents pass output to other agents as trusted input.
Consider a pipeline where a research agent retrieves documents, summarizes them, and passes the summary to a writing agent. The research agent's output is treated by the writing agent as a task specification, not as untrusted external content. An attacker who can influence what the research agent retrieves — by poisoning a web page, a database record, or a search result — can inject instructions that propagate directly into the writing agent's context with no indication that they originated outside the system.
The AgentDojo benchmark, which evaluates agents against 629 injection test cases across 97 tasks, found that injections embedded at the end of tool responses achieve up to a 70% success rate against GPT-4o-based agents. In multi-step pipelines the success rate compounds: a partially successful injection in step two becomes input to step three, which may treat it as ground truth.
The architectural problem is that agents don't have a native concept of message provenance. An agent receiving a message from a peer has no out-of-the-box way to distinguish "this came from our orchestrator's trusted planning module" from "this message passed through a retrieval step that touched attacker-controlled content." The trust level defaults to whatever role the message occupies in the context — and orchestrator messages typically receive high trust.
The Devin AI vulnerability demonstrated this concretely: researchers demonstrated that the asynchronous coding agent could be manipulated through injected repository content to expose internal ports, leak access tokens, and install command-and-control malware. The injection vector was content the agent was supposed to read, not instructions it was supposed to follow. The agent made no distinction.
What this requires at the protocol level: Agents receiving output from other agents should treat that output as tainted unless it passes validation. This means tracking whether content originated from an external retrieval step versus direct orchestrator instruction, and applying different trust levels accordingly. Output from any agent that processed external data should be tagged with a provenance annotation that downstream agents can inspect before acting.
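The taint-propagation idea above can be sketched in a few lines. This is an illustrative design, not any framework's actual API — the `AgentMessage` schema, `Provenance` levels, and `accept_as_instruction` policy are all assumptions:

```python
# Sketch of provenance-tagged inter-agent messages. Taint is sticky: once a
# pipeline step touches external content, everything derived from it stays
# tainted, and only untainted orchestrator messages count as instructions.
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    ORCHESTRATOR = "orchestrator"   # direct instruction from the trusted planner
    INTERNAL = "internal"           # produced by a peer without external input
    EXTERNAL_DERIVED = "external"   # touched retrieval / attacker-reachable data


@dataclass
class AgentMessage:
    sender: str
    content: str
    provenance: Provenance
    tainted: bool = False


def derive(parent: AgentMessage, sender: str, content: str,
           touched_external: bool = False) -> AgentMessage:
    """Create a downstream message, propagating taint from its parent."""
    tainted = parent.tainted or touched_external
    prov = Provenance.EXTERNAL_DERIVED if tainted else parent.provenance
    return AgentMessage(sender, content, prov, tainted)


def accept_as_instruction(msg: AgentMessage) -> bool:
    """Everything except untainted orchestrator messages is data, not orders."""
    return msg.provenance is Provenance.ORCHESTRATOR and not msg.tainted


task = AgentMessage("orchestrator", "summarize quarterly filings",
                    Provenance.ORCHESTRATOR)
summary = derive(task, "research_agent", "summary of retrieved docs ...",
                 touched_external=True)  # the retrieval step touched the web

accept_as_instruction(task)     # the planner's task is an instruction
accept_as_instruction(summary)  # the summary is not, even though a peer sent it
```

The key property is that the writing agent in the earlier pipeline example would receive `summary` with `tainted=True` and could refuse to treat any imperative content inside it as an instruction.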
Attack Surface Two: Agent Spoofing and Impersonation
The second attack surface is identity. In most deployed multi-agent systems, agents identify each other by position in a conversation, by a name string in a message header, or by an API key that never rotates. None of these provide meaningful authentication.
Agent spoofing works by impersonating a trusted agent in the pipeline. An attacker who can inject a message into a communication channel only needs to mimic the format of a trusted peer to bypass most access controls. Studies testing multi-agent architectures report 40–70% success rates for agent-in-the-middle attacks where the attacker intercepts messages or synthesizes new ones that appear to come from legitimate agents.
The attack is particularly effective because multi-agent orchestration frameworks are built for usability, not for adversarial environments. A compromised agent that injects spoofed messages claiming to be the orchestrator will typically receive orchestrator-level trust. A single compromised node can therefore instruct peers to take privileged actions it couldn't take directly.
The more sophisticated variant is "agent session smuggling" — where a malicious remote agent injects hidden instructions between legitimate requests and responses during a handshake sequence. Unit 42 documented proof-of-concept attacks demonstrating unauthorized stock trades and information exfiltration using this technique. The barrier to entry is low: the attacker only needs to convince a victim agent to connect to a malicious peer once.

Agent supply chain compromise extends this further. In enterprise multi-agent deployments where agents are registered in a discovery registry, an attacker who can modify registry metadata can redirect legitimate agents to spoofed endpoints. Every agent that trusts the registry will then communicate with the attacker's infrastructure while believing it's talking to authorized internal systems.
What this requires at the protocol level: Agents should present cryptographically signed identity credentials — what some frameworks call AgentCards — before any substantive exchange. The receiving agent verifies the signature against the issuer's public key or a registry. The signed card should include declared capabilities and scope, so an agent claiming to be the financial reporting module can be verified as only having authority over finance operations. Message-level HMAC signatures that prevent tampering in transit are a separate but complementary requirement.
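A minimal sketch of card verification follows. It uses a shared issuer secret with HMAC from the standard library for brevity; a real deployment would use asymmetric signatures so verifiers hold only the issuer's public key, and the card/capability names here are illustrative:

```python
# Sketch of signed identity-card verification with declared capability scope.
# ISSUER_KEY stands in for a key provisioned out of band (an assumption);
# production systems would verify an asymmetric signature instead.
import hashlib
import hmac
import json

ISSUER_KEY = b"registry-signing-key"  # illustrative shared secret


def issue_card(agent_id: str, capabilities: list[str]) -> dict:
    """Registry signs the agent's identity and declared capability scope."""
    card = {"agent_id": agent_id, "capabilities": sorted(capabilities)}
    payload = json.dumps(card, sort_keys=True).encode()
    card["sig"] = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
    return card


def verify_card(card: dict, required_capability: str) -> bool:
    """Reject forged/tampered cards, then check the claimed scope."""
    unsigned = {k: v for k, v in card.items() if k != "sig"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, card.get("sig", "")):
        return False
    return required_capability in card["capabilities"]


finance = issue_card("finance_reporter", ["finance.read"])
verify_card(finance, "finance.read")   # in scope, signature valid
verify_card(finance, "hr.read")        # capability never declared

forged = dict(finance, agent_id="orchestrator")  # spoof attempt
verify_card(forged, "finance.read")    # signature no longer matches
```

Note that the forged card fails even though its signature field is a valid signature — just not over the claimed identity, which is exactly the property that defeats name-string spoofing.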
Attack Surface Three: Memory Poisoning
The third attack surface is the most underappreciated because it's asynchronous and persistent. Both of the previous attacks operate within a session. Memory poisoning operates across sessions, inserting malicious content into persistent stores that agents retrieve and act on long after the attacker has left.
Most production agent systems maintain some form of persistent memory: vector stores for semantic retrieval, episodic memory logs for conversational context, knowledge bases agents update as they work. These stores are written to by the agents themselves. An attacker who can influence what an agent writes — through a successful injection in a previous session, through crafted user inputs that the agent summarizes into memory, or through direct database access — creates a standing backdoor into every future session.
The MINJA attack (published at NeurIPS 2025) demonstrated that memory stores can be poisoned through query-only interaction, without direct write access to the database. By crafting queries that cause an agent to store malicious synthetic memories as part of normal operation, attackers can plant instructions that surface on future retrievals. Unlike a prompt injection that affects a single response, a poisoned memory record is retrieved and acted upon in every session that triggers the retrieval condition.
Episodic memory is particularly vulnerable because agents are designed to trust their own prior state. When an agent retrieves a memory record, the implicit assumption is "this is what I knew before." There's no native mechanism to ask "was this record written under adversarial conditions?"
Cross-agent memory sharing multiplies the blast radius. In architectures where a shared vector store is written to by a research agent and read by a reasoning agent, a single poisoning event in the research agent's output can corrupt the reasoning agent's behavior indefinitely. The reasoning agent has no visibility into the research agent's write history.
What this requires at the protocol level: Memory stores need write provenance tagging — every record should carry metadata indicating which agent wrote it, under which session context, and whether that session processed external input. Reads should trigger provenance checks: memory written during sessions that touched external content should be treated as lower-trust than memory written from internal orchestrator instructions. Architectural isolation is the stronger version: agents that process external retrieval should write to memory partitions separate from those used by agents that hold internal organizational state, with explicit promotion gates between them.
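Write-provenance tagging is straightforward to sketch. The record schema and the default-deny read policy below are assumptions for illustration, not a specific memory framework's API:

```python
# Sketch of a memory store whose records carry write provenance, and whose
# reads exclude externally influenced memory unless it is requested explicitly.
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    content: str
    written_by: str
    session_id: str
    session_touched_external: bool  # did the writing session process external input?


class MemoryStore:
    def __init__(self) -> None:
        self._records: list[MemoryRecord] = []

    def write(self, rec: MemoryRecord) -> None:
        self._records.append(rec)

    def read(self, include_low_trust: bool = False) -> list[MemoryRecord]:
        """Default reads return only records written from internal instruction;
        lower-trust memory must be opted into, never silently retrieved."""
        return [r for r in self._records
                if include_low_trust or not r.session_touched_external]


store = MemoryStore()
store.write(MemoryRecord("internal policy note", "orchestrator", "s1", False))
store.write(MemoryRecord("summary of a web page", "research_agent", "s2", True))

trusted = store.read()                            # internal record only
everything = store.read(include_low_trust=True)   # both, explicitly requested
```

A MINJA-style poisoned record written during an attacker-influenced session would land in the low-trust set here, so it never surfaces in a default retrieval.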
Why Securing Individual Agents Isn't Enough
The 82% peer-agent compliance finding isn't a model alignment failure. It's a protocol design failure. Models that correctly refuse user requests comply with the same requests when they arrive from peer agents because the trust model for peer communication is simply not the same as the trust model for user input. The model is behaving consistently with how the system was architected.
This is the core problem with perimeter-only security in multi-agent systems. Hardening each agent's system prompt against direct user attacks does nothing for messages arriving from compromised peers. Securing the API ingress doesn't help when the attack vector is a poisoned memory record written three sessions ago. Single-agent defenses compose poorly in multi-agent environments because the attack surfaces are different in kind, not just in scale.
The OWASP Top 10 for Agentic Applications (released December 2025) ranks insecure inter-agent communication and cascading failures in the top ten risks, ahead of most vulnerabilities that traditional security frameworks prioritize. Non-deterministic emergent failures — where multiple agents interacting with shared resources create failure modes that don't exist in any single agent — are harder to detect than traditional software failures because they don't produce consistent error signals. An agent behaving differently because of a poisoned memory record three layers upstream looks like model drift, not like an attack.
Building Communication Security Into the Pipeline
The defense-in-depth approach to multi-agent communication has three non-negotiable layers.
Cryptographic message authentication handles spoofing. Every inter-agent message should be signed with the sender's private key. The receiver verifies the signature before processing. This is standard in distributed systems security and trivially implementable via HMAC or asymmetric keys. The missing piece in most frameworks is enforcement: signing needs to be mandatory at the transport layer, not optional at the application layer.
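A transport wrapper that makes signing mandatory might look like the following sketch. The `SignedChannel` class and its shared pairwise key are assumptions; real deployments would use per-pair key management or asymmetric signatures:

```python
# Sketch of transport-layer message authentication: the channel signs on send
# and rejects anything unsigned or tampered before it reaches the agent.
import hashlib
import hmac
import json


class SignedChannel:
    def __init__(self, key: bytes) -> None:
        self._key = key  # illustrative shared pairwise key

    def send(self, sender: str, body: str) -> dict:
        msg = {"sender": sender, "body": body}
        payload = json.dumps(msg, sort_keys=True).encode()
        msg["sig"] = hmac.new(self._key, payload, hashlib.sha256).hexdigest()
        return msg

    def receive(self, msg: dict) -> dict:
        """Verification happens here, at the transport, so an agent can never
        accidentally process an unauthenticated message."""
        unsigned = {k: v for k, v in msg.items() if k != "sig"}
        payload = json.dumps(unsigned, sort_keys=True).encode()
        expected = hmac.new(self._key, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, msg.get("sig", "")):
            raise ValueError("rejected: bad or missing signature")
        return unsigned


chan = SignedChannel(b"pairwise-key")
wire = chan.send("research_agent", "summary ready")
chan.receive(wire)  # verifies, then hands the clean message to the agent

spoofed = dict(wire, sender="orchestrator")
# chan.receive(spoofed) raises ValueError: the signature covers the sender field
```

Because the sender field is inside the signed payload, the agent-in-the-middle rewrite described earlier fails verification even when the message body is untouched.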
Taint propagation and provenance tracking handles injection. Messages and memory records should carry annotations indicating whether they were derived from external content. Agents receiving tainted messages apply stricter validation — checking whether instructions in the message are within the sending agent's declared scope, whether they request actions inconsistent with the stated task, whether the instruction pattern matches the communication style of legitimate peers (behavioral fingerprinting for unusual command sequences).
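The scope check in that validation step can be reduced to a small policy function. The scope table and action strings below are illustrative assumptions:

```python
# Sketch of stricter validation for tainted messages: any requested action
# must fall inside the sending agent's declared scope, or it is flagged as a
# likely injection. The scope registry here is a stand-in.
DECLARED_SCOPE = {
    "research_agent": {"summarize", "cite"},
    "finance_agent": {"report.generate"},
}


def validate_tainted_action(sender: str, requested_action: str) -> bool:
    """A tainted message may only request actions the sender is declared to
    perform; out-of-scope requests are treated as injected instructions."""
    return requested_action in DECLARED_SCOPE.get(sender, set())


validate_tainted_action("research_agent", "summarize")       # in scope
validate_tainted_action("research_agent", "transfer_funds")  # flag as injection
validate_tainted_action("unknown_agent", "summarize")        # unregistered sender
```

A research agent asking a peer to transfer funds fails this check regardless of how persuasive the injected text is — the policy never inspects the prose, only the action and the declared scope.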
Memory namespace isolation handles poisoning. Persistent stores written to by agents that process external retrieval should be isolated from stores written by internal orchestration agents. Promotion between namespaces — moving a research finding from "externally derived" to "trusted internal knowledge" — should require explicit authorization and ideally a human review gate for high-stakes decisions.
These are not exotic requirements. They're the same principles that secure inter-service communication in backend systems: authenticate callers, validate inputs even from trusted sources, don't let one compromised service write freely to shared state. The reason they're not standard in multi-agent frameworks yet is that most frameworks were designed for capability first and security second. That order needs to reverse before autonomous agent systems handle sensitive operations at scale.
The 48% of security professionals who identify agentic AI as the top attack vector for 2026 are responding to exactly this gap. The vulnerability isn't in any single model. It's in the implicit trust that emerges when capable models start talking to each other without the authentication and isolation that inter-service communication has required for decades.
- https://www.mdpi.com/2078-2489/17/1/54
- https://genai.owasp.org/llmrisk/llm01-prompt-injection/
- https://arxiv.org/html/2506.23260v1
- https://allabouttesting.org/owasp-agentic-ai-threat-t9-identity-spoofing-impersonation-in-ai-systems/
- https://galileo.ai/blog/multi-agent-systems-exploits
- https://unit42.paloaltonetworks.com/agent-session-smuggling-in-agent2agent-systems/
- https://arxiv.org/html/2511.03841v1
- https://arxiv.org/html/2603.20357v1
- https://medium.com/@michael.hannecke/agent-memory-poisoning-the-attack-that-waits-9400f806fbd7
- https://www.lakera.ai/blog/agentic-ai-threats-p1
- https://unit42.paloaltonetworks.com/indirect-prompt-injection-poisons-ai-longterm-memory/
- https://agentmessaging.org/
- https://medium.com/@adnanmasood/security-in-agentic-communication-threats-controls-standards-and-implementation-patterns-for-bf1eadc94e95
- https://opensource.microsoft.com/blog/2026/04/02/introducing-the-agent-governance-toolkit-open-source-runtime-security-for-ai-agents/
- https://adversa.ai/blog/cascading-failures-in-agentic-ai-complete-owasp-asi08-security-guide-2026/
- https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/
- https://arxiv.org/html/2511.15759v1
- https://www.redhat.com/en/blog/model-context-protocol-mcp-understanding-security-risks-and-controls
- https://www.pillar.security/blog/the-security-risks-of-model-context-protocol-mcp/
- https://arxiv.org/html/2603.22489
