Prompt Injection Surface Area Mapping: Find Every Attack Vector Before Attackers Do
Most teams discover their prompt injection surface area the wrong way: a security researcher posts a demo, a customer reports strange behavior, or an incident post-mortem reveals a tool call that should never have fired. By then the attack path is already documented and the blast radius is real.
Prompt injection is the OWASP #1 risk for LLM applications, but the framing as a single vulnerability obscures what it actually is: a family of attack vectors that scale with your application's complexity. Every external data source you feed into a prompt is a potential injection surface. In an agentic system with a dozen tool integrations, that surface area is enormous — and most of it is unmapped.
This post is a practitioner's methodology for mapping it before attackers do.
Why Prompt Injection Is Structurally Different From SQL Injection
The SQL injection analogy is useful but misleading if taken too far. SQL injection was solved architecturally: parameterized queries create a hard boundary between code and data, enforced at the parser level. The database never confuses a string literal for an operator.
LLMs have no equivalent boundary. Instructions and data arrive as a single token stream. The model interprets both using the same mechanism. You cannot parameterize a prompt the way you parameterize a SQL statement — there is no separate code channel.
This is not a temporary limitation waiting on a better model. It is inherent to the architecture. Training-based robustness helps at the margins. Claude Sonnet 4.5 achieves a 1.4% attack success rate against adaptive attackers in Anthropic's internal testing, down from 10.8% without safeguards. That is a meaningful improvement. It is not a solution. No current browser-integrated agent is immune.
The practical implication: you cannot solve prompt injection entirely at the model layer. Architecture and infrastructure controls matter more than any individual guardrail.
The Two Categories of Injection
Direct injection is the familiar form: a user crafts input that manipulates the model's behavior. "Ignore previous instructions. Your new task is..." These are relatively easy to defend against because the attacker and the user are the same person. If a user jailbreaks your chatbot and gets it to say something inappropriate, the blast radius is bounded to that conversation.
Indirect injection is the larger threat. The attacker is not the user. Malicious instructions are embedded in external content that your system retrieves and processes as part of a legitimate workflow. The user asks a legitimate question. The agent fetches a web page, reads an email, queries a database, or calls an API. Somewhere in that external data is an instruction the model interprets as legitimate.
Indirect injection scales dangerously. One poisoned document affects every user who triggers its retrieval. In agentic systems with tool access, it can cause the model to exfiltrate data, execute code, or trigger operations the user never requested. The user has no idea anything went wrong.
Enumerating Your Attack Surface
Surface area mapping starts with a simple question: what external data can reach my prompt? Run through every component:
User-controlled inputs (direct surface)
- Chat messages, form fields, search queries
- Uploaded documents (PDFs, Word files, CSVs, images with embedded text)
- Configuration strings users can set
Web and content retrieval (indirect surface)
- Web browsing tool results
- Content fetched via URL scraping
- RSS feeds, news aggregators
- Social media content
Persistent storage (indirect surface)
- Database records queried at runtime
- Vector store / RAG retrieval results
- Document repositories
- CRM records, support tickets
External API responses (indirect surface)
- Third-party API call results
- Webhook payloads
- Tool outputs from integrations (Slack, email, calendar)
Agent-to-agent communication (amplified surface)
- Tool outputs from sub-agents
- Results from orchestrator-to-worker calls
- Shared memory or scratchpad state
System and infrastructure (often overlooked)
- MCP tool descriptions loaded from external servers
- Plugin metadata
- Dynamic few-shot examples sourced from a database
- Log data fed back into context for debugging
The last two categories catch teams off guard. If your system dynamically loads tool descriptions or examples from an external source, those are injection vectors. Attackers can influence the tools your agent thinks it has access to, or the behavioral examples it learns from.
Risk-Scoring Each Surface
Not all surfaces carry equal risk. A five-factor framework helps prioritize:
1. Trust origin: Is the content created by your team, by authenticated users, or by arbitrary third parties? Anonymous web content and user-uploaded files are high trust-origin risk. Your own database records are lower — though not zero, since database fields can be poisoned upstream.
2. Agent capability scope: What can the model do after receiving this content? A read-only summarization agent processing malicious input is much lower risk than a write-capable agent with file system access. The injection surface risk multiplies with capability scope.
3. User interaction requirement: Zero-click surfaces (content retrieved automatically, RAG results) are higher risk than surfaces that require user initiation. An email summarization pipeline that auto-processes incoming messages is a zero-click surface.
4. Downstream agent exposure: Does injected content feed into other agents? Multi-agent systems create infection pathways. A compromised sub-agent can inject malicious content into the orchestrator's context, which can then spread to sibling agents. This is the "prompt infection" pattern — contagious attacks that propagate through multi-agent pipelines.
5. Remediation reversibility: Can the model trigger irreversible actions from this surface? Sending emails, deleting records, making financial transactions — surfaces that connect to irreversible actions require the most scrutiny.
Map each surface against these five factors. Surfaces that score high on capability scope AND downstream exposure AND user interaction are your highest-priority remediation targets.
Sanitization Patterns by Surface Type
Different surfaces require different defenses. There is no single sanitization approach that works uniformly.
Structural Prompt Separation
For any external content entering your prompt, use explicit structural boundaries:
- https://genai.owasp.org/llmrisk/llm01-prompt-injection/
- https://cheatsheetseries.owasp.org/cheatsheets/LLM_Prompt_Injection_Prevention_Cheat_Sheet.html
- https://www.lakera.ai/blog/indirect-prompt-injection
- https://www.crowdstrike.com/en-us/blog/indirect-prompt-injection-attacks-hidden-ai-risks/
- https://unit42.paloaltonetworks.com/model-context-protocol-attack-vectors/
- https://blogs.cisco.com/ai/prompt-injection-is-the-new-sql-injection-and-guardrails-arent-enough/
- https://arxiv.org/abs/2306.05499
- https://simonwillison.net/2025/Nov/2/new-prompt-injection-papers/
- https://openreview.net/forum?id=NAbqM2cMjD
- https://www.anthropic.com/research/prompt-injection-defenses
