Prompt Injection Is a Confused Deputy, Not a Content-Filtering Problem
The most common post-incident finding for a prompt injection breach is some variation of "the model got tricked." A retrieved document contained hidden instructions, the agent followed them, customer data left the building. The fix that follows is almost always a content filter: scan the input, classify the malicious instruction, strip it out before it reaches the model. Ship the filter, close the ticket.
That finding is wrong, and the filter is a treadmill. "The model got tricked" describes the symptom, not the vulnerability. The vulnerability is that an agent holding real privileges (a database token, a send-email capability, filesystem write access) accepted instructions from a source that should never have been allowed to command those privileges. That is not a new class of bug. It is a confused deputy, and operating systems researchers named it, and largely solved it, almost forty years ago.
If you treat prompt injection as a detection problem, you are signing up for an arms race against every attacker who can phrase a sentence. If you treat it as an authority problem, you get to reuse decades of security engineering that already works.
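To make the contrast concrete, here is a minimal sketch of what an authority check looks like, as opposed to a content check. Everything in it is hypothetical: the `Source` labels, the tool names, and the `ALLOWED` policy table are illustrative, and it assumes the agent runtime can tell where the instruction behind a tool call came from, which is the hard part in a real system.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """Where the instruction behind a tool call originated (hypothetical labels)."""
    USER = "user"            # the principal the agent is acting for
    RETRIEVED = "retrieved"  # documents, web pages, tool output


@dataclass(frozen=True)
class Request:
    tool: str       # hypothetical tool name, e.g. "send_email"
    source: Source  # provenance of the instruction that triggered the call


# Hypothetical policy: which sources may command which privileged tools.
ALLOWED = {
    Source.USER: {"search_docs", "send_email"},
    Source.RETRIEVED: set(),  # retrieved text is data; it commands nothing
}


def authorize(req: Request) -> bool:
    """Decide on authority alone; the wording of the instruction is never inspected."""
    return req.tool in ALLOWED.get(req.source, set())
```

Note what `authorize` does not do: it never reads the instruction text. A cleverly phrased injection and a clumsy one get the same answer, because the decision depends on who is asking, not on how they asked.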
