Prompt Injection in Multimodal Inputs: The Visual Attack Surface Your Text-Only Defense Misses
When teams harden their AI pipelines against prompt injection, they usually focus on text: sanitizing user input strings, scanning outputs for exfiltrated data, filtering known jailbreak patterns. That work matters, but it covers only part of the attack surface of a modern AI system. The rest lives inside images, PDFs, audio clips, and charts. These formats slip past every text-scanning rule you've written, because the model ingests them through different pathways than plain text and your string-matching rules never run on that content.
Steganographic injection attacks against vision-language models have reported success rates of around 24% across production models including GPT-4V, Claude, and LLaVA. That number isn't a lab artifact. It measures real attack payloads, hidden in ordinary-looking images, causing production models to deviate from their intended behavior. Your text injection scanner doesn't see any of it.
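To make the blind spot concrete, here is a minimal sketch of a text-only scanner running over a multimodal message. Everything in it is an assumption made for illustration: the `TextPart` and `ImagePart` types, the two regex patterns, and the `scan_text_only` and `unscanned_parts` helpers are hypothetical and do not come from any particular framework. The image bytes are a stand-in rather than a real attack payload.

```python
import re
from dataclasses import dataclass
from typing import Union

# Hypothetical jailbreak patterns; a real scanner would carry a much longer list.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (the |your )?system prompt", re.IGNORECASE),
]


@dataclass
class TextPart:
    text: str


@dataclass
class ImagePart:
    data: bytes  # encoded image bytes, opaque to a string-matching scanner


Part = Union[TextPart, ImagePart]


def scan_text_only(parts: list[Part]) -> list[str]:
    """Return the patterns matched in text parts. Non-text parts are skipped."""
    findings = []
    for part in parts:
        if isinstance(part, TextPart):
            for pattern in SUSPICIOUS_PATTERNS:
                if pattern.search(part.text):
                    findings.append(pattern.pattern)
    return findings


def unscanned_parts(parts: list[Part]) -> list[str]:
    """List the type names of parts the text scanner never inspected."""
    return [type(p).__name__ for p in parts if not isinstance(p, TextPart)]


# A message whose text is harmless. Any instruction carried in the image
# (rendered as pixels or hidden in them) never reaches the regex scanner.
message = [
    TextPart("Please summarize the attached chart."),
    ImagePart(b"\x89PNG\r\n\x1a\n" + b"<stand-in pixel data>"),
]

print(scan_text_only(message))   # no findings: the text part is clean
print(unscanned_parts(message))  # the image part was never looked at
```

The scanner reports a clean message because it only ever examines strings. Tracking which parts went unscanned, as `unscanned_parts` does here, is one simple way to surface how much of each request your existing defenses never touch.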
