Skip to main content

14 posts tagged with "prompt-injection"

View all tags

The Lethal Trifecta: Why Your AI Agent Is One Email Away from a Data Breach

· 9 min read
Tian Pan
Software Engineer

In June 2025, a researcher sent a carefully crafted email to a Microsoft 365 Copilot user. No link was clicked. No attachment opened. The email arrived, Copilot read it during a routine summarization task, and within seconds the AI began exfiltrating files from OneDrive, SharePoint, and Teams — silently transmitting contents to an attacker-controlled server by encoding data into image URLs it asked to "render." The victim never knew it happened.

This wasn't a novel zero-day in the traditional sense. There was no buffer overflow, no SQL injection. The vulnerability was architectural: the system combined three capabilities that, individually, seem like obvious product features. Together, they form what's now called the Lethal Trifecta.

LLM Guardrails in Production: Why One Layer Is Never Enough

· 10 min read
Tian Pan
Software Engineer

Here is a math problem that catches teams off guard: if you stack five guardrails and each one operates at 90% accuracy, your overall system correctness is not 90%—it is 59%. Stack ten guards at the same accuracy and you get under 35%. The compound error problem means that "adding more guardrails" can make a system less reliable than adding fewer, better-calibrated ones. Most teams discover this only after they've wired up a sprawling moderation pipeline and started watching their false-positive rate climb past anything users will tolerate.

Guardrails are not optional for production LLM applications. Hallucinations appear in roughly 31% of real-world LLM responses under normal conditions, and that figure climbs to 60–88% in regulated domains like law and medicine. Jailbreak attacks against modern models succeed at rates ranging from 57% to near-100% depending on the technique. But treating guardrails as a bolt-on compliance checkbox—rather than a carefully designed subsystem—is how teams end up with systems that block legitimate requests constantly while still missing adversarial ones.