Skip to main content

5 posts tagged with "agent-security"

View all tags

Tool Composition Sandbox Escape: When Three Safe Tools Compose Into Data Exfiltration

· 10 min read
Tian Pan
Software Engineer

The security review approved each of the three tools individually. Read-only access to the customer database was rated low risk because the agent could see records but not modify them. Send-email-to-self was rated low risk because the recipient was hardcoded to a service-account inbox the agent was already authorized to write to. Template-render was rated low risk because it was a deterministic Jinja-style transform with no I/O. Three weeks after launch, the data-loss-prevention dashboard flagged customer PII showing up in a Slack channel that two hundred employees could read, and the post-mortem traced the leak to the agent composing the three tools into a chain that no single ACL had granted: read a customer record, render it through a template, email the result to its own service account whose inbox auto-forwarded into the channel.

No single tool was misused. No prompt injection bypassed any check. The agent did exactly what its tool catalog said it could do, and the composition produced a capability the security review had never been asked to evaluate.

The Prompt Surface Area Problem: Why Adding a Tool Is Never Just Adding a Tool

· 10 min read
Tian Pan
Software Engineer

Every engineer who has shipped an LLM-powered agent has been tempted by a simple mental model: a tool is a function. Adding a tool means the agent can do one more thing. The cost is a few lines of documentation in the system prompt, maybe a schema definition, maybe one new entry in a tool registry. It feels additive — linear.

It isn't. Each new tool doesn't expand what the agent can do in isolation; it expands what the agent can do in combination with every tool already there. That distinction is the source of a class of production failures that no amount of prompt tweaking can fix after the fact, because the problem is architectural. The prompt surface area problem is real, it compounds quickly, and most teams don't see it until they're already deep in it.

Agent Blast Radius: Bounding Worst-Case Impact Before Your Agent Misfires in Production

· 10 min read
Tian Pan
Software Engineer

Nine seconds. That's how long it took a Cursor AI agent to delete an entire production database, including all volume-level backups, while attempting to fix a credential mismatch. The agent had deletion permissions it never needed for any legitimate task. The blast radius was total because nobody had bounded it before deployment.

This isn't a story about model failure. It's a story about permission scope. The model did exactly what it calculated it should do. The engineering team just never asked: what's the worst this agent could do if it reasons incorrectly?

That question — answered systematically before deployment — is blast radius analysis.

Agent Memory Poisoning: The Attack That Persists Across Sessions

· 11 min read
Tian Pan
Software Engineer

Prompt injection gets all the attention. But prompt injection ends when the session closes. Memory poisoning — injecting malicious instructions into an agent's long-term memory — creates a persistent compromise that survives across sessions and executes days or weeks later, triggered by interactions that look nothing like an attack. Research on production agent systems shows over 95% injection success rates and 70%+ attack success rates across tested LLM-based agents. This is the attack vector most teams aren't defending against, and it's already in the OWASP Top 10 for Agentic Applications.

The core problem is simple: agents treat their own memories as trustworthy. When an agent retrieves a "memory" from its vector store or conversation history, it processes that information with the same confidence as its system instructions. There's no cryptographic signature, no provenance chain, no mechanism for the agent to distinguish between a memory it formed from genuine interaction and one injected by a malicious document it processed last Tuesday.

Agent Sandboxing and Secure Code Execution: Matching Isolation Depth to Risk

· 11 min read
Tian Pan
Software Engineer

Most teams shipping LLM agents with code execution capabilities make the same miscalculation: they treat sandboxing as a binary property. Either they skip isolation entirely ("we trust our users") or they deploy Docker containers and consider the problem solved. Neither position survives contact with production.

The reality is that sandboxing exists on a spectrum with five distinct levels, each offering a different isolation guarantee, performance profile, and operational cost. The mismatch between chosen isolation level and actual risk profile is the root cause of most agent security incidents — not the absence of any sandbox at all.