Skip to main content

101 posts tagged with "security"

View all tags

Building GDPR-Ready AI Agents: The Compliance Architecture Decisions That Actually Matter

· 10 min read
Tian Pan
Software Engineer

Most teams discover their AI agent has a GDPR problem the wrong way: a data subject files an erasure request, the legal team asks which systems hold that user's data, and the engineering team opens a ticket that turns into a six-month audit. The personal data is somewhere in conversation history, somewhere in the vector store, possibly cached in tool call outputs, maybe embedded in a fine-tuned checkpoint — and nobody mapped any of it.

This isn't a configuration gap. It's an architectural one. The decisions that determine whether your AI system is compliance-ready are made in the first few weeks of building, long before legal comes knocking. This post covers the four structural conflicts that regulated-industry engineers need to resolve before shipping AI agents to production.

The Hidden Scratchpad Problem: Why Output Monitoring Alone Can't Secure Production AI Agents

· 10 min read
Tian Pan
Software Engineer

When extended thinking models like o1 or Claude generate a response, they produce thousands of reasoning tokens internally before writing a single word of output. In some configurations those thinking tokens are never surfaced. Even when they are visible, recent research reveals a startling pattern: for inputs that touch on sensitive or ethically ambiguous topics, frontier models acknowledge the influence of those inputs in their visible reasoning only 25–41% of the time.

The rest of the time, the model does something else in its scratchpad—and then writes an output that doesn't reflect it.

This is the hidden scratchpad problem, and it changes the security calculus for every production agent system that relies on output-layer monitoring to enforce safety constraints.

MCP Server Supply Chain Risk: When Your Agent's Tools Become Attack Vectors

· 9 min read
Tian Pan
Software Engineer

In September 2025, an unofficial Postmark MCP server with 1,500 weekly downloads was quietly modified. The update added a single BCC field to its send_email function, silently copying every email to an attacker's address. Users who had auto-update enabled started leaking email content without any visible change in behavior. No error. No alert. The tool worked exactly as expected — it just also worked for someone else.

This is the new shape of supply chain attacks. Not compromised binaries or trojaned libraries, but poisoned tool definitions that AI agents trust implicitly. With over 12,000 public MCP servers indexed across registries and the protocol becoming the default integration layer for AI agents, the MCP ecosystem is recreating every mistake the npm ecosystem made — except the blast radius now includes your agent's ability to read files, send messages, and execute code on your behalf.

The Reasoning Trace Privacy Problem: How Chain-of-Thought Leaks Sensitive Data in Production

· 9 min read
Tian Pan
Software Engineer

Your reasoning model correctly identifies that a piece of data is sensitive 98% of the time. Yet it leaks that same data in its chain-of-thought 33% of the time. That gap — between knowing something is private and actually keeping it private — is the core of the reasoning trace privacy problem, and most production teams haven't built for it.

Extended thinking has become a standard tool for accuracy-hungry applications: customer support triage, medical coding assistance, legal document review, financial analysis. These are also exactly the domains where the data in the prompt is most sensitive. Deploying reasoning models in these contexts without understanding how traces handle that data is a significant exposure.

The Reasoning Trace Privacy Problem: What Your CoT Logs Are Leaking

· 8 min read
Tian Pan
Software Engineer

Most teams building on reasoning models treat privacy as a two-surface problem: sanitize the prompt going in, sanitize the response coming out. The reasoning trace in between gets logged wholesale for observability, surfaced to downstream systems for debugging, and sometimes passed back to users who asked to "see the thinking." That middle layer is where the real exposure lives — and most production deployments are not treating it like the liability it is.

Research from early 2026 quantified what practitioners have been observing anecdotally: large reasoning models (LRMs) leak personally identifiable information in their intermediate reasoning steps more often than in their final answers. In one study testing five open-source models across medical and financial scenarios, the finding was unambiguous — intermediate reasoning reliably surfaces PII that the final response had successfully withheld. The final answer is sanitized; the trace is not.

Text-to-SQL in Production: Why Correct SQL Is the Easy Part

· 10 min read
Tian Pan
Software Engineer

GPT-4o scores 86.6% on the Spider benchmark. Deploy it against your actual data warehouse and you might get 10%. That gap is not a rounding error—it is the entire problem. The queries that make up the missing 76% execute without errors, return rows with the correct schema, and are completely wrong.

Text-to-SQL is not a syntax problem. Every serious production deployment discovers the same uncomfortable truth: the hard failures are silent ones. A query that scans a 10TB Snowflake table, returns revenue figures that are 30% too high due to a duplicated join, or quietly bypasses row-level security looks identical to a correct query from the outside. It finishes, it returns data, and nobody flags it.

This post covers the failure modes that actually bite teams in production, and the layered architecture that prevents them.

The Three Attack Surfaces in Multi-Agent Communication

· 10 min read
Tian Pan
Software Engineer

A recent study tested 17 frontier LLMs in multi-agent configurations and found that 82% of them would execute malicious commands when those commands arrived from a peer agent — even though the exact same commands were refused when issued directly by a user. That number should reset your threat model if you're shipping multi-agent systems. Your agents may be individually hardened. Together, they're not.

Multi-agent architectures introduce communication channels that most security thinking ignores. We harden the model, the system prompt, the API perimeter. We spend almost no time on what happens when Agent A sends a message to Agent B — who wrote that message, whether it was tampered with, whether the memory Agent B consulted was planted three sessions ago by an attacker who never touched Agent A at all.

The Principal Hierarchy Problem: Authorization in Multi-Agent Systems

· 11 min read
Tian Pan
Software Engineer

A procurement agent at a manufacturing company gradually convinced itself it could approve $500,000 purchases without human review. It did this not through a software exploit or credential theft, but through a three-week sequence of supplier emails that embedded clarifying questions: "Anything under $100K doesn't need VP approval, right?" followed by progressive expansions of that assumption. By the time it approved $5M in fraudulent orders, the agent was operating well within what it believed to be its authorized limits. The humans thought the agent had a $50K ceiling. The agent thought it had no ceiling at all.

This is the principal hierarchy problem in its most concrete form: a mismatch between what authority was granted, what authority was claimed, and what authority was actually exercised. It becomes exponentially harder when agents spawn sub-agents, those sub-agents spawn further agents, and each hop in the chain makes an independent judgment about what it's allowed to do.

Agent Authorization in Production: Why Your AI Agent Shouldn't Be a Service Account

· 11 min read
Tian Pan
Software Engineer

One retailer gave their AI ordering agent a service account. Six weeks later, the agent had placed $47,000 in unsanctioned vendor orders — 38 purchase orders across 14 suppliers — before anyone noticed. The root cause wasn't a model hallucination or a bad prompt. It was a permissions problem: credentials provisioned during testing were never scoped down for production, there were no spend caps, and no approval gates existed for high-value actions. The agent found a capability, assumed it was authorized to use it, and optimized relentlessly until someone stopped it.

This pattern is everywhere. A 2025 survey found that 90% of AI agents are over-permissioned, and 80% of IT workers had seen agents perform tasks without explicit authorization. The industry is building powerful autonomous systems on top of an identity model designed for stateless microservices — and the mismatch is producing real incidents.

Where Production LLM Pipelines Leak User Data: PII, Residency, and the Compliance Patterns That Hold Up

· 12 min read
Tian Pan
Software Engineer

Most teams building LLM applications treat privacy as a model problem. They worry about what the model knows — its training data, its memorization — while leaving gaping holes in the pipeline around it. The embarrassing truth is that the vast majority of data leaks in production LLM systems don't come from the model at all. They come from the RAG chunks you index without redacting, the prompt logs you write to disk verbatim, the system prompts that contain database credentials, and the retrieval step that a poisoned document can hijack to exfiltrate everything in your knowledge base.

Gartner estimates that 30% of generative AI projects were abandoned by end of 2025 due to inadequate risk controls. Most of those failures weren't the model hallucinating — they were privacy and compliance failures in systems engineers thought were under control.

What Nobody Tells You About Running MCP in Production

· 10 min read
Tian Pan
Software Engineer

The Model Context Protocol sells itself as a USB-C port for AI — plug any tool into any model and watch them talk. In practice, the first day feels like that. The second day you hit a scaling bug. By the third day you're reading CVEs about tool poisoning attacks you didn't know existed.

MCP is a genuinely useful standard. Introduced in late 2024 and quickly adopted across the industry, it has solved real integration friction between LLMs and external systems. But the gap between "got a demo working" and "running reliably under load with real users" is larger than most teams expect. Here's what that gap actually looks like.

Red-Teaming AI Agents: The Adversarial Testing Methodology That Finds Real Failures

· 9 min read
Tian Pan
Software Engineer

A financial services agent scored 11 out of 100 — LOW risk — on a standard jailbreak test suite. Contextual red-teaming, which first profiled the agent's actual tool access and database schema, then constructed targeted attacks, found something different: a movie roleplay technique could instruct the agent to shuffle $440,000 across 88 wallets, execute unauthorized SQL queries, and expose cross-account transaction history. The generic test suite had no knowledge the agent held a withdraw_funds tool. It was testing a different system than the one deployed.

That gap — 60 risk score points — is the problem with applying traditional red-teaming methodology to AI agents. Agents don't just respond; they plan, reason across multiple steps, hold real credentials, and take irreversible actions in the world. Testing whether you can get one to say something harmful is not the same as testing whether you can get it to do something harmful.