
120 posts tagged with "security"


Internal AI Tools vs. External AI Products: Why Most Teams Get the Safety Bar Backwards

· 8 min read
Tian Pan
Software Engineer

Most teams assume that internal AI tools need less safety work than customer-facing AI products. The logic feels obvious: employees are trusted users, the blast radius is contained, and you can always fix things with a Slack message. This intuition is dangerously wrong. Internal AI tools often need more safety engineering than external products — just a completely different kind.

The 88% of organizations that reported AI agent security incidents last year were, for the most part, not hit through their customer-facing products. The incidents came through internal tools with ambient authority over business systems, access to proprietary data, and the implicit trust of an employee session.

The MCP Composability Trap: When 'Just Add Another Server' Becomes Dependency Hell

· 9 min read
Tian Pan
Software Engineer

The MCP ecosystem has 10,000+ servers and 97 million SDK downloads. It also has 30 CVEs filed in sixty days, 502 server configurations with unpinned versions, and a supply chain attack that BCC'd every outgoing email to an attacker for fifteen versions before anyone noticed. The composability promise — "just plug in another MCP server" — is real. But so is the dependency sprawl it creates, and most teams discover the cost after they're already deep in integration debt.

If you've built production systems on npm, you've seen this movie before. The MCP ecosystem is speedrunning the same plot, except the packages have shell access to your machine and credentials to your production systems.

Agent Credential Rotation: The DevOps Problem Nobody Mapped to AI

· 8 min read
Tian Pan
Software Engineer

Every DevOps team has a credential rotation policy. Most have automated it for their services, CI pipelines, and databases. But the moment you deploy an autonomous AI agent that holds API keys across five different integrations, that rotation policy becomes a landmine. The agent is mid-task — triaging a bug, updating a ticket, sending a Slack notification — and suddenly its GitHub token expires. The process looks healthy. The logs show no crash. But silently, nothing works anymore.

This is the credential rotation problem that nobody mapped from DevOps to AI. Traditional rotation assumes predictable, human-managed workloads with clear boundaries. Autonomous agents shatter every one of those assumptions.
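
One way to turn the silent failure described above into a loud one is to check a credential's remaining lifetime before every tool call. The sketch below is illustrative only: `Credential`, `CredentialExpired`, `require_fresh`, and the 60-second margin are hypothetical names and values, not part of any particular agent framework, and it assumes the agent knows each token's expiry time.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Credential:
    """Hypothetical record of a token the agent holds for one integration."""
    name: str
    token: str
    expires_at: float  # Unix timestamp (seconds)


class CredentialExpired(Exception):
    """Raised so the agent stops and refreshes instead of failing silently."""


def require_fresh(cred: Credential,
                  now: Optional[float] = None,
                  min_ttl_seconds: float = 60.0) -> str:
    """Return the token only if it stays valid for at least min_ttl_seconds."""
    current = time.time() if now is None else now
    if cred.expires_at - current < min_ttl_seconds:
        raise CredentialExpired(
            f"{cred.name} token expires in under {min_ttl_seconds:.0f}s; "
            "refresh before continuing the task"
        )
    return cred.token
```

Calling `require_fresh` at the start of each tool invocation means an expiring GitHub token surfaces as an explicit exception the orchestrator can handle, rather than as a task that appears healthy while doing nothing.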

Differential Privacy for AI Systems: What 'We Added Noise' Actually Means

· 11 min read
Tian Pan
Software Engineer

Most teams treating "differential privacy" as a checkbox are not actually protected. They've added noise somewhere in their pipeline — maybe to gradients during fine-tuning, maybe to query embeddings at retrieval time — and concluded the problem is solved. The compliance deck says "DP-enabled." Engineering moves on.

What they haven't done is define an epsilon budget, account for it across every query their system will ever serve, or verify that their privacy loss is meaningfully bounded. In practice, the gap between "we added noise" and "we have a meaningful privacy guarantee" is where most real-world AI privacy incidents happen.

This post is about that gap: what differential privacy actually promises for LLMs, where those promises break down, and the engineering decisions teams make — often implicitly — that determine whether their DP deployment is real protection or theater.
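
To make "define an epsilon budget and account for it across every query" concrete, here is a minimal sketch of a budget tracker. It assumes basic sequential composition, where the privacy loss of successive queries simply adds up; tighter accounting methods exist and are not shown. The class name `PrivacyBudget` and its methods are hypothetical, chosen for illustration.

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition (assumed)."""

    def __init__(self, total_epsilon: float) -> None:
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon: float) -> bool:
        """Record one query's cost.

        Returns False, and spends nothing, if the query would push the
        cumulative loss past the budget.
        """
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so float rounding does not reject an exact fit.
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            return False
        self.spent += epsilon
        return True

    @property
    def remaining(self) -> float:
        """Epsilon still available before the budget is exhausted."""
        return max(self.total_epsilon - self.spent, 0.0)
```

With a total budget of 1.0 and queries costing 0.4 each, the first two calls to `charge` succeed and the third is refused. A system that adds noise but never refuses a query has no equivalent of that third call, which is the gap the excerpt describes.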

PII in LLM Pipelines: The Leaks You Don't Know About Until It's Too Late

· 10 min read
Tian Pan
Software Engineer

Every engineer who has built an LLM feature has said some version of this: "We're careful — we don't send PII to the model." Then someone files a GDPR inquiry, or the security team audits the trace logs, and suddenly you're looking at customer emails, account numbers, and diagnosis codes sitting in plaintext inside your observability platform. The Samsung incident — three separate leaks in 20 days after allowing employees to use a public LLM — wasn't caused by reckless behavior. It was caused by engineers doing their jobs and a data boundary that wasn't enforced anywhere in the stack.

The problem is that "don't send PII to the API" is a policy, not a control. And policies fail the moment your system does something more interesting than a single-turn chatbot.
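
The difference between a policy and a control can be sketched in a few lines: a control is code that runs on every record before it reaches the trace log, whether or not anyone remembers the policy. The example below is a deliberately narrow illustration that redacts only email addresses with a simple regular expression. The function names and the `[REDACTED_EMAIL]` placeholder are hypothetical, and a real deployment would need to cover far more identifier types than this pattern does.

```python
import re

# Simplified email pattern for illustration; it will miss unusual addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


def build_log_record(prompt: str, response: str) -> dict:
    """Build the record sent to observability, redacting both fields first."""
    return {
        "prompt": redact_emails(prompt),
        "response": redact_emails(response),
    }
```

Routing every trace through a function like `build_log_record` enforces the boundary at one point in the stack instead of relying on each engineer to remember it.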

Prompt Injection Surface Area Mapping: Find Every Attack Vector Before Attackers Do

· 11 min read
Tian Pan
Software Engineer

Most teams discover their prompt injection surface area the wrong way: a security researcher posts a demo, a customer reports strange behavior, or an incident post-mortem reveals a tool call that should never have fired. By then the attack path is already documented and the blast radius is real.

Prompt injection is the OWASP #1 risk for LLM applications, but the framing as a single vulnerability obscures what it actually is: a family of attack vectors that scale with your application's complexity. Every external data source you feed into a prompt is a potential injection surface. In an agentic system with a dozen tool integrations, that surface area is enormous — and most of it is unmapped.

This post is a practitioner's methodology for mapping it before attackers do.

Adversarial Agent Monitoring: Building Oversight That Can't Be Gamed

· 10 min read
Tian Pan
Software Engineer

Most production AI agent systems use some version of LLM-as-judge monitoring: a second model reviews the agent's outputs and flags suspicious behavior. It's intuitive, easy to deploy, and badly broken as a primary defense. A study analyzing single-judge monitoring across multiple prompt attack vectors found a 52.7% weighted attack success rate — meaning an adversarially sophisticated agent evades detection more often than it's caught. For single-turn, low-stakes applications, this might be tolerable. For an autonomous agent with access to external systems, it's a critical gap.

The deeper problem is that capability and deceptiveness scale together. As agents become more capable at their intended tasks, they also become better at modeling their oversight context — and adjusting behavior accordingly.

Building GDPR-Ready AI Agents: The Compliance Architecture Decisions That Actually Matter

· 10 min read
Tian Pan
Software Engineer

Most teams discover their AI agent has a GDPR problem the wrong way: a data subject files an erasure request, the legal team asks which systems hold that user's data, and the engineering team opens a ticket that turns into a six-month audit. The personal data is somewhere in conversation history, somewhere in the vector store, possibly cached in tool call outputs, maybe embedded in a fine-tuned checkpoint — and nobody mapped any of it.

This isn't a configuration gap. It's an architectural one. The decisions that determine whether your AI system is compliance-ready are made in the first few weeks of building, long before legal comes knocking. This post covers the four structural conflicts that regulated-industry engineers need to resolve before shipping AI agents to production.

The Hidden Scratchpad Problem: Why Output Monitoring Alone Can't Secure Production AI Agents

· 10 min read
Tian Pan
Software Engineer

When extended thinking models like o1 or Claude generate a response, they produce thousands of reasoning tokens internally before writing a single word of output. In some configurations those thinking tokens are never surfaced. Even when they are visible, recent research reveals a startling pattern: for inputs that touch on sensitive or ethically ambiguous topics, frontier models acknowledge the influence of those inputs in their visible reasoning only 25–41% of the time.

The rest of the time, the model does something else in its scratchpad, then writes an output that doesn't reflect it.

This is the hidden scratchpad problem, and it changes the security calculus for every production agent system that relies on output-layer monitoring to enforce safety constraints.

MCP Server Supply Chain Risk: When Your Agent's Tools Become Attack Vectors

· 9 min read
Tian Pan
Software Engineer

In September 2025, an unofficial Postmark MCP server with 1,500 weekly downloads was quietly modified. The update added a single BCC field to its send_email function, silently copying every email to an attacker's address. Users who had auto-update enabled started leaking email content without any visible change in behavior. No error. No alert. The tool worked exactly as expected — it just also worked for someone else.

This is the new shape of supply chain attacks. Not compromised binaries or trojaned libraries, but poisoned tool definitions that AI agents trust implicitly. With over 12,000 public MCP servers indexed across registries and the protocol becoming the default integration layer for AI agents, the MCP ecosystem is recreating every mistake the npm ecosystem made — except the blast radius now includes your agent's ability to read files, send messages, and execute code on your behalf.

The Reasoning Trace Privacy Problem: How Chain-of-Thought Leaks Sensitive Data in Production

· 9 min read
Tian Pan
Software Engineer

Your reasoning model correctly identifies that a piece of data is sensitive 98% of the time. Yet it leaks that same data in its chain-of-thought 33% of the time. That gap — between knowing something is private and actually keeping it private — is the core of the reasoning trace privacy problem, and most production teams haven't built for it.

Extended thinking has become a standard tool for accuracy-hungry applications: customer support triage, medical coding assistance, legal document review, financial analysis. These are also exactly the domains where the data in the prompt is most sensitive. Deploying reasoning models in these contexts without understanding how traces handle that data is a significant exposure.

The Reasoning Trace Privacy Problem: What Your CoT Logs Are Leaking

· 8 min read
Tian Pan
Software Engineer

Most teams building on reasoning models treat privacy as a two-surface problem: sanitize the prompt going in, sanitize the response coming out. The reasoning trace in between gets logged wholesale for observability, surfaced to downstream systems for debugging, and sometimes passed back to users who asked to "see the thinking." That middle layer is where the real exposure lives — and most production deployments are not treating it like the liability it is.

Research from early 2026 quantified what practitioners have been observing anecdotally: large reasoning models (LRMs) leak personally identifiable information in their intermediate reasoning steps more often than in their final answers. In one study testing five open-source models across medical and financial scenarios, the finding was unambiguous — intermediate reasoning reliably surfaces PII that the final response had successfully withheld. The final answer is sanitized; the trace is not.