17 posts tagged with "ai-security"

The Coding Agent Autonomy Curve: Reading Is Free, Merging Is Incident-Class

April 27, 2026 · 11 min read

Software Engineer

The discourse on coding agents keeps collapsing to a binary: autonomous or supervised, YOLO mode or hand-on-the-wheel, --dangerously-skip-permissions or "approve every keystroke." That framing is a category error. A coding agent does not perform "an action." It performs a sequence of actions whose costs span at least seven orders of magnitude — from reading a file (free, undoable, no side effect) to merging to main (irreversible without a revert PR) to rolling out a binary to a fleet (six-figure incident-class). Treating that range with one autonomy switch is like setting a single speed limit for both a parking lot and a freeway.

The team that ships "the agent can do everything" without mapping each action to its blast radius is one prompt-injection-bearing GitHub comment away from a postmortem, and we already have public examples of that exact failure mode. Anthropic's Claude Code Security Review, Google's Gemini CLI Action, and GitHub Copilot Agent were all confirmed in 2026 to be hijackable through specially crafted PR titles and issue bodies, in an attack pattern the researchers named "Comment and Control." The agents weren't broken in some abstract sense. They executed a high-tier action (pushing code, opening a PR) on the basis of a low-trust input, because a single autonomy setting had silently flattened every action and every input into "all the same."

What follows is the discipline that has to land: a per-action curve, gates that scale with the tier, rollback velocity matched to blast class, and an eval program that tests for tool-composition escalation rather than single-action failure.
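
As a rough illustration of a per-action curve, the sketch below maps each action to a tier and picks a gate from the tier and the trust level of the triggering input. The tier names, action names, and gate labels are all hypothetical, chosen for this example rather than taken from any real agent framework.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical blast-radius tiers, ordered from cheapest to most costly."""
    READ = 0         # no side effects
    LOCAL_WRITE = 1  # edits in a working tree, easy to undo
    PUSH = 2         # pushing a branch or opening a PR
    MERGE = 3        # merging to main; undoing needs a revert PR
    DEPLOY = 4       # rolling a binary out to a fleet


# Hypothetical action names; a real agent would list its own tools here.
ACTION_TIERS = {
    "read_file": Tier.READ,
    "edit_file": Tier.LOCAL_WRITE,
    "open_pr": Tier.PUSH,
    "merge_to_main": Tier.MERGE,
    "rollout_binary": Tier.DEPLOY,
}


def gate_for(action: str, input_trusted: bool) -> str:
    """Return the gate an action must pass, given who triggered it."""
    # Unknown actions are treated as the most dangerous tier.
    tier = ACTION_TIERS.get(action, Tier.DEPLOY)
    if tier == Tier.READ:
        return "auto"
    # Low-trust input (for example a PR title or issue body) never
    # drives a push-or-higher action on its own.
    if not input_trusted and tier >= Tier.PUSH:
        return "deny"
    if tier <= Tier.LOCAL_WRITE:
        return "auto"
    if tier == Tier.PUSH:
        return "human_review"
    return "human_review_plus_rollback_plan"
```

The point of the sketch is that the decision takes two inputs, the action's tier and the trust of whatever asked for it, rather than one global autonomy switch.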

Per-Tenant Inference Isolation: When Shared Cache, Fine-Tunes, and Embeddings Leak Across Customers

April 27, 2026 · 12 min read

Tian Pan

Software Engineer

Multi-tenant SaaS solved data isolation a decade ago. Row-level security in Postgres, per-tenant encryption keys, S3 bucket policies scoped to tenant prefixes: by 2018 the playbook was so well-rehearsed that an auditor asking "show me how customer A's data cannot reach customer B" had a one-page answer with a citation per layer. AI features quietly reintroduced the question, and the answer is no longer one page.

The interesting part is not that AI broke isolation. The interesting part is where it broke isolation: not at the data layer the audit team has been guarding for ten years, but at four new layers nobody put on the diagram. Prompt cache prefixes share KV state across requests in ways that turn time-to-first-token into a side channel. Fine-tunes trained on aggregated customer data memorize tenant-specific phrasing and surface it back to the wrong customer. Embedding indexes get partitioned logically by query filter when the threat model demands physical separation. KV-cache reuse across requests creates timing channels that nobody threat-modeled when "shared inference is fine" was a reasonable shortcut.

This post is about what changed and what the discipline looks like once you take the problem seriously.
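
To make two of those layers concrete, here is a minimal sketch, assuming an application-level prompt cache and a toy in-memory vector store. `tenant_cache_key` and `PerTenantIndex` are hypothetical names; the separate Python lists only stand in for physically separate indexes, and the scoring is a plain dot product.

```python
import hashlib
import hmac


def tenant_cache_key(secret: bytes, tenant_id: str, prompt_prefix: str) -> str:
    """Derive a cache key that is namespaced by tenant.

    Two tenants sending the same prefix get different keys, so one
    tenant's cache hit cannot reveal that another tenant sent it.
    """
    msg = tenant_id.encode("utf-8") + b"\x00" + prompt_prefix.encode("utf-8")
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


class PerTenantIndex:
    """One store per tenant, instead of one shared store plus a query filter."""

    def __init__(self) -> None:
        self._stores: dict[str, list[tuple[str, list[float]]]] = {}

    def add(self, tenant_id: str, doc_id: str, vector: list[float]) -> None:
        self._stores.setdefault(tenant_id, []).append((doc_id, vector))

    def search(self, tenant_id: str, query: list[float], k: int = 3) -> list[str]:
        # Only this tenant's store is ever scanned; a forgotten filter
        # cannot widen the search to other tenants.
        store = self._stores.get(tenant_id, [])
        scored = sorted(
            store,
            key=lambda item: -sum(a * b for a, b in zip(item[1], query)),
        )
        return [doc_id for doc_id, _ in scored[:k]]
```

The design choice in both helpers is the same: the tenant identity is part of the lookup structure itself, not a condition applied after a shared lookup.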

Tool Outputs Are an Untrusted Channel Your Agent Treats as Trusted

April 23, 2026 · 11 min read

Tian Pan

Software Engineer

The threat model most teams ship their agents with has one quiet assumption buried inside: when the model calls a tool, whatever comes back is safe to read. The user's prompt is the adversary, goes the story, and tool outputs are "just data" — search results, inbox summaries, database rows, RAG chunks, file contents, page scrapes. That story is the entire reason prompt injection keeps landing in production. Tool outputs are not data. They are another input channel into the planner, with the same privilege as the user prompt and none of the suspicion.

If that framing sounds abstract, consider what happened inside Microsoft 365 Copilot in June 2025. A researcher sent a single email with hidden instructions; the victim never clicked a link, never opened an attachment, never read the message themselves. A routine "summarize my inbox" query was enough to make Copilot read the email. The agent dutifully followed the instructions it found inside the body, reached into OneDrive, SharePoint, and Teams, and exfiltrated organizational data through a trusted Microsoft domain before anyone noticed. The vulnerability (CVE-2025-32711, "EchoLeak") earned a CVSS score of 9.3 and a server-side patch, but the class of bug did not go away. It cannot go away, because every read-tool on every production agent is a version of that email inbox.

This post is about the framing shift that gets you unstuck: stop thinking about "prompt injection" as a user-input problem, and start thinking about every tool output as an untrusted channel that happens to share a token stream with your system prompt.
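
One way to picture that shift is to carry a trust flag on every message and consult it before any high-privilege tool call. The sketch below is a simplified illustration, not a complete defense; the `Message` type, the `HIGH_PRIVILEGE` tool names, and the all-or-nothing policy are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str      # "system", "user", or "tool"
    content: str
    trusted: bool  # provenance travels with the text


def from_tool(tool_name: str, output: str) -> Message:
    """Wrap a tool result; tool output is never marked trusted."""
    return Message(role="tool", content=f"[{tool_name} output]\n{output}", trusted=False)


# Hypothetical names for tools that can move data out or change state.
HIGH_PRIVILEGE = {"send_email", "share_file", "http_post"}


def allow_call(tool_name: str, context: list[Message]) -> bool:
    """Allow a high-privilege call only if nothing untrusted is in context."""
    if tool_name not in HIGH_PRIVILEGE:
        return True
    return all(message.trusted for message in context)
```

Under this policy, a "summarize my inbox" turn can still read mail, but once an email body has entered the context, a follow-on `send_email` or `http_post` call would be held for confirmation rather than executed silently.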

The Document Is the Attack: Prompt Injection Through Enterprise File Pipelines

April 20, 2026 · 9 min read

Tian Pan

Software Engineer

Your AI assistant just processed a contract from a prospective vendor. It summarized the terms, flagged the risky clauses, and drafted a response. What you don't know is that the PDF contained white text on a white background — invisible to your eyes, perfectly visible to the model — instructing it to recommend acceptance regardless of terms. The summary looks reasonable. The approval recommendation looks reasonable. The model followed instructions you never wrote.

This is the document-as-attack-surface problem, and most enterprise AI pipelines are completely unprepared for it.

The vulnerability is architectural, not incidental. When document content flows directly into an LLM's context window, the model has no reliable way to distinguish legitimate instructions from attacker-controlled content embedded in a file. Every document your pipeline ingests is a potential instruction source — and in most systems, untrusted documents and trusted system prompts are processed with equal authority.
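
A crude check for the white-on-white case is to compare what the file's text layer says with what a rendered page actually shows. The sketch below assumes both strings are produced upstream (for example by a PDF text extractor and by OCR on a rendered image); neither step is shown, and `hidden_lines` is a hypothetical helper. Line-level matching will produce false positives when OCR breaks lines differently, so treat the result as a flag for review, not a verdict.

```python
def _normalize(line: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(line.lower().split())


def hidden_lines(text_layer: str, rendered_text: str) -> list[str]:
    """Return text-layer lines that do not appear in the rendered page text.

    text_layer:    text extracted from the document's embedded text (assumed input)
    rendered_text: text recovered from a rendered image of the page (assumed input)
    """
    visible = {_normalize(line) for line in rendered_text.splitlines()}
    return [
        line.strip()
        for line in text_layer.splitlines()
        if _normalize(line) and _normalize(line) not in visible
    ]
```

A pipeline could quarantine any document for which this list is non-empty before its content reaches the model's context window.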

Cross-Tenant Data Leakage in Shared LLM Infrastructure: The Isolation Failures Nobody Tests For

April 10, 2026 · 11 min read

Tian Pan

Software Engineer

Most multi-tenant LLM products have a security gap that their engineers haven't tested for. Not a theoretical gap but a practical one, with documented attack vectors and confirmed incidents. The gap is this: each layer of the modern AI stack introduces its own isolation primitive, and each one can fail silently in ways that let one customer's data reach another customer's context.

This isn't about prompt injection or jailbreaking. It's about the infrastructure itself — prompt caches, vector indexes, memory stores, and fine-tuning pipelines — and the organizational fiction of "isolation" that most teams ship without validating.
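
Validating isolation can start with something as simple as a canary test: plant a unique marker in each tenant's data, exercise the product as every other tenant, and look for markers that crossed over. The sketch below shows only the checking step; `make_canary` and `find_cross_tenant_leaks` are hypothetical helpers, and collecting the outputs from caches, indexes, and memory stores is left to the test harness.

```python
import secrets


def make_canary(tenant_id: str) -> str:
    """Create a unique marker to plant in one tenant's data."""
    return f"CANARY-{tenant_id}-{secrets.token_hex(8)}"


def find_cross_tenant_leaks(
    canaries: dict[str, str],
    outputs_by_tenant: dict[str, list[str]],
) -> list[tuple[str, str]]:
    """Return (owner, reader) pairs where the owner's canary reached the reader.

    canaries:          tenant id -> the marker planted in that tenant's data
    outputs_by_tenant: tenant id -> model outputs observed while acting as that tenant
    """
    leaks = []
    for reader, outputs in outputs_by_tenant.items():
        for owner, canary in canaries.items():
            if owner != reader and any(canary in text for text in outputs):
                leaks.append((owner, reader))
    return leaks
```

An empty result does not prove isolation, since a leak may paraphrase rather than repeat the marker, but a non-empty one is direct evidence of a failure.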
