5 posts tagged with "threat-modeling"

The AI Feature Your CTO Funded That Your Security Team Will Not Let You Ship

· 11 min read
Tian Pan
Software Engineer

The post-mortem says "we found security too late." The actual finding is that security found you on time. Your process found security too late.

This is the AI feature that cleared the budget gate in January because the CTO and the CFO agreed the company needed an AI moment. It cleared a light legal review in March because it was a prototype. Engineering built against the agreed spec through Q2. In late July, the launch-readiness security review opened, and on day one the threat model came back with blockers on the auth scopes, the data-exfiltration paths, the model provider's residency story, and the prompt-injection surface. The team's quarter is now spent rebuilding to address findings that should have shaped the original spec. Two quarters of slip, an executive memo about "process improvements," and a quiet decision next planning cycle to "deprioritize AI deep-integrations."

The launch did not fail because security was slow. It failed because security entered after the shape of the feature had already been frozen.

The PII Redactor Whose Own Training Corpus Was the Leak Vector

· 9 min read
Tian Pan
Software Engineer

A team stands up a fine-tuned redaction model in front of their log pipeline. It strips names, emails, account numbers, and IP addresses before anything lands in long-term storage. The model is small, fast, and easy to deploy alongside the ingestion workers. The privacy review approves it. Six months later a customer support engineer pastes a strange-looking log line into a debugging tool, and the redactor produces an output that contains a real customer's email address — one that does not appear anywhere in the input.

The pipeline did exactly what it was built to do. The redactor was the leak.
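One cheap guard against this failure mode is to treat the redactor as subtractive only: every token in its output should be either a placeholder or something that already appeared in the input. The sketch below is a minimal illustration under that assumption. The placeholder format (`[EMAIL]`, `[IP]`), the tokenizer, and the names `novel_tokens` and `guarded_redact` are hypothetical, not taken from any particular redaction library.

```python
import re

# Hypothetical placeholder format such as [EMAIL] or [IP]; adjust to your redactor.
PLACEHOLDER = re.compile(r"\[[A-Z_]+\]")
# A token is either a placeholder or a run of word-like characters.
TOKEN = re.compile(r"\[[A-Z_]+\]|[\w.@+-]+")


def novel_tokens(original: str, redacted: str) -> list[str]:
    """Return output tokens that are neither placeholders nor present in the input."""
    allowed = set(TOKEN.findall(original))
    return [
        tok
        for tok in TOKEN.findall(redacted)
        if not PLACEHOLDER.fullmatch(tok) and tok not in allowed
    ]


def guarded_redact(original: str, redact) -> str:
    """Run a redactor callable and refuse any output that introduces new tokens."""
    redacted = redact(original)
    leaked = novel_tokens(original, redacted)
    if leaked:
        # Report only the count: the novel tokens may themselves be memorized PII.
        raise ValueError(f"redactor emitted {len(leaked)} token(s) absent from the input")
    return redacted
```

A check like this does not make the model safe. It only turns a silent leak into a loud failure at the pipeline boundary, where the offending line can be dropped or sent through a rule-based fallback.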

Security by Obscurity and the Agent Reading Your Wiki

· 12 min read
Tian Pan
Software Engineer

There is an endpoint inside your company that has been safe for ten years. It lives at a path that nobody outside the original team would ever guess. It is not in the public docs. It is not in the OpenAPI spec. It is not in the gateway's allowlist of "documented routes." Its auth layer is a token that any internal service can mint, because the threat model said the only way to reach it was to already know it existed. The endpoint accepts a JSON blob that, on a slow Tuesday, will reissue a refund or rotate an API key or move a row between two billing ledgers. It has worked correctly and unremarkably since 2016.

Last month, a teammate wired a coding agent into the engineering wiki to help with onboarding questions. The agent indexed every Confluence space, every archived design doc, every "do not delete — historical" page. Yesterday, a junior engineer asked it how refunds work. The agent stitched together a forgotten 2018 architecture diagram, a Slack export someone had pasted into a runbook, and a half-written postmortem. It produced, in conversational prose, a complete description of that endpoint, the token type required, and an example payload. The endpoint had not changed. Its threat model had.

Output As Payload: Your AI Threat Model Got Half The Boundary

· 9 min read
Tian Pan
Software Engineer

The threat model your team wrote for AI features almost certainly stops at the model. Inputs are untrusted: prompt injection, jailbreaks, adversarial uploads, poisoned retrieval. Outputs are content: things to moderate for safety, score on a refusal eval, ship to the user. The shape of that threat model is roughly "untrusted thing goes in, model thinks, safe thing comes out."

The new attack class flips that polarity. The model's output is rendered, parsed, executed, or relayed by a downstream system, and an attacker who can shape that output — through indirect prompt injection in retrieval, training-data influence, or socially engineered user queries — can deliver a payload to a target the model never had direct access to. The model becomes a confused deputy with reach the attacker doesn't have, and the boundary your team is defending is two systems too early.

EchoLeak is the canonical 2025 example. A single crafted email arrives in a Microsoft 365 mailbox. Copilot ingests it as part of routine context. The hidden instructions cause Copilot to embed sensitive context in the URL of a reference-style markdown image in its response, and the client interface auto-fetches that image, exfiltrating chat logs, OneDrive content, and Teams messages without a single user click. Microsoft's input-side classifier was bypassed because the attack didn't need to break the model's refusal calibration. It needed to shape one specific token sequence in the output.
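Defending the output side means treating model text as untrusted before a renderer sees it. The sketch below strips markdown links and images, both inline and reference-style, whose host is not on an allowlist. It is an illustration only, not Microsoft's mitigation: the host in `ALLOWED_HOSTS` and the function names are hypothetical, and the regexes cover common markdown forms rather than the full CommonMark grammar.

```python
import re
from urllib.parse import urlparse

ALLOWED_HOSTS = {"intranet.example.com"}  # hypothetical allowlist

# [label]: url  (reference definition on its own line)
REF_DEF = re.compile(r"^[ \t]{0,3}\[([^\]]+)\]:[ \t]*(\S+).*$", re.MULTILINE)
# [text][label] or ![alt][label]
REF_USE = re.compile(r"!?\[([^\]]*)\]\[([^\]]*)\]")
# [text](url "title") or ![alt](url)
INLINE = re.compile(r"!?\[([^\]]*)\]\(\s*(\S+?)(?:\s[^)]*)?\)")


def host_allowed(url: str) -> bool:
    """True only for absolute URLs whose host is explicitly allowlisted."""
    host = urlparse(url.strip("<>")).hostname
    return host is not None and host in ALLOWED_HOSTS


def sanitize(markdown: str) -> str:
    """Replace links and images to non-allowlisted hosts with their visible text."""
    blocked = set()

    def drop_def(m):
        if host_allowed(m.group(2)):
            return m.group(0)
        blocked.add(m.group(1).strip().lower())
        return ""  # removing the definition is what prevents the fetch

    text = REF_DEF.sub(drop_def, markdown)

    def drop_ref(m):
        # A collapsed reference like [text][] uses the text as its label.
        label = (m.group(2) or m.group(1)).strip().lower()
        return m.group(1) if label in blocked else m.group(0)

    text = REF_USE.sub(drop_ref, text)

    def drop_inline(m):
        return m.group(0) if host_allowed(m.group(2)) else m.group(1)

    return INLINE.sub(drop_inline, text)
```

An allowlist is only as strong as the hosts on it: any allowed host that can proxy or redirect requests reopens the same channel, so a filter like this belongs next to a restrictive content security policy in the client rather than in place of one.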

Tool-Composition Privilege Escalation: Your Security Review Cleared the Nodes, Not the Edges

· 10 min read
Tian Pan
Software Engineer

read_file is safe. send_email is safe. Your security review cleared each one against its own threat model: read-only access to a known directory, outbound mail through an authenticated relay with rate limits and recipient logging. Each passed. Both got registered. Then the agent composed them, and a single line of injected text in a customer support ticket turned the pair into an exfiltration tool that the original review had no language to describe.

The danger does not live in any node of the tool graph. It lives in the edges. Every per-tool security review you ran produced a verdict on a vertex; the actual permission surface of your agent is the set of paths through the catalog, and that set grows quadratically while your review process scales linearly. By the time your agent has fifteen registered tools, you have reviewed fifteen things and shipped roughly two hundred reachable two-step compositions, none of which any human audited.
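The arithmetic is easy to check, and the same enumeration can drive a first-pass review of the edges. The sketch below counts ordered pairs of distinct tools and flags every pair where a tool that reads private data can feed a tool that sends data outside the trust boundary. The `Tool` class and its two capability flags are hypothetical labels chosen for illustration; a real catalog would need a richer capability model.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Tool:
    name: str
    reads_private: bool = False   # hypothetical label: can read sensitive data
    sends_external: bool = False  # hypothetical label: can emit data outside the boundary


def two_step_paths(tools):
    """Every ordered pair of distinct tools: n * (n - 1) compositions."""
    return list(permutations(tools, 2))


def risky_edges(tools):
    """Pairs where private data read by the first tool can leave through the second."""
    return [
        (a.name, b.name)
        for a, b in two_step_paths(tools)
        if a.reads_private and b.sends_external
    ]


catalog = [
    Tool("read_file", reads_private=True),
    Tool("send_email", sends_external=True),
] + [Tool(f"tool_{i}") for i in range(13)]  # 13 filler tools, 15 in total

print(len(two_step_paths(catalog)))  # 15 * 14 = 210
print(risky_edges(catalog))          # [('read_file', 'send_email')]
```

Labeling each tool stays a linear amount of work, while the pairwise check runs mechanically over the quadratic set. Longer chains and tools that launder data through intermediate steps are out of scope for a two-step enumeration like this one.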