Skip to main content

578 posts tagged with "insider"

View all tags

Your System Prompt Will Leak: Designing for Prompt Extraction

· 10 min read
Tian Pan
Software Engineer

The threat model for LLM features over-indexes on three failure modes: prompt injection, user-data exfiltration, and unauthorized tool calls. There is a quieter attack that lands more often, costs less to mount, and shows up in fewer postmortems because nobody filed one — prompt extraction. An adversarial user, sometimes a competitor, sometimes a curious researcher, walks the model into reciting its own system prompt over a handful of turns. The carefully tuned instructions that encode your team's product behavior, refusal policy, retrieval scaffolding, and brand voice land in a public GitHub repository within the week.

The repositories already exist. A widely-circulated GitHub project tracks extracted system prompts from Claude, ChatGPT, Gemini, Grok, Perplexity, Cursor, and v0.dev — updated as new model versions ship, often within hours of release. Anthropic's full Claude prompt clocks in at over 24,000 tokens including tools, and you can read it. The companies most invested in prompt secrecy are the ones whose prompts leak most reliably, because they are also the ones whose attackers are most motivated.

When Your CLI Speaks English: Least Authority for Promptable Infrastructure

· 13 min read
Tian Pan
Software Engineer

A platform team I talked to this quarter shipped a Slack bot that wrapped kubectl and accepted English. An engineer typed "clean up the unused branches in staging." The bot helpfully deleted twelve namespaces — including one whose name matched the substring "branch" but which happened to host a long-lived integration environment that the mobile team had been using for a week. No exception was thrown. Every individual call the bot made was a permission the bot legitimately held. The post-mortem could not point to a broken access rule, because no rule was broken. The bot did exactly what its IAM policy said it could do.

The Unix philosophy was a containment strategy hiding inside an aesthetic preference. Small tools with narrow surfaces meant that the blast radius of any single command was bounded by the verbs and flags it accepted. rm -rf was dangerous because everyone agreed it was; kubectl delete namespace required the operator to type out the namespace, and the typing was the gate. The principle of least authority was easy to enforce because authority was lexical: the shape of the command told you the shape of the action.

Then the wrappers started accepting English. Now "the shape of the command" is whatever the LLM decided it meant.

Reachability Analysis for Agent Action Spaces: Eval Coverage for the Branches You Never Tested

· 12 min read
Tian Pan
Software Engineer

The first time anyone on your team learned that the agent could call revoke_api_key was the morning a well-meaning user typed "this token feels old, can you rotate it for me?" The tool had been registered six months earlier as part of a batch import from the auth team's MCP server. It had passed schema validation, appeared in the catalog enumeration, and then sat. No eval ever invoked it. No production trace ever touched it. Then one prompt, one planner decision, and the incident channel learned the tool existed.

This is the failure mode that hides inside every agent with a non-trivial tool catalog. Forty registered functions and a planner that can compose them produce a reachable graph of plans whose long tail you have never observed. The assumption that "we tested the common paths" papers over the fact that the dangerous branch is, almost by definition, the one you never saw.

Sampling Parameter Inheritance: When Temperature 0.7 Leaks From the Planner Into the Verifier

· 10 min read
Tian Pan
Software Engineer

A verifier that flips its own answer eight percent of the time is not a flaky model. It is a sampling configuration bug that reached production because the framework defaulted to inheritance. The planner needed temperature=0.7 to brainstorm subtask decompositions. The verifier — the role whose entire job is to give a low-variance yes-or-no on whether the answer satisfies the rubric — was instantiated through the same harness call, and silently picked up the same temperature. Nobody set it that way on purpose. Nobody set it at all.

This is the most expensive parameter in your stack that nobody owns. It compounds across the call tree: the summarizer above the verifier, the structured-output extractor below it, and the retry loop wrapping the whole thing all consume the planner's "be creative" knob as if it were a global. The bill arrives in three places at once — eval flakiness, token spend, and the half-day a senior engineer spends bisecting a regression that turns out to be no regression at all.

Session Stitching: Why Your Conversation-ID Is a Lie

· 11 min read
Tian Pan
Software Engineer

A user starts negotiating a contract with your agent on her desktop at 9 a.m. She gets a Slack ping, switches to her phone over lunch to ask one clarifying question, and reopens the desktop tab at 4 p.m. to revise the draft. To her, that was one task — three hours of working through one contract. To your system, that was three sessions on two devices, each with its own conversation-id, each with its own memory window, each presenting a fresh greeting and asking her to re-paste the draft she'd already discussed twice.

The bug is not in the model. The bug is that your platform encoded "session" — a transport-layer artifact about a single connection — as the unit of context, while your user encoded "task" — the contract — as the unit of context. Every framework on the market quietly conflates the two, and the gap between them is where half of agent UX disappears.

Shadow MCP: The Tool Servers Your Security Team Has Never Heard Of Are Already Running on Your Engineers' Laptops

· 13 min read
Tian Pan
Software Engineer

Your security team has a complete inventory of every SaaS subscription on the corporate card, every OAuth app with admin consent, every device on the corporate Wi-Fi. They have zero visibility into the seven processes bound to 127.0.0.1 on your senior engineer's laptop right now — a "deploy assistant" with a long-lived staging API token, a "ticket triager" subscribed to a customer-data Slack channel, a "release notes generator" with read access to the production analytics warehouse. None of it is on a vendor list. None of it shows up in the SSO logs. All of it is running on credentials the engineer already had, doing things nobody approved them to do.

This is shadow MCP, and it is the fastest-growing unmanaged authorization surface in the enterprise. The Model Context Protocol made it trivially cheap to wire any tool into any LLM, and engineers — being engineers — wired the obvious things first. Saviynt's CISO AI Risk Report puts the number at 75% of CISOs who have already discovered unsanctioned AI tools running in their production environments. The GitHub MCP server crossed two million weekly installs in early 2026. The Postgres MCP server, which gives an LLM a SQL prompt against any database the developer can reach, is north of 800,000 weekly installs. None of those numbers represent enterprise IT decisions.

The Shared-Prompt Flag Day: When One Edit Becomes Thirty Teams' Regression

· 10 min read
Tian Pan
Software Engineer

The first edit to a shared system prompt feels like good engineering. Three teams all paste the same eighteen-line safety preamble at the top of their agents, someone notices, and an internal platform team says the obvious thing: let's centralize it. A prompts.common.safety_preamble@v1 lands in a registry. Thirty teams adopt it within a quarter because it's the path of least resistance — and because security is happy that one team owns the wording. For two quarters, this looks like a clean DRY win.

Then the security team needs a small wording change. Maybe a new compliance regulation tightens what an assistant is allowed to volunteer about a user's account. Maybe a red-team finding requires a one-sentence addition to the refusal clause. The platform team makes the edit, ships v2, and within a day the support queue fills with messages from consumer teams: our eval dropped, our format broke, our tool-call rate halved, our tone changed, our latency went up because the model started reasoning more. Each team wants the edit reverted. The security team needs it shipped. Nobody can roll forward without a re-eval, and nobody owns the re-eval. Welcome to the shared-prompt flag day.

Streaming JSON Parsers: The Gap Between Tokens and Typed Objects

· 12 min read
Tian Pan
Software Engineer

The model is emitting JSON token by token. Your UI wants to render fields the moment they materialize — a confidence score before the long answer body, the arguments of a tool call as the model fills them in. Then someone wires up JSON.parse on every chunk and the whole thing falls over, because JSON.parse is all-or-nothing. It needs a balanced document to return anything. Until the model emits the closing brace, you have nothing to show.

This is not a parser problem you can fix with a try/catch. The standard JSON parser was designed against a content-length-known HTTP response. Partial input is not a state it models — it is "input error." When you treat a token stream as if it were an HTTP body, you inherit thirty years of "the document is either complete or invalid," and your UI pays the bill.

Structured Concurrency for Parallel Tool Fanout: Who Owns Partial Failure?

· 11 min read
Tian Pan
Software Engineer

The moment your agent fans out five parallel tool calls — search across three indexes, query two databases, hit one external API — you have crossed an invisible line. You are no longer writing prompt-and-response code. You are writing a concurrent program. Most agent frameworks pretend you are not, and the bill arrives at 2 AM.

The pretense is comfortable. The planner emits a list of tool calls, the runtime fires them off, the runtime collects whatever comes back, the planner consumes the aggregate. From a thousand feet up it looks like a fan-out / fan-in pipeline, and most teams treat it that way until production teaches them otherwise. The problem is that twenty years of concurrent-programming research — partial-failure semantics, structured cancellation, backpressure, deterministic error attribution — already solved the failure modes you are about to rediscover. Your agent framework, by default, did not import any of it.

Token Amplification: The Prompt-Injection Attack That Burns Your Bill

· 10 min read
Tian Pan
Software Engineer

A user submits a $0.01 request. Your agent reads a webpage. Forty seconds later, the inference bill for that single turn is $42. The query was technically successful — the agent returned a reasonable answer. It just took three nested sub-agents, a 200K-token document fetch, and a recursive plan refinement loop to get there. None of that fanout was the user's idea. It was a sentence buried in the page the agent read.

This is token amplification: a prompt-injection class that does not exfiltrate data, does not call unauthorized tools, and does not leave a clean security signature. It just sets your bill on fire. The cloud bill is the payload, and the user's request is the carrier.

Tokenizer Churn: The Silent Breaking Change Inside Your 'Compatible' Model Upgrade

· 11 min read
Tian Pan
Software Engineer

The vendor said the upgrade was a drop-in replacement. The API contract held. The model name in your config barely changed. A week later, your context-window guard starts triggering on prompts it never tripped on before, your stop-sequence regex matches in the wrong place, and one of your few-shot examples started producing a confidently wrong answer that your eval suite happens not to cover. Nobody touched the prompt. Nobody touched the temperature. Somebody quietly retrained the tokenizer.

Tokenizer changes are the breaking change vendors don't call breaking. The API surface is byte-stable, the SDK didn't bump a major version, and the release notes mention "improved instruction following" — but the function from your input string to the integer sequence the model actually sees has been replaced. Every assumption your code made about how text becomes tokens is now subtly wrong. The cost of that invisibility is two weeks of "the model feels different" before someone re-runs a canonical prompt through count_tokens and finds the answer.

Your Tool Catalog Is a Power Law and You're Optimizing the Long Tail

· 11 min read
Tian Pan
Software Engineer

Pull a week of tool-call traces from any production agent and the shape is the same: three or four tools handle 90% of the calls, and a couple of dozen others split the remaining 10%. The catalog is a power law, but the framework treats it like a uniform list. Every tool description ships in every system prompt, every selection rubric weights tools equally, every eval samples the catalog as if a search-files call and a refund-issue call were drawn from the same distribution. They are not.

The cost of that flatness is invisible until it isn't. A team adds the eighteenth tool, the planner's accuracy on the original three drops two points, nobody can localize the regression to a specific change because everything moved at once, and the eval suite — itself uniform across the catalog — averages the slip into a number that still looks fine. Meanwhile the tokens spent describing tools the model will not call this turn now exceed the tokens spent on the user's actual prompt.