120 posts tagged with "security"

Your Embeddings Don't Know the Contractor Was Off-Boarded

· 9 min read
Tian Pan
Software Engineer

A contractor finished a six-month engagement last quarter. HR ran the off-boarding checklist: SSO disabled, laptop wiped, GitHub seat removed, Slack archived, Notion access revoked. Compliance signed off. Six weeks later, an internal RAG assistant answered a question by quoting a confidential strategy document the contractor had authored — and the chunk it cited was still tagged with the contractor's user ID in the vector store's allow-list. Nothing in the access logs of the source-of-truth ever recorded a read, because there was no read. The retrieval came from a copy of the data that nobody wired into the off-boarding flow.

This is the structural problem nobody puts on the architecture diagram. Your vector index is not just a similarity-search engine. It is a permission cache — a derived store of who-can-see-what, frozen at the moment you ran your embedding job — and almost nobody is invalidating it the way they invalidate everything else.
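One way to picture the fix is to treat the allow-list stored next to each chunk as a hint and re-check every retrieved chunk against the live permission source at query time. The sketch below is a minimal illustration under assumptions: `Chunk`, `filter_by_live_acl`, `live_acl`, and `active_users` are hypothetical names, and a real system would query its identity provider and document store rather than in-memory dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    doc_id: str
    text: str
    # ACL snapshot copied at embedding time. Assumed stale, never trusted alone.
    cached_allow_list: frozenset


def filter_by_live_acl(chunks, user_id, live_acl, active_users):
    """Return only the chunks the user may read according to the live ACL.

    live_acl maps doc_id -> frozenset of user IDs (hypothetical shape).
    active_users is the set of user IDs that are currently on-boarded.
    """
    # An off-boarded user gets nothing, whatever the cached snapshot says.
    if user_id not in active_users:
        return []
    # Documents missing from the live ACL are denied by default.
    return [c for c in chunks if user_id in live_acl.get(c.doc_id, frozenset())]
```

The cached allow-list can still be used as a cheap pre-filter inside the vector query, but in this sketch the decision that counts is the one made against the live source.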

The Tool-Call Authorization Layer Nobody Wrote

· 9 min read
Tian Pan
Software Engineer

Your API gateway authenticated the user. Your tool endpoint will check that the user has permission to delete the row. Between those two checks sits a layer that does not exist: the one that decides whether the model was allowed to ask for delete_user at all, with those exact arguments, in this conversation.

In most agent stacks, that layer is the system prompt. It says something like "be careful with destructive actions" and "only delete records the user explicitly asked you to delete." That sentence is not access control. It is a polite request to a non-deterministic process, evaluated by the same component that the attacker is trying to manipulate.
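To make the missing layer concrete, here is a minimal sketch of a deterministic check that runs after the model proposes a tool call and before the tool endpoint sees it. Everything in it is an assumption for illustration: `ConversationContext`, `confirmed_targets`, `authorize_tool_call`, and the `user_id` argument name are hypothetical, and how a target becomes "confirmed" (for example, a UI confirmation step) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationContext:
    user_id: str
    # IDs the human explicitly confirmed in this conversation (hypothetical field,
    # populated by application code, never by the model).
    confirmed_targets: set = field(default_factory=set)


class ToolCallDenied(Exception):
    """Raised when a proposed tool call fails the policy check."""


# Tools that need an explicit, human-confirmed target (assumed list).
DESTRUCTIVE_TOOLS = {"delete_user"}


def authorize_tool_call(tool_name, args, ctx):
    """Deterministic gate evaluated outside the model, before dispatch."""
    if tool_name in DESTRUCTIVE_TOOLS:
        target = args.get("user_id")
        if target is None or target not in ctx.confirmed_targets:
            raise ToolCallDenied(
                f"{tool_name} on {target!r} was not confirmed in this conversation"
            )
```

The point of the sketch is where the check lives: in code the attacker's text cannot rewrite, rather than in a sentence of the system prompt.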

The Tool You Added For One Agent Is Now In Every Agent's Hand

· 10 min read
Tian Pan
Software Engineer

Six months ago, somebody on the customer-support team wired a send_email tool for their agent. It worked. The platform team noticed it in the shared tool registry, gave a thumbs-up emoji on the PR, and moved on. This week, a security engineer ran an audit and discovered that send_email is in the action surface of the meeting-notes summarizer, the data-quality bot, an analytics assistant nobody officially owns, and a half-built prototype that hasn't been touched since January. None of these agents need to send email. None of them have ever been reviewed for whether they should be allowed to. The PRD for the meeting-notes summarizer is two sentences long and the words "outbound communication" do not appear in it.

This is the default state of every shared tool registry I have ever audited. The act of registering a tool — pushing a JSON schema and a handler into a central catalog — is treated as a developer convenience, like adding a utility function to a shared library. But once the registry is sourced into every agent's prompt, registering a tool is not a library change. It is a deployment to every agent in the company simultaneously, with no review of whether each of them should have received it.
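A small sketch of the alternative, assuming a registry keyed by tool name and a separate, reviewed grant table per agent. The names `tools_for_agent`, `registry`, and `grants` are hypothetical; the idea is only that registering a tool and granting it to an agent become two different changes.

```python
def tools_for_agent(agent_name, registry, grants):
    """Return only the tools explicitly granted to this agent.

    registry maps tool name -> tool definition (schema and handler).
    grants maps agent name -> set of tool names, changed only through review.
    An agent with no entry in grants receives no tools at all.
    """
    granted = grants.get(agent_name, set())
    unknown = granted - set(registry)
    if unknown:
        # A grant that points at a missing tool is a configuration error.
        raise KeyError(f"grants for {agent_name!r} name unknown tools: {sorted(unknown)}")
    return {name: registry[name] for name in sorted(granted)}
```

With this shape, adding `send_email` to the registry changes nothing for the meeting-notes summarizer until someone edits its grant, and that edit is the thing a reviewer can see.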

The Permission Prompt Is a UX Bug: When Human-in-the-Loop Becomes Human-as-Rubber-Stamp

· 9 min read
Tian Pan
Software Engineer

Watch a developer use an agentic coding tool for an hour and you will see the same gesture forty times: a dialog appears, "Allow the agent to run git status?", and a hand moves to the approve button before the eyes finish reading. By the fortieth, the dialog is not being read at all. It is a speed bump the user has learned to take at full speed.

This is the quiet failure of human-in-the-loop. The architecture diagram still shows a human gating every dangerous action. The audit log still records an explicit approval for every command. But the human has stopped evaluating anything. They have become a biological "yes" function wired into the control flow — present in the loop, contributing no judgment to it. The permission prompt was supposed to be a safety control. It has degraded into latency with a confirmation dialog attached.

Prompt Injection Is a Confused Deputy, Not a Content-Filtering Problem

· 10 min read
Tian Pan
Software Engineer

The most common post-incident finding for a prompt injection breach is some variation of "the model got tricked." A retrieved document contained hidden instructions, the agent followed them, customer data left the building. The fix that follows is almost always a content filter: scan the input, classify the malicious instruction, strip it out before it reaches the model. Ship the filter, close the ticket.

That finding is wrong, and the filter is a treadmill. "The model got tricked" describes the symptom, not the vulnerability. The vulnerability is that an agent holding real privileges — a database token, a send-email capability, filesystem write — accepted instructions from a source that should never have been allowed to command those privileges. That is not a new class of bug. It is a confused deputy, and operating systems named and largely solved it almost forty years ago.

If you treat prompt injection as a detection problem, you are signing up for an arms race against every attacker who can phrase a sentence. If you treat it as an authority problem, you get to reuse decades of security engineering that already works.
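As one illustration of the authority framing, the sketch below derives the capabilities available to an agent turn from the provenance of everything in its context, so that text arriving from a retrieved document can narrow what the agent may do but can never widen it. This is an assumed design, not a description of any particular framework: `Source`, `AUTHORITY`, `effective_capabilities`, and the capability names are all hypothetical.

```python
from enum import Enum


class Source(Enum):
    USER = "user"            # the authenticated principal
    RETRIEVED = "retrieved"  # documents, web pages, tool output


# Authority each source can confer (assumed policy for this sketch).
AUTHORITY = {
    Source.USER: frozenset({"read_docs", "send_email"}),
    Source.RETRIEVED: frozenset({"read_docs"}),
}


def effective_capabilities(context_sources):
    """Intersect the authority of every source present in the context.

    Once retrieved content is in the context, any capability that
    retrieved content is not allowed to command is dropped.
    """
    caps = None
    for source in context_sources:
        allowed = AUTHORITY[source]
        caps = allowed if caps is None else caps & allowed
    return caps if caps is not None else frozenset()
```

No classifier is involved here. A hidden instruction in a retrieved document can say whatever it likes; under this policy the turn that read it simply no longer holds `send_email`.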

Shadow AI: The Agents Your Team Already Shipped

· 10 min read
Tian Pan
Software Engineer

Shadow IT used to mean a marketing team expensing a SaaS subscription, or an engineer spinning up an unsanctioned S3 bucket. It was annoying, it was a procurement headache, and it was mostly survivable. Shadow AI is the same instinct — route around the slow official path — except the blast radius is larger and the entry cost has collapsed to almost nothing.

An engineer can wire an LLM API call into a production workflow in an afternoon. A support lead can stand up a no-code triage agent before lunch. A data analyst can paste a quarter's worth of customer records into a chat window to "just summarize this real quick." None of it passes through review, none of it shows up in an architecture diagram, and your governance program cannot protect a system it does not know exists.

The uncomfortable part is the scale. A 2025 UpGuard survey found that more than 80% of workers — and nearly 90% of security professionals — use unapproved AI tools at work. Your security team is doing it. Your executives are doing it. The question is not whether you have shadow AI. It is whether you can see any of it.

Your Tool Descriptions Are an Instruction Channel the Model Obeys

· 8 min read
Tian Pan
Software Engineer

When a security team reviews a new tool integration, they read the code. They check what the function does, what it touches, what scopes it needs, whether it logs secrets. They almost never read the one sentence that decides whether the model calls it at all — the tool's description. That sentence is not documentation. It is an instruction the model treats as authoritative, and in most agent stacks nobody reviews it.

A tool description is written for the model to read. The model uses it to decide when the tool is relevant, what arguments to pass, and how to interpret what comes back. That makes the description a control channel into the model's behavior. And the moment a tool arrives from a third-party registry, a Model Context Protocol (MCP) server you don't operate, or a plugin a teammate installed last week, that control channel is authored by someone you never agreed to trust.

This is the gap. Input sanitization inspects what users type. Code review inspects what functions execute. The tool description sits between them — it is configuration that behaves like input — and it falls through both nets.
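One possible way to close that gap is to review descriptions the way dependencies are pinned: record a digest of each approved description and refuse to expose a tool whose description no longer matches. The sketch assumes tools arrive as dictionaries with `name` and `description` keys; `description_digest`, `unreviewed_tools`, and `approved_digests` are hypothetical names.

```python
import hashlib


def description_digest(name, description):
    """Stable digest of the text the model will actually read."""
    return hashlib.sha256(f"{name}\n{description}".encode("utf-8")).hexdigest()


def unreviewed_tools(tools, approved_digests):
    """Return names of tools whose description is new or changed since review.

    tools is a list of {"name": ..., "description": ...} dicts (assumed shape).
    approved_digests maps tool name -> digest recorded when a human approved it.
    """
    return [
        t["name"]
        for t in tools
        if approved_digests.get(t["name"]) != description_digest(t["name"], t["description"])
    ]
```

A caller would drop or quarantine anything this returns before building the prompt, which turns a silent description change on a remote server into a visible review item.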

Bring-Your-Own-Key for AI Features: The Sales-Driven Re-Architecture Nobody Costed

· 10 min read
Tian Pan
Software Engineer

The procurement team you're selling to will eventually ask the one question that resets your architecture: "Can we bring our own model API key?" Saying yes wins the deal. Saying yes also moves your trust boundary, your cost boundary, and your operational boundary at the same time — and most product teams discover this only after the contract is signed and the first month of usage produces a support ticket nobody knows how to answer.

BYOK is sold internally as a toggle. The customer pastes a key, your code reads it from the vault instead of from your own account, and inference flows the same way it always did. It is not a toggle. It is a sales-driven re-architecture that ripples through cost attribution, security incident response, observability, rate limiting, model-version pinning, and on-call accountability. The teams that ship it without acknowledging this end up rebuilding their entire platform layer a year later while a paying enterprise customer waits for fixes.

Tenancy Leaks Through Few-Shot Examples: When Your Prompt Library Becomes a Cross-Customer Data Store

· 11 min read
Tian Pan
Software Engineer

Open the production system prompt of a maturing AI product, scroll past the role description, and you will almost always find a section labeled # Examples or ## Few-shot demonstrations. The examples are excellent — they are concrete, they are domain-specific, they pattern-match exactly the failure modes the eval set was struggling with last quarter. They are also, on closer inspection, real customer data. A real ticket ID from a real account. A phrasing pattern lifted verbatim from a support thread. An internal product code that one tenant uses and the rest of the customer base has never heard of.

The team that put them there is not careless. The examples got into the prompt the way good examples always get into prompts: someone mined production traces for cases the model handled poorly, picked the cleanest worked example, pasted it into the system message, watched the eval scores climb, and shipped. That pipeline — production trace to system prompt — is the most reliable prompt-improvement loop in modern LLM engineering. It is also a structural cross-tenant data leak that the team built without noticing, and the system prompt has quietly become a multi-tenant data store the data-processing agreement never priced.

Your Fine-Tuning Corpus Is a Codebase. Stop Shipping It Through a Bucket.

· 11 min read
Tian Pan
Software Engineer

By month nine of any serious fine-tuning project, your training corpus has more authors than your codebase. Synthetic generation pipelines wrote a few million examples. The vendor labeling firm contributed 80K rows from a workforce you have never met. An engineer added 47 examples last Tuesday to fix a regression they spotted in eval. A scraping job pulls production traces into a "supplementary" parquet file every night. A CSV someone dropped into S3 in February is still there, still in the training mix, and the person who wrote it left the company in March.

Now look at your application code repo. Every line is attributable to a named author. Every change went through a PR with at least one reviewer. Commits are signed. The main branch is protected. Merges require a second human. There is an audit log. If an auditor asks who wrote line 47 of payment_processor.py, you have an answer within seconds.

If they ask who wrote example 47 of the corpus that produced model v2.3, the honest answer is "a Mechanical Turk batch from 2024-Q2, vendor unknown, justification absent." Your fine-tuning corpus is a higher-privilege deployment surface than your codebase — it directly shapes model behavior in production — and you are shipping it through a bucket while you ship code through a reviewed PR. The threat model is inverted.

The Agent Scratch Directory: The Unowned Filesystem PII Surface Nobody Inventoried

· 10 min read
Tian Pan
Software Engineer

A regulator walks into your office and asks the question security teams rehearse for: "Show me every place customer data lives." Your data team produces the inventory. The primary database is on it. The analytics warehouse is on it. The object store, the queue, the search index, the backup destination — all on it, with classification labels, retention policies, encryption details, and named owners. Then someone in the room mentions the agent worker pool, and the inventory has nothing to say. The pool has been running for nine months. Each worker has a local disk. The agents on those workers have been parsing PDFs, transcribing audio, downloading email attachments, and caching intermediate JSON between tool calls the entire time. Nobody put any of that on the asset register.

This is the scratch directory problem. Every long-running agent worker accumulates an ephemeral filesystem that grows organically as new tools are added — extracted text from a PDF parser, transcribed audio from a Whisper step, downloaded attachments from a Gmail tool, screenshots from a browser-use step, vector-search snippets cached for the next turn, intermediate JSON the agent emitted between two tool calls so the second one wouldn't have to re-derive it. Unlike databases and queues and buckets, this surface has no retention policy, no encryption-at-rest standard, no DLP scanner pass, and no entry on the data-classification spreadsheet. The platform team thinks "agent state" means the inference-provider context window. The SRE team thinks "agent state" means the durable database. The worker's /tmp/agent-workspace-${session_id}/ directory is a third copy of customer data that nobody owns.
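A minimal sketch of giving that directory an owner and a lifetime: scope the workspace to the session and remove it when the session ends, including when the agent raises. The helper name `session_workspace` and the directory prefix are assumptions; it relies only on the standard library's `tempfile.TemporaryDirectory`, which creates the directory with owner-only permissions on POSIX systems.

```python
import contextlib
import tempfile


@contextlib.contextmanager
def session_workspace(session_id):
    """Yield a per-session scratch directory that is deleted on exit.

    Cleanup runs on normal exit and on exceptions. It does not run if the
    worker process is killed, so a periodic sweeper is still needed.
    """
    prefix = f"agent-workspace-{session_id}-"
    with tempfile.TemporaryDirectory(prefix=prefix) as path:
        yield path
```

Tools would write extracted text, transcripts, and attachments under the yielded path rather than anywhere else on disk. That gives the inventory a single location to classify, and gives retention a default of "gone when the session is".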

Browser Agent Session Bleed: When One Profile Serves Many Tenants

· 10 min read
Tian Pan
Software Engineer

A computer-use agent finishes a task on a customer's CRM, the worker pool returns the browser to its idle ring, the next request lands a few hundred milliseconds later, and the navigation to the dashboard succeeds — except it succeeds as the wrong user. The OAuth cookie from the previous session was still on the profile. The trace shows navigation succeeded, screenshot captured, action performed. Nothing in the run log says the agent was acting as someone who never asked it to.

This is the failure class that browser agents inherit silently from the libraries they're built on. Headless browser frameworks were designed for one user per profile because that's how a browser has worked for thirty years. When a worker pool reuses profiles to amortize the eight-second cold start of a fresh Chromium instance, that one-user assumption breaks, and the breakage is invisible to every layer of telemetry the team usually trusts.
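To show the shape of a pool that keeps the one-user assumption intact, here is a sketch that keys idle profiles by tenant, so a warm profile is only ever handed back to the tenant that used it. It is deliberately framework-agnostic: `ProfilePool`, `acquire`, `release`, and the `factory` callable are hypothetical, and the "profile" is whatever object your browser library uses for an isolated cookie and storage jar.

```python
class ProfilePool:
    """Idle browser profiles, partitioned by tenant."""

    def __init__(self, factory):
        # factory() must return a brand-new, empty profile (assumption).
        self._factory = factory
        self._idle = {}  # tenant_id -> list of idle profiles

    def acquire(self, tenant_id):
        """Reuse a warm profile only if this same tenant released it."""
        idle = self._idle.get(tenant_id)
        if idle:
            return idle.pop()
        # Any other tenant pays the cold start rather than inherit cookies.
        return self._factory()

    def release(self, tenant_id, profile):
        """Return a profile to the idle list of the tenant that used it."""
        self._idle.setdefault(tenant_id, []).append(profile)
```

The trade-off is explicit: a tenant's first request pays the cold start, and the amortization only happens within a tenant. Whether per-tenant or per-end-user partitioning is the right key depends on whose credentials the profile holds.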