
578 posts tagged with "insider"


Tool-Composition Privilege Escalation: Your Security Review Cleared the Nodes, Not the Edges

· 10 min read
Tian Pan
Software Engineer

read_file is safe. send_email is safe. Your security review cleared each one against its own threat model: read-only access to a known directory, outbound mail through an authenticated relay with rate limits and recipient logging. Each passed. Both got registered. Then the agent composed them, and a single line of injected text in a customer support ticket turned the pair into an exfiltration tool that the original review had no language to describe.

The danger does not live in any node of the tool graph. It lives in the edges. Every per-tool security review you ran produced a verdict on a vertex; the actual permission surface of your agent is the set of paths through the catalog, and that set grows quadratically while your review process scales linearly. By the time your agent has fifteen registered tools, you have reviewed fifteen things and shipped roughly two hundred reachable two-step compositions, none of which any human audited.
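
To make the arithmetic concrete, here is a minimal sketch; the tool names are placeholders, not a real catalog. Fifteen reviewed vertices already imply a couple of hundred ordered two-step edges.

```python
# Illustrative sketch: the reviewed surface grows with the tool count, but the
# reachable two-step compositions grow roughly quadratically.
from itertools import permutations

TOOLS = [f"tool_{i}" for i in range(15)]        # 15 individually reviewed tools

reviewed_nodes = len(TOOLS)                      # 15 verdicts from the security review
two_step_edges = list(permutations(TOOLS, 2))    # ordered pairs: 15 * 14 = 210

print(reviewed_nodes)        # 15
print(len(two_step_edges))   # 210 compositions nobody audited
```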

Abstain or Escalate: The Two-Threshold Problem in Confidence-Gated AI

· 13 min read
Tian Pan
Software Engineer

Most production AI features ship with a single confidence threshold. Above the line, the model answers. Below it, the user gets a flat "I'm not sure." That single number is doing two completely different jobs at once, and it's why your trust metric has been sliding for two quarters even though your accuracy on answered queries looks fine.

The right design has at least two cutoffs. An abstain threshold sits low: below it, the model declines because no answer is worth more than silence. An escalate threshold sits in the middle: between the two cutoffs, the system hands the case to a human reviewer instead of dropping it on the floor. Collapse them into a single dial and you ship a product that feels equally useless when it's wrong and when it's uncertain — which is the worst possible position to occupy in a market where users have a free alternative one tab away.
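
As a minimal sketch of that split, assuming illustrative cutoff values rather than anything tuned:

```python
# Minimal sketch of the two-threshold design. The cutoff values and routing
# targets (answer / escalate / abstain) are illustrative, not tuned.
ABSTAIN_BELOW = 0.40   # below this, no answer is worth more than silence
ANSWER_ABOVE = 0.80    # above this, the model answers directly

def route(confidence: float) -> str:
    if confidence < ABSTAIN_BELOW:
        return "abstain"          # decline outright
    if confidence < ANSWER_ABOVE:
        return "escalate"         # hand the case to a human reviewer
    return "answer"               # model answers on its own

assert route(0.25) == "abstain"
assert route(0.60) == "escalate"
assert route(0.92) == "answer"
```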

This isn't a new idea. The reject-option classifier literature has been arguing for split thresholds since the 1970s, distinguishing ambiguity rejects (the input is between known classes) from distance rejects (the input is far from any training data). Production AI teams keep rediscovering the same lesson the hard way, usually about six months after their first launch, when the support queue is full of people typing "is this thing broken or what."

Your Provider's 99.9% SLA Is Measured at the Wrong Boundary for Your Agent

· 11 min read
Tian Pan
Software Engineer

A model provider publishes a 99.9% availability SLA. The procurement team frames it as "three nines, four hours of downtime per year, acceptable for a non-tier-zero workload." Six months later the agent feature ships and the on-call dashboard shows a user-perceived task-success rate around 98% — a number nobody wrote into a contract, nobody can find on the provider's status page, and nobody owns. The provider is meeting their SLA. The product is missing its SLO. Both are true at the same time, and the gap is not a bug — it is arithmetic.

The arithmetic is the part most teams skip. A provider's 99.9% is measured against a synchronous-request workload — one user, one prompt, one response, one billing event. An agent does not generate that workload. A single user-perceived task fans out into 8 to 20 inference calls, retries on transient errors, hedges on slow ones, and aggregates partial outputs. Each of those calls is an independent draw against the provider's failure distribution, and the task fails if any essential call fails. The boundary the SLA covers and the boundary the user feels are not the same boundary.
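
The compounding is easy to sketch. The availability figure below is the contractual one and the call counts are the fan-out range described above:

```python
# Back-of-the-envelope version of the arithmetic: per-call availability compounds
# across every essential call in a task.
per_call_availability = 0.999          # provider's "three nines," measured per request

for essential_calls in (8, 14, 20):
    task_success = per_call_availability ** essential_calls
    print(f"{essential_calls} calls -> {task_success:.4f}")
# 8 calls  -> ~0.9920
# 14 calls -> ~0.9861
# 20 calls -> ~0.9802  (roughly the 98% the on-call dashboard shows)
```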

Your Agent's Outbox Is Your Next Deliverability Incident

· 11 min read
Tian Pan
Software Engineer

The first time it happens, the on-call engineer is staring at a Gmail Postmaster dashboard that has gone solid red, the support inbox is on fire because customer password resets are landing in spam, and the agent that did this is still running. It sent eighty thousand "personalized follow-ups" between 4 a.m. and 9 a.m. local time, all from the company's primary sending domain, all signed with the same DKIM key the billing system uses. By the time anyone notices, the domain reputation that took three years to build is gone, and so are the next six weeks of inbox placement on every transactional message the company depends on.

Sending email from an agent looks like a one-line tool call. send_email(to, subject, body) is the canonical demo, and every framework ships it as a starter integration. But email is not like other tools. A bad database query rolls back. A bad API call returns an error. A bad batch of email lowers the deliverability of every other email your company sends, for weeks, and there is no transaction to roll back because the messages are already in flight to recipient mailservers that are now writing your domain's reputation history.
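
One hedged illustration of treating the tool as more than a one-liner: a guard with a hard volume budget and a recipient-domain allowlist. Every name and limit here is an assumption, not a prescription.

```python
# Hypothetical guard around the canonical demo tool: a per-hour volume budget and a
# recipient-domain allowlist enforced before anything is queued. All names and
# limits are illustrative assumptions.
import time
from collections import deque

SENT_TIMESTAMPS: deque = deque()
MAX_SENDS_PER_HOUR = 200
ALLOWED_RECIPIENT_DOMAINS = {"example.com"}

def send_email(to: str, subject: str, body: str) -> None:
    """Placeholder for the real one-line tool call."""

def guarded_send_email(to: str, subject: str, body: str) -> None:
    now = time.time()
    while SENT_TIMESTAMPS and now - SENT_TIMESTAMPS[0] > 3600:
        SENT_TIMESTAMPS.popleft()                      # drop sends older than an hour
    if len(SENT_TIMESTAMPS) >= MAX_SENDS_PER_HOUR:
        raise RuntimeError("send budget exhausted; route to human approval")
    if to.rsplit("@", 1)[-1].lower() not in ALLOWED_RECIPIENT_DOMAINS:
        raise RuntimeError(f"recipient domain not allowlisted: {to}")
    SENT_TIMESTAMPS.append(now)
    send_email(to, subject, body)
```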

Why AI-Generated Comments Rot Faster Than the Code They Describe

· 11 min read
Tian Pan
Software Engineer

When an agent writes a function and a comment in the same diff, the comment is not documentation. It is a paraphrase of the code at write-time, generated by the same model from the same context, and it is silently wrong the first time the code shifts. The function gets refactored, an argument changes type, an early-return gets added, the comment stays. By next quarter, the comment is encoding a specification that no longer matches the code, and the next reader trusts the comment because reading the comment is easier than reading the code.

This is an old failure mode — humans-edit-code-comments-stay-stale — but agents accelerate it across three dimensions at once. Comment volume goes up because agents add a doc block to every function whether it needs one or not. The comments are grammatically perfect, so reviewers don't flag them as low-quality. And the comments paraphrase the code in different terms than the code actually executes, so they look like documentation but encode a second specification that drifts independently of the first.

AI Reviewing AI: The Asymmetric Architecture of Code-Review Agents

· 12 min read
Tian Pan
Software Engineer

A review pipeline where the author and the reviewer are both language models trained on overlapping corpora is not a quality gate. It is a confidence amplifier. The author writes code that looks plausible to a transformer, the reviewer reads code through the same plausibility lens, both agents converge on "looks fine," and the diff merges with a green checkmark that means nothing about whether the change is actually correct. Recent industry data shows the asymmetry plainly: PRs co-authored with AI produce roughly 40% more critical issues and 70% more major issues than human-written PRs at the same volume, with logic and correctness bugs accounting for most of the gap. The reviewer agents shipped to catch those bugs are, by construction, the ones least equipped to find them.

The teams getting real signal from AI code review have stopped treating "review" as a slightly different shape of "generation" and started designing review as a fundamentally different cognitive task. Generation prompting asks the model to produce something coherent. Review prompting has to ask the model to find what is missing — to inhabit the negative space of the diff rather than the positive one — and that inversion is much harder to elicit than a one-line system prompt suggests.
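
A toy illustration of that inversion; neither prompt is from the post, and real review prompting is considerably more involved:

```python
# Illustrative contrast only. Generation asks for a coherent artifact; review asks
# for what is absent from the diff.
GENERATION_PROMPT = (
    "Implement the function described in the ticket. Return complete, working code."
)

REVIEW_PROMPT = (
    "Do not summarize or restate this diff. For each changed function, list:\n"
    "1. Inputs that are never exercised by the new code paths.\n"
    "2. Error and boundary cases the diff does not handle.\n"
    "3. Invariants in the surrounding module the change could violate.\n"
    "If you find nothing for a category, say 'none found' rather than praising the code."
)
```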

Your APIs Assumed One Human at a Time. Parallel Agents Broke the Contract.

· 12 min read
Tian Pan
Software Engineer

A backend engineer I know spent a Tuesday afternoon staring at a Datadog graph that had never spiked before: the per-user 429 counter on their internal calendar service. The customer complaining had not changed their behavior. They had simply turned on the assistant feature, which now spawned eight planning threads in parallel against the same calendar API every time the user said "find me time next week." The rate limiter — a perfectly reasonable 60 requests per minute per user, written years ago against a UI that physically could not click that fast — was firing within the first three seconds of every request and silently corrupting half the assistant's responses.

The rate limit was not the bug. The contract was the bug. That backend, like most internal services written before 2024, had a quietly enforced assumption baked into every layer: one user means one stream of activity, paced by a human's reaction time, with one cookie jar, one CSRF token, and one set of credentials that could be re-prompted if anything went wrong. Agents shred all five of those assumptions at once, and the failures show up as a constellation of unrelated incidents — 429 storms, last-write-wins corruption, audit logs you can't subpoena, re-auth loops that hang headless workers — that nobody connects until the pattern is named.

The shorthand I have been using with platform teams is this: every backend you own has an undocumented contract with its callers, and that contract was negotiated with humans. Agents are now showing up to renegotiate. You can either do the renegotiation deliberately, in code review, or you can do it during your next incident.
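
One shape the deliberate renegotiation can take on the client side, sketched with assumed names and numbers: give every thread acting for a user one shared budget instead of eight independent ones.

```python
# Illustrative sketch: a token bucket shared by all of a user's agent threads, so
# eight parallel planners cannot collectively blow a per-user limit that was
# written for one human. Names and numbers are assumptions.
import threading
import time

class SharedUserBudget:
    """Client-side budget shared by every agent thread acting for one user."""

    def __init__(self, requests_per_minute: int = 60):
        self.capacity = requests_per_minute
        self.tokens = float(requests_per_minute)
        self.refill_per_sec = requests_per_minute / 60.0
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self) -> None:
        while True:
            with self.lock:
                now = time.monotonic()
                elapsed = now - self.last
                self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
            time.sleep(0.1)  # pace the agent instead of letting the backend 429 it

# Each planning thread calls budget.acquire() before hitting the calendar API.
```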

The kWh Column Missing From Your Inference Span: Carbon Attribution Per Request

· 10 min read
Tian Pan
Software Engineer

Your inference flame graph has a cost axis. It does not have an energy axis. That gap is fine right up until the morning a customer's procurement team sends you a spreadsheet with twenty-three columns of vendor sustainability disclosures, and one of them is kgCO2e per 1,000 inferences. You have no way to fill that cell, your provider's answer is a methodology paper, and the deal closes in nine days. The token-cost dashboard your platform team has been polishing for two years suddenly looks like it was solving the wrong problem.

The shift here is not abstract. Sustainability disclosure is moving from corporate aggregate to product-level granularity. The first wave of that movement landed inside CSRD and ESRS in 2025, and the second wave is landing in B2B procurement contracts right now. Engineering organizations that built observability for cost are about to discover they need observability for carbon, and the two are not the same column on the same span.
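
A minimal sketch of the missing column, with every constant an assumption that a provider disclosure or your own measurement would have to replace:

```python
# Hedged sketch: attribute energy and carbon to a single inference span, then roll
# spans up to the metric procurement asks for. All constants are placeholder
# assumptions, not published figures.
WH_PER_1K_TOKENS = 0.3        # assumed marginal energy per 1,000 tokens, in Wh
DATACENTER_PUE = 1.2          # assumed power usage effectiveness
GRID_KGCO2E_PER_KWH = 0.4     # assumed grid carbon intensity

def span_carbon(total_tokens: int) -> dict:
    kwh = (total_tokens / 1000) * WH_PER_1K_TOKENS * DATACENTER_PUE / 1000
    return {"energy_kwh": kwh, "kgco2e": kwh * GRID_KGCO2E_PER_KWH}

# kgCO2e per 1,000 inferences under these assumptions:
spans = [span_carbon(1_800) for _ in range(1_000)]
print(sum(s["kgco2e"] for s in spans))
```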

The Contestability Gap: Engineering AI Decisions Your Users Can Actually Appeal

· 11 min read
Tian Pan
Software Engineer

A user opens a chat, asks for a refund, gets "I'm sorry, this purchase is not eligible for a refund," closes the tab, and never comes back. Internally, the agent emitted a beautiful trace: tool calls, intermediate reasoning, the policy bundle it consulted, the model version it ran on. Every span landed in the observability platform. None of it landed anywhere the user could reach. There is no button labeled "ask a human to look at this again," and even if there were, there is no service behind it. The decision is final by default, not by design.

This is the contestability gap, and it is the next thing regulators, lawyers, and angry users are going to rip open. It is also one of the cleanest examples of a problem that looks like policy from the outside and turns out to be plumbing on the inside.

Debate Diversity Collapse: When Three Agents Vote 3-0 Because They Read the Same Internet

· 11 min read
Tian Pan
Software Engineer

The architecture diagram says "ensemble of three frontier models, debate-and-reconcile, majority vote." The trace says all three agents converged on the same answer in round one and spent two more rounds politely paraphrasing each other. The eval says +0.4 points over a single call. The bill says 4.2x. Somewhere in there, somebody decided the panel was working.

Multi-agent debate is sold as a way to get disagreement-driven reasoning: three minds arguing toward a better answer than any one of them would reach alone. It depends on the agents actually disagreeing. Frontier LLMs trained on overlapping web corpora, instruction-tuned against overlapping preference datasets, and aligned against overlapping safety taxonomies share priors more than the architecture diagrams admit. After a round of "let's reconcile," what you observe is not three perspectives converging on truth — it is three samples from one distribution converging on the mode they were never that far from.

The pattern has a name in the recent literature: when an ensemble's vote-disagreement rate trends to zero independent of question difficulty, you have debate diversity collapse. The panel is still voting. The vote no longer carries information.
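
The metric itself is small enough to sketch; the data shapes below are illustrative:

```python
# Minimal sketch of the signal: the fraction of questions on which the panel's
# final votes are not unanimous.
def vote_disagreement_rate(panel_votes: list[list[str]]) -> float:
    """panel_votes[i] holds the agents' final answers for question i."""
    disagreements = sum(1 for votes in panel_votes if len(set(votes)) > 1)
    return disagreements / len(panel_votes)

votes = [["A", "A", "A"], ["B", "B", "B"], ["A", "A", "B"], ["C", "C", "C"]]
print(vote_disagreement_rate(votes))  # 0.25; the collapse signal is this trending
                                      # toward 0 regardless of question difficulty
```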

DLP Belongs in Your AI Gateway, Not Bolted Into Every App

· 11 min read
Tian Pan
Software Engineer

The first internal LLM gateway is almost always built for the boring reasons: cost attribution so finance can answer "which team spent the inference budget," rate limiting so one runaway script doesn't burn the monthly quota, provider failover so an OpenAI hiccup doesn't take down the assistant. Data loss prevention shows up on the slide deck, but it ships as "each app team should redact sensitive fields before they call the model." Six months later there are nine apps in production, three half-maintained redaction libraries with subtly different regex sets, two prototypes that bypass the gateway entirely "just for testing," and a customer-data-in-prompt incident that everyone's middleware was supposed to prevent because nobody's middleware was the canonical egress point.

This is not a tooling problem. It is an architectural mistake. DLP is an egress control, and egress controls only work when the path is mandatory. The moment you let app teams own redaction, you've ceded the property that makes DLP function — that there is exactly one place sensitive data can leave, and you can prove what crossed it. The 2025 LayerX Security report puts the scale of the problem in numbers most teams haven't internalized: GenAI-related DLP incidents more than doubled in early 2025 and now make up 14% of all data-security incidents across SaaS traffic, with employees averaging 6.8 pastes into GenAI tools per day, more than half of which contain corporate information. The shadow path is winning by default.
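
A minimal sketch of what "one canonical egress point" means in code; the patterns are placeholders, not a real ruleset:

```python
# Illustrative gateway-side redaction hook: one mandatory egress point, one
# redaction pass, one log of what crossed it.
import re

REDACTION_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_at_gateway(prompt: str) -> tuple[str, list[str]]:
    """Runs once, at the gateway, for every provider-bound request."""
    hits = []
    for label, pattern in REDACTION_PATTERNS.items():
        if pattern.search(prompt):
            hits.append(label)
            prompt = pattern.sub(f"[{label.upper()}_REDACTED]", prompt)
    return prompt, hits  # hits feed the audit log that proves what crossed the boundary

clean, findings = redact_at_gateway("Contact jane@example.com about SSN 123-45-6789")
print(clean, findings)
```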

The Dual-Writer Race: When Your Agent and Your User Edit the Same Calendar Event

· 12 min read
Tian Pan
Software Engineer

The agent confidently reports "I rescheduled the meeting to Thursday at 3pm." The user is staring at the original Tuesday 10am slot, because between the agent's plan and its commit they edited the event themselves. Last-write-wins overwrote a human change with an automated one, and the user's trust in the assistant collapses on a single incident. This is the dual-writer race, and it is the bug class that agent toolchains were never designed for.

Most agent platforms inherit this category by accident. The tool layer treats update_event as a single function call: take an ID, take new fields, return success. The provider API underneath has offered optimistic concurrency primitives — ETags, version tokens, If-Match preconditions — for a decade, and almost nobody plumbs them through. The model has no way to know that the world it reasoned about a minute ago is no longer current, because the abstraction it was given silently throws that information away.
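
A hedged sketch of what plumbing the precondition through could look like, using the generic ETag / If-Match pattern rather than any specific provider's API; the endpoint and fields are illustrative:

```python
# Illustrative optimistic-concurrency wrapper for the tool call. Endpoint shape,
# field names, and the use of `requests` are assumptions, not a provider's API.
import requests

def update_event(base_url: str, event_id: str, fields: dict) -> dict:
    # 1. Read the current event and capture its version token.
    resp = requests.get(f"{base_url}/events/{event_id}")
    resp.raise_for_status()
    etag = resp.headers["ETag"]

    # 2. Write with a precondition: apply only if nobody else has written since the read.
    update = requests.patch(
        f"{base_url}/events/{event_id}",
        json=fields,
        headers={"If-Match": etag},
    )
    if update.status_code == 412:
        # Precondition failed: the user (or another writer) changed the event.
        # Surface this to the model instead of reporting a success it didn't earn.
        raise RuntimeError("event changed since it was read; re-plan before writing")
    update.raise_for_status()
    return update.json()
```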