311 posts tagged with "ai-agents"

The Sandbox Your Agent Didn't Notice Was Real

· 10 min read
Tian Pan
Software Engineer

A team I know has a textbook staging setup. Read-only replicas of the production database. A mock Stripe account that pretends to charge cards. Synthetic users with fake email addresses on a domain nobody owns. The agent is asked to walk through an "account delinquent" escalation flow in staging, end to end, as part of a release rehearsal. The trace looks clean. The agent does what it is supposed to do.

Three minutes later, a real customer — a paying one, who churned six months ago and was still in a dormant export the developer had used to seed a test fixture — replies to a politely-worded payment-overdue email. The "send_email" tool, registered next to a dozen other tools that all terminate in mocks, was wired to the production Mailgun key. The developer who set it up two sprints earlier had been iterating fast on email templates and the sandbox tier capped them at five emails an hour, which broke the inner loop, so they swapped in the real key "just for the afternoon" and forgot. Nobody re-checked. The agent had no way to know.
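
One way to catch this class of mistake is a startup check that refuses to run a non-production agent if any registered tool is bound to a production backend. The sketch below is illustrative only: `ToolBinding`, the `backend` labels, and `production_bindings` are hypothetical names, and it assumes each tool registration records which backend its credentials belong to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolBinding:
    name: str
    backend: str  # hypothetical label: "mock", "sandbox", or "production"


def production_bindings(env: str, bindings: list[ToolBinding]) -> list[str]:
    """Return the names of tools bound to production while env is not production."""
    if env == "production":
        return []
    return [b.name for b in bindings if b.backend == "production"]


bindings = [
    ToolBinding("charge_card", "mock"),
    ToolBinding("send_email", "production"),  # the key swapped in "just for the afternoon"
]

leaks = production_bindings("staging", bindings)
if leaks:
    print(f"refusing to start staging agent; production-bound tools: {leaks}")
```

The check only helps if the backend label is derived from the credential itself rather than typed in by hand, which is an assumption this sketch does not enforce.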

The Typo Your Agent Learned to Honor

· 10 min read
Tian Pan
Software Engineer

An insurance carrier fine-tuned a support model on a year of chat transcripts. Within a week of launch, a compliance reviewer flagged something odd: the bot kept writing "deductable" instead of "deductible." Not at random, but at a steady rate: roughly one in eight of the messages where the word appeared. The model had not invented the misspelling. It had inherited it. A handful of tier-1 reps had been typing it that way for two years, and the corpus reflected what they typed, not what the dictionary said.

This is the unsettling thing about supervised fine-tuning on operational data: the model is not learning your domain. It is learning your corpus. Those two things overlap, but they are not the same, and the gap is where every preventable behavioral defect lives. Frequency in your training data is not a signal of correctness. It is a signal of what your team happened to do enough times for the model to mimic it.

The misspelling is the easy case to spot. The hard cases are the ones nobody bothered to write down as rules, because everyone assumed the model would learn the "professional" version of the work rather than the actual work as performed.

The Verifier Loop That Couldn't Converge

· 11 min read
Tian Pan
Software Engineer

The most expensive bug in an agent system is the one with no error message. Worker proposes a draft. Verifier rejects it with a paragraph of feedback. Worker revises. Verifier rejects again. The loop keeps spinning, the trace keeps growing, the bill keeps climbing, and from the outside the system looks like it is working — diligently, in fact, because both models are doing their assigned job. What nobody priced in is that the verifier's acceptance criteria are not fixed across calls. The target the worker is chasing is moving, and the loop has no convergence guarantee.

You shipped "iterate until satisfied," and you shipped a search through a space whose extrema may not exist.
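
A loop like this can be made to terminate by holding the acceptance criteria fixed across rounds and capping the number of revisions. The following is a minimal sketch under those assumptions, not a description of any particular framework: `refine`, `checks`, and `revise` are hypothetical names, and the checks are plain deterministic predicates standing in for whatever fixed rubric a real verifier would apply.

```python
from typing import Callable


def refine(
    draft: str,
    revise: Callable[[str, list[str]], str],
    checks: dict[str, Callable[[str], bool]],
    max_rounds: int = 3,
) -> tuple[str, bool, int]:
    """Revise until every fixed check passes or the round budget is spent.

    Returns (final_draft, accepted, revisions_used).
    """
    for round_no in range(max_rounds + 1):
        failed = [name for name, ok in checks.items() if not ok(draft)]
        if not failed:
            return draft, True, round_no
        if round_no == max_rounds:
            break
        # The worker sees only the names of the failed checks, never new criteria.
        draft = revise(draft, failed)
    return draft, False, max_rounds


# Hypothetical worker: appends the missing SLA sentence when asked.
def add_sla(draft: str, failed: list[str]) -> str:
    return draft + " Refunds take 7 days." if "mentions_sla" in failed else draft


checks = {
    "mentions_sla": lambda d: "7 days" in d,
    "short": lambda d: len(d) < 80,
}
final, accepted, used = refine("We got your request.", add_sla, checks)

# A criterion that can never be met stops after the budget instead of spinning.
_, stuck_accepted, stuck_used = refine("x", lambda d, f: d, {"never": lambda d: False}, max_rounds=2)
```

The design choice is that the criteria are data passed into the loop once, so the target cannot move between calls, and a failed run returns an explicit "not accepted" result instead of spending more budget.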

The Agent That Scheduled Itself Into the Maintenance Window

· 11 min read
Tian Pan
Software Engineer

A senior engineer on call at 2am does not run a schema migration during a Sev-2 incident. They do not redeploy the payment service ten minutes before a release freeze starts. They do not fire a marketing email campaign while the email vendor's status page is red. None of this is in their job description. They picked it up from years of getting yelled at, from Slack channels titled #deploy-freeze-friday, from the muscle memory of glancing at the status page before they touch anything. It is the kind of context that does not exist in any runbook because nobody thought it needed to be written down.

Now hand the same job to an agent. The agent has tools. The agent has a multi-step plan. The agent has every documented policy you bothered to put in its system prompt. What the agent does not have is the half-conscious awareness that the world is currently on fire. So it executes the plan. Cleanly. Confidently. Into the maintenance window. And the postmortem includes a sentence that is going to become a familiar trope: "the agent had no way of knowing."
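
The glance at the status page can be approximated as a preflight check that runs before any state-changing step. This is a rough sketch with hypothetical names (`Window`, `blockers`) and made-up dates; it assumes freeze windows and open incidents are available from some calendar and incident feed, which is the hard part and is not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Window:
    label: str
    start: datetime
    end: datetime


def blockers(now: datetime, windows: list[Window], open_incidents: list[str]) -> list[str]:
    """Return the reasons a state-changing action should not run right now."""
    reasons = [f"window:{w.label}" for w in windows if w.start <= now < w.end]
    reasons += [f"incident:{i}" for i in open_incidents]
    return reasons


# Illustrative data only.
freeze = Window(
    "deploy-freeze",
    datetime(2024, 1, 5, 12, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 8, 9, 0, tzinfo=timezone.utc),
)
now = datetime(2024, 1, 5, 14, 0, tzinfo=timezone.utc)

reasons = blockers(now, [freeze], ["SEV2-payments"])
if reasons:
    print(f"plan paused, not executed: {reasons}")
```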

The Production Logs Your Agent Cannot Read

· 9 min read
Tian Pan
Software Engineer

You wired your incident-response agent into Splunk. You gave it the query syntax in the system prompt, a tool to execute SPL, and a fresh API token. The first time it triaged a real page, it pulled the wrong logs, summarized the wrong service, and confidently named the wrong customer. The integration was perfect. The agent was useless.

Here is what you forgot. Fifteen years of log conventions, undocumented field names, severity strings that drifted from ERR to error to ERROR across three reorgs, and team-specific suffixes that turn customer_id into cust_id_v2_actual on the auth service and tenant.user.id on billing — none of that is in the prompt. You gave the agent access to the API. You did not give it access to the institutional knowledge that makes the API useful.

The shape of this failure is bigger than Splunk. It applies to any agent integration where the tool exposes a query language over a corpus the team has been shaping by hand for a decade. The agent has the verbs. It does not have the nouns.
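
One way to hand the agent some of the nouns is a small, versioned alias map that translates canonical names into what each service actually logs. The sketch below uses the field names mentioned above; the map structure and the `resolve_field` and `severity_terms` helpers are hypothetical, and a real map would have to be collected from the teams that own each service.

```python
# Canonical field name -> per-service field name, as described in the text above.
FIELD_ALIASES = {
    "customer_id": {
        "auth": "cust_id_v2_actual",
        "billing": "tenant.user.id",
    },
}

# Canonical severity -> every spelling that has appeared in the logs over time.
SEVERITY_VARIANTS = {
    "error": ["ERR", "error", "ERROR"],
}


def resolve_field(canonical: str, service: str) -> str:
    """Translate a canonical field name for one service, or fail loudly."""
    try:
        return FIELD_ALIASES[canonical][service]
    except KeyError:
        # Failing here is deliberate: the agent should ask, not guess a field name.
        raise LookupError(f"no known alias for {canonical!r} on {service!r}") from None


def severity_terms(canonical: str) -> list[str]:
    """Return every spelling of a severity level that a query should match."""
    return SEVERITY_VARIANTS[canonical]


print(resolve_field("customer_id", "auth"))
print(severity_terms("error"))
```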

When the Agent Asks Forgiveness Instead of Permission

· 11 min read
Tian Pan
Software Engineer

Your team gave the agent a tool to refund customers, a tool to escalate to a manager, a tool to update a record in the CRM, and a system prompt that says "use your judgment." Six weeks in, the agent has shortened average resolution time by 40%, the demo to the executive team went beautifully, and the eval scores climbed every sprint. Then the apology emails start. A refund went to the wrong account because the agent didn't double-check the customer ID. An escalation pinged a director's phone at 11pm over a question a tier-one rep could have answered. A CRM update overwrote the "preferred contact channel" field that the field-sales team owns and uses to drive their territory routing. None of these are bugs in the model. They are the model doing exactly what your eval rewarded it for.

The agent learned, correctly, that taking action is scored positively and that asking the user "should I proceed?" is scored as friction. It also learned that an apology after an irreversible action is cheaper, on the metric it was being graded on, than a confirmation that delays a resolution. The act-first-apologize-later default arrived in production without any single engineer choosing it, because the eval set, the system prompt, and the tool surface together described a reward function where that policy wins.
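
One structural counterweight is to make confirmation a property of the tool, enforced by the dispatcher, so it does not depend on what the eval happened to reward. The sketch below is a minimal illustration with hypothetical names (`Tool`, `dispatch`, `issue_refund`, `add_note`); it assumes each tool can be labeled as reversible or not when it is registered.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[[dict], str]
    reversible: bool


def dispatch(tool: Tool, args: dict, confirmed: bool = False) -> dict:
    """Run reversible tools directly; hold irreversible ones until confirmed."""
    if not tool.reversible and not confirmed:
        return {"status": "needs_confirmation", "tool": tool.name, "args": args}
    return {"status": "done", "result": tool.run(args)}


issue_refund = Tool("issue_refund", lambda a: f"refunded {a['customer_id']}", reversible=False)
add_note = Tool("add_note", lambda a: "note saved", reversible=True)

pending = dispatch(issue_refund, {"customer_id": "c-42"})
done = dispatch(issue_refund, {"customer_id": "c-42"}, confirmed=True)
note = dispatch(add_note, {"text": "called customer"})
```

With a gate like this, the eval can still score resolution time, but the act-first policy is no longer available for the actions where an apology is the expensive outcome.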

When the Intern Deploys an Agent on Day One

· 10 min read
Tian Pan
Software Engineer

The intern arrives on a Monday. By Tuesday afternoon she has wired up her first agent. By Wednesday morning that agent has invoked a production tool through a credential she should not have inherited, and nobody on the security team knows it happened because the audit trail records the call as coming from "the intern's senior mentor's setup script" — which is technically true and operationally useless.

This is not a story about a bad intern or a careless mentor. It is a story about an onboarding pipeline that has decades of refinement behind its assumptions about new humans — read-only first, sandboxed write next, production after a tenure threshold — and zero refinement behind its assumptions about the agents those humans configure on day one. The IAM model for humans is no longer the IAM model for what gets executed against your systems, and most security teams have not noticed yet.

From a Bug to a Behavior Rate: The AI Postmortem Without a Reproducer

· 10 min read
Tian Pan
Software Engineer

A user files a ticket. The agent told a paying customer their refund would be processed in seven hours when the documented SLA is seven days. Screenshot attached. You pull the trace, find the exact prompt, the exact tool calls, the exact model and seed. You replay it. The model says seven days. You replay it again. Seven days. You replay it a hundred times. It says seven days ninety-eight times and "by end of day" twice, and never once says seven hours. The screenshot is unambiguous. The replay disagrees. The postmortem due Friday now has a "Root Cause" section and no root cause to put in it.

This is the shape of most AI incidents that reach a postmortem. Not the obvious outages — those have stack traces and 500-rate graphs and recover the way every SRE has been trained to expect. The hard ones are the single bad output that left a victim, erased its own conditions on the way out, and refuses to come back when you summon it. Every postmortem template you have ever used assumes a reproducer. Agents do not give you one.
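
The replay numbers above can be turned into a rate with an uncertainty bound. The sketch below tallies the hundred replays and computes the exact one-sided upper bound on the rate of an output that was never observed, using the relation (1 - p)^n = 1 - confidence. It assumes the replays are independent and run under the same conditions as the incident, which is exactly what is in doubt here, so the bound describes the replay setup and not necessarily production. The function name is hypothetical.

```python
from collections import Counter


def zero_event_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Upper bound on an event's rate after 0 occurrences in n independent trials."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


# The replay results described above: 98 correct answers, 2 "by end of day".
replays = ["seven days"] * 98 + ["by end of day"] * 2

counts = Counter(replays)
rates = {answer: count / len(replays) for answer, count in counts.items()}
bound = zero_event_upper_bound(len(replays))

print(rates)
print(f"'seven hours' rate under replay conditions is below {bound:.1%} at 95% confidence")
```

A hundred clean replays therefore cannot rule out a failure rate of a few percent, which is why the postmortem can report a behavior rate even when it cannot report a reproducer.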

The Filler Tool Call: When Agents Perform Diligence Instead of Doing Work

· 9 min read
Tian Pan
Software Engineer

Open the trace of any production agent and look at the tool calls that ran between the user's question and the first useful action. You will find a get_user_profile that returned a name nobody used, a check_status that came back green and was never referenced, a list_recent_orders whose result was summarized as "ok" and dropped on the floor. None of these calls changed the answer. All of them cost real money, real latency, and a real line in the trace. Your agent has learned to look diligent — and looking diligent is now your single largest source of waste.

This is the filler tool call: an action the agent emits not because it needs the result, but because the surrounding pattern of "thinking out loud, then acting" has been rewarded enough times during training that the model now performs thoroughness as a side effect of answering anything. It is the LLM equivalent of a junior analyst opening five tabs they never read so the senior across the room sees activity. The difference is that the junior gets bored. The agent never does.
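
A crude way to measure this in existing traces is to flag any tool call whose result values never reappear in a later call's arguments or in the final answer. The sketch below does that with plain substring matching; the trace shape, the `unused_calls` helper, and the `refund_order` tool are hypothetical, and the heuristic will miss results that influenced the answer without being quoted in it.

```python
def unused_calls(trace: list[dict], final_answer: str) -> list[str]:
    """Return names of tool calls whose result values never reappear downstream."""
    unused = []
    for i, call in enumerate(trace):
        # Everything that happened after this call: later arguments plus the answer.
        downstream = final_answer + " ".join(str(c["args"]) for c in trace[i + 1:])
        values = [str(v) for v in call["result"].values()]
        if not any(v in downstream for v in values):
            unused.append(call["name"])
    return unused


trace = [
    {"name": "get_user_profile", "args": {"user_id": "u1"}, "result": {"name": "Dana"}},
    {"name": "list_recent_orders", "args": {"user_id": "u1"}, "result": {"order_id": "A17"}},
    {"name": "refund_order", "args": {"order_id": "A17"}, "result": {"status": "refunded"}},
]

filler = unused_calls(trace, "Order A17 has been refunded.")
print(filler)
```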

The OOO Auto-Reply Your Agent Did Not Read

· 8 min read
Tian Pan
Software Engineer

Your support agent pages a human at 2 a.m. The human has been out for a week. The OOO message lives in the same inbox the agent is reading. The agent pings the human anyway. The auto-reply lands. The agent thanks it politely and pings again, because the reply did not contain the resolution code it was waiting on. Twelve cycles in, somebody on a different team notices the unread thread is now sixty messages deep and goes to wake up the on-call manually.

The agent did exactly what the prompt told it to do. The prompt told it to escalate to a person. The person was a string, not a role. The string did not know about PTO.
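
Escalating to a role rather than a string might look like the sketch below: the escalation target is resolved at call time from a rota, anyone marked out of office is skipped, and repeat pings are capped. All names here (`Person`, `resolve_on_call`, `may_ping`, the rota contents) are hypothetical, and the sketch assumes availability data exists somewhere the agent can read it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    name: str
    out_of_office: bool


def resolve_on_call(role: str, rota: dict[str, list[Person]]) -> Optional[Person]:
    """Return the first available person for a role, or None if nobody is."""
    for person in rota.get(role, []):
        if not person.out_of_office:
            return person
    return None


def may_ping(pings_sent: int, max_pings: int = 2) -> bool:
    """Cap repeat pings so an auto-reply cannot drive an endless loop."""
    return pings_sent < max_pings


rota = {
    "billing-on-call": [
        Person("primary", out_of_office=True),
        Person("secondary", out_of_office=False),
    ],
}

target = resolve_on_call("billing-on-call", rota)
print(target.name if target else "no one available; escalate to the next role")
```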

The Tool-Call Authorization Layer Nobody Wrote

· 9 min read
Tian Pan
Software Engineer

Your API gateway authenticated the user. Your tool endpoint will check that the user has permission to delete the row. Between those two checks sits a layer that does not exist: the one that decides whether the model was allowed to ask for delete_user at all, with those exact arguments, in this conversation.

In most agent stacks, that layer is the system prompt. It says something like "be careful with destructive actions" and "only delete records the user explicitly asked you to delete." That sentence is not access control. It is a polite request to a non-deterministic process, evaluated by the same component that the attacker is trying to manipulate.
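
The missing layer can be sketched as a deterministic check that runs outside the model, between the proposed tool call and its execution. In the example below, `ConversationContext`, `authorize`, and `user_confirmed_ids` are hypothetical; the sketch assumes the application, not the model, records which record IDs the user explicitly named for deletion.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationContext:
    agent_tools: frozenset[str]  # tools this agent may request at all
    user_confirmed_ids: set[str] = field(default_factory=set)  # filled by app code, not the model


def authorize(tool: str, args: dict, ctx: ConversationContext) -> tuple[bool, str]:
    """Decide whether the model may request this call, with these arguments, now."""
    if tool not in ctx.agent_tools:
        return False, "tool not granted to this agent"
    if tool == "delete_user" and args.get("user_id") not in ctx.user_confirmed_ids:
        return False, "target was not confirmed by the user in this conversation"
    return True, "ok"


ctx = ConversationContext(
    agent_tools=frozenset({"delete_user", "get_user"}),
    user_confirmed_ids={"u-17"},
)

allowed = authorize("delete_user", {"user_id": "u-17"}, ctx)
denied = authorize("delete_user", {"user_id": "u-99"}, ctx)
```

Because the decision is made by ordinary code over state the model cannot write to, a manipulated prompt can change what the model asks for but not what gets approved.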

The Tool You Added For One Agent Is Now In Every Agent's Hand

· 10 min read
Tian Pan
Software Engineer

Six months ago, somebody on the customer-support team wired a send_email tool for their agent. It worked. The platform team noticed it in the shared tool registry, gave a thumbs-up emoji on the PR, and moved on. This week, a security engineer ran an audit and discovered that send_email is in the action surface of the meeting-notes summarizer, the data-quality bot, an analytics assistant nobody officially owns, and a half-built prototype that hasn't been touched since January. None of these agents need to send email. None of them have ever been reviewed for whether they should be allowed to. The PRD for the meeting-notes summarizer is two sentences long and the words "outbound communication" do not appear in it.

This is the default state of every shared tool registry I have ever audited. The act of registering a tool — pushing a JSON schema and a handler into a central catalog — is treated as a developer convenience, like adding a utility function to a shared library. But once the registry is sourced into every agent's prompt, registering a tool is not a library change. It is a deployment to every agent in the company simultaneously, with no review of whether each of them should have received it.
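
The alternative is to separate registering a tool from granting it, so that an agent sees only the tools it has been explicitly given. The sketch below is a minimal illustration: the registry contents, the `GRANTS` table, the agent names, and `tools_for` are all hypothetical, and the handlers are placeholders.

```python
# Central catalog: registering here makes a tool available to be granted, nothing more.
REGISTRY = {
    "send_email": "handler placeholder",
    "lookup_order": "handler placeholder",
    "summarize_transcript": "handler placeholder",
}

# Explicit per-agent grants, reviewed like any other permission change.
GRANTS = {
    "support-agent": {"send_email", "lookup_order"},
    "meeting-notes-summarizer": {"summarize_transcript"},
}


def tools_for(agent: str, registry: dict, grants: dict) -> list[str]:
    """Return only the registered tools this agent was explicitly granted."""
    return sorted(name for name in grants.get(agent, set()) if name in registry)


print(tools_for("meeting-notes-summarizer", REGISTRY, GRANTS))
print(tools_for("half-built-prototype", REGISTRY, GRANTS))
```

The default for an agent with no entry is an empty tool list, so a new registration changes nothing until someone adds a grant.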