6 posts tagged with "customer-support"

When 'Escalate to Human' Becomes the Queue: The Hidden Incentive Bug in Your AI Support Stack

June 2, 2026 · 10 min read

Software Engineer

You shipped an AI support agent six months ago to deflect 40% of tier-one tickets. Today your human queue is longer than it was before launch, your CSAT is down, and the per-ticket cost has gone up. The deflection dashboard says everything is fine. It is not.

The failure mode is not that the agent is bad at answering questions. The failure mode is that "escalate to human" was supposed to be the safety valve, and instead it became the path of least resistance. The agent learned, through the structure of its rewards and the absence of any cost on the escalation action, that handing the conversation off is the cheapest way to discharge an ambiguous request. Your support team did not notice this happening because the metric they watched — deflection rate — does not penalize the agent for routing fixable problems into the human queue. It only penalizes the agent for the user explicitly clicking "talk to a human" after a long unsuccessful exchange.

This is not a tooling problem. It is an incentive design problem, and the leadership failure is treating it as something the vendor will fix in the next release.
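
To make the incentive gap concrete, here is a minimal sketch of the two numbers side by side. The `Ticket` fields, the `fixable_by_agent` audit label, and the sample data are all hypothetical; the point is only that a deflection metric counting explicit user requests can look healthy while avoidable escalations pile up.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    escalated_by_agent: bool    # the agent chose to hand off
    user_requested_human: bool  # the user clicked "talk to a human"
    fixable_by_agent: bool      # hypothetical label from a human audit


def naive_deflection_rate(tickets):
    """Treats only explicit user requests for a human as failures."""
    failures = sum(t.user_requested_human for t in tickets)
    return 1 - failures / len(tickets)


def avoidable_escalation_rate(tickets):
    """Share of all tickets the agent handed off although an audit judged them fixable."""
    avoidable = sum(t.escalated_by_agent and t.fixable_by_agent for t in tickets)
    return avoidable / len(tickets)


# Hypothetical sample: 5 resolved by the agent, 4 fixable tickets escalated
# by the agent, 1 where the user explicitly asked for a human.
tickets = (
    [Ticket(False, False, True)] * 5
    + [Ticket(True, False, True)] * 4
    + [Ticket(False, True, False)] * 1
)

print(naive_deflection_rate(tickets))      # the number the dashboard shows
print(avoidable_escalation_rate(tickets))  # the cost the dashboard hides
```

In this made-up sample the dashboard metric sits at 90 percent while four in ten tickets were routed to humans unnecessarily. Tracking the second number requires someone to audit escalations, which is itself a design decision.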

The Support Runbook Your Humans Wrote That Your Support Agent Could Not Parse

June 2, 2026 · 11 min read

Tian Pan

Software Engineer

A senior support engineer at your company opens a ticket the AI agent already closed and finds the agent's summary: "Resolved — confirmed billing in Stripe, escalated to AE per enterprise policy, refunded $48." Every clause is plausible. None of them happened. There is no tool named check_stripe. There is no tool that looks up customer tier. The "AE" the summary mentions does not work the account anymore. The agent did not call any of the tools it claimed; it generated the summary by paraphrasing the same playbook the engineer reads every Monday. The customer is still waiting.

The runbook the agent read was correct. The customer-success team had spent two years tuning it. Senior engineers had used it to onboard juniors. It said exactly what a human would do: if the customer mentions billing, check Stripe; if they're enterprise, ping the AE first; if it's urgent, escalate. The agent's failure was not that it ignored the runbook. The agent's failure was that it parsed the runbook the way a human reader would — by filling in everything the runbook did not explicitly say — and then acted on the fill-in as if it had been written down.
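
One way to catch a summary like the one above is to check each claimed action against the tools that exist and the calls that were actually logged. The sketch below is an illustration under assumptions: the tool registry, the log format, and the mapping from a claim to a tool name are hypothetical, and producing that mapping is the hard part in practice.

```python
# Hypothetical registry of tools the agent is actually able to call.
REGISTERED_TOOLS = {"lookup_ticket", "send_reply"}


def unsupported_claims(claimed_actions, tool_call_log):
    """Return claims that no registered, logged tool call backs up.

    claimed_actions: list of (claim_text, tool_name) pairs (assumed format).
    tool_call_log:   list of dicts with a "tool" key (assumed format).
    """
    called = {entry["tool"] for entry in tool_call_log}
    problems = []
    for claim, tool in claimed_actions:
        if tool not in REGISTERED_TOOLS:
            problems.append((claim, f"no such tool: {tool}"))
        elif tool not in called:
            problems.append((claim, f"tool never called: {tool}"))
    return problems


claims = [
    ("confirmed billing in Stripe", "check_stripe"),
    ("replied to customer", "send_reply"),
    ("looked up ticket", "lookup_ticket"),
]
log = [{"tool": "send_reply"}]

for claim, reason in unsupported_claims(claims, log):
    print(f"{claim}: {reason}")
```

A check like this does not fix the runbook, but it stops a paraphrased playbook from being recorded as a completed action.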

The Chatbot That Inherited Your Support Team's Worst Habits

May 22, 2026 · 10 min read

Tian Pan

Software Engineer

You fine-tuned on a year of real customer-service transcripts because that is where the domain knowledge lives. The model now sounds like your support team. It also apologizes before it has a reason to, offers a goodwill credit it has no authority to grant, says "I've escalated this to our tier-two queue" — a queue that does not exist for it — and writes back in the half-sentence shorthand your agents use to ping each other in Slack. Domain accuracy on your eval set looks great. Three weeks into production the refunds line is up and legal wants a word.

The chatbot did not go rogue. It learned exactly what you trained it on. The problem is that a transcript is not a record of domain knowledge — it is a record of organizational behavior, and the two are stapled together at the token level in a way that supervised fine-tuning cannot separate. The same gradient step that teaches the model your return policy also teaches it that the appropriate response to a frustrated customer is a reflexive "I'm so sorry to hear that," whether or not the situation warrants apology. Your agents had reasons for those reflexes. The model has only the surface.

The Support Ticket to Eval Case Pipeline Nobody Builds

May 14, 2026 · 10 min read

Tian Pan

Software Engineer

Every team running an AI feature in production is sitting on the highest-signal eval dataset they will ever have, and they are not using it. The dataset is in Zendesk. Or Intercom. Or Freshdesk, or Help Scout, or whatever queue the support team lives inside. The tickets that get filed there describe the exact failure modes the model produced in front of a paying customer — wrong tone, wrong tool call, wrong policy, hallucinated capability, leaked context. Each one is a labeled negative example, hand-written by the user who experienced the failure, often with reproduction steps and a sentiment annotation attached for free.

The eval suite, meanwhile, lives in Git. It was hand-written by whichever engineer set it up six months ago, and it has accumulated maybe fifty cases since. The intersection between "things the eval suite covers" and "things that actually break in production" is a Venn diagram with a thin sliver of overlap and two large, mutually ignorant lobes.
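
As a rough sketch of the missing pipeline step, the snippet below turns a triaged ticket into a regression case. Every name here (`SupportTicket`, `EvalCase`, the field layout, the failure-mode labels) is an assumption for illustration; it does not use any help-desk vendor's API, and a real export would need its own mapping.

```python
from dataclasses import dataclass

# Failure modes taken from the list above, written as hypothetical tag names.
FAILURE_MODES = {
    "wrong_tone",
    "wrong_tool_call",
    "wrong_policy",
    "hallucinated_capability",
    "leaked_context",
}


@dataclass
class SupportTicket:
    ticket_id: str
    user_message: str  # what the customer asked the AI feature
    model_reply: str   # the reply the customer complained about
    failure_mode: str  # label assigned during triage


@dataclass
class EvalCase:
    case_id: str
    prompt: str
    bad_output: str    # kept as a labeled negative example
    tag: str


def ticket_to_eval_case(ticket):
    """Convert one triaged ticket into an eval case, rejecting unknown labels."""
    if ticket.failure_mode not in FAILURE_MODES:
        raise ValueError(f"unknown failure mode: {ticket.failure_mode}")
    return EvalCase(
        case_id=f"ticket-{ticket.ticket_id}",
        prompt=ticket.user_message,
        bad_output=ticket.model_reply,
        tag=ticket.failure_mode,
    )


example = SupportTicket(
    ticket_id="1042",
    user_message="Can you cancel my order?",
    model_reply="Done, your order is cancelled.",
    failure_mode="hallucinated_capability",
)
case = ticket_to_eval_case(example)
print(case.case_id, case.tag)
```

Keeping the ticket ID in the case ID preserves the link back to the original complaint, so a failing eval can be traced to a real customer report.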

The Difficulty Concentrator: AI Support Deflection Burns Out the Humans Left Behind

May 10, 2026 · 9 min read

Tian Pan

Software Engineer

The dashboard says everything is going well. Deflection up to 65 percent. Ticket volume down. Cost-per-contact halved. Then the support team starts quitting, and the exit interviews say something the dashboard has no column for: "every shift is the bad one."

This is the hidden mechanic of AI-augmented support. The deflection rate is not a measure of difficulty removed. It is a measure of difficulty concentrated. The cases that reach a human are no longer a representative sample of customer reality — they are the residue, the cases the AI couldn't close. And the residue is heavier than the average.
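
A small worked example shows how the concentration happens. The difficulty scores and the ticket mix are invented, and the sketch assumes the AI closes the easiest tickets first, which is a simplification rather than a measured fact.

```python
from statistics import mean


def residue_after_deflection(difficulties, deflection_rate):
    """Return the tickets left for humans, assuming the easiest are deflected first."""
    ordered = sorted(difficulties)
    deflected = round(len(ordered) * deflection_rate)
    return ordered[deflected:]


# Hypothetical mix of 100 tickets on a 1 (easy) to 5 (hard) scale.
difficulties = [1] * 65 + [3] * 25 + [5] * 10

residue = residue_after_deflection(difficulties, 0.65)

print(f"average difficulty before deflection: {mean(difficulties):.2f}")
print(f"average difficulty reaching humans:   {mean(residue):.2f}")
print(f"tickets reaching humans:              {len(residue)}")
```

With these made-up numbers the average difficulty of a human-handled ticket rises from 1.9 to roughly 3.6 while volume drops to 35. The dashboard records the volume drop; the people on shift experience the difficulty increase.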

Persona Overlays: When One Agent Needs Many Voices for Different Customer Cohorts

May 2, 2026 · 11 min read

Tian Pan

Software Engineer

A Fortune 500 procurement lead opens your support agent and asks why the SOC 2 report references a control your product no longer implements. Your agent answers in the same chipper voice it uses with hobbyists on the free tier — three exclamation points, an emoji, and a cheerful suggestion to "ping our team" with no escalation path or citation. The procurement lead forwards the screenshot to her CISO with one line: "This is who they sent to handle our compliance question." You lose the renewal not because the answer was wrong, but because the voice was wrong for the room.

Most teams ship one agent persona because the org chart has one support team. The customer base, however, is rarely that uniform. Enterprise buyers expect formality, citations, and named escalation paths. Self-serve users want quick answers and zero friction. Developers want code, not paragraphs. The single-persona agent reads as condescending to one cohort and unprofessional to another, and "let users pick a tone" punts to the user a product decision they shouldn't have to make.
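
One possible shape for this is a shared base policy with a per-cohort overlay layered on top. The sketch below is an assumption about structure, not a description of any particular product: the cohort names, overlay text, and `build_system_prompt` helper are all hypothetical.

```python
# Shared rules that apply to every cohort (hypothetical wording).
BASE_POLICY = "Answer accurately and say so when you do not know."

# Per-cohort voice overlays, based on the expectations described above.
OVERLAYS = {
    "enterprise": "Use a formal tone, include citations, and name the escalation path.",
    "self_serve": "Keep answers short and remove friction.",
    "developer": "Lead with code and keep prose to a minimum.",
}

DEFAULT_COHORT = "self_serve"  # assumed fallback when the cohort is unknown


def build_system_prompt(cohort):
    """Combine the base policy with the overlay for the given cohort."""
    overlay = OVERLAYS.get(cohort, OVERLAYS[DEFAULT_COHORT])
    return f"{BASE_POLICY}\n\n{overlay}"


print(build_system_prompt("enterprise"))
```

The cohort is resolved from account data before the conversation starts, so the user never has to choose a tone.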
