
Who Pays for the Model's Mistake: Designing Liability Into Agent Products

· 9 min read
Tian Pan
Software Engineer

An agent books the wrong flight. It sends an apology email to the wrong customer. It writes a database migration that drops a column three services still read from. In each case the model produced a plausible-looking action, executed it, and moved on. And in each case somebody absorbed a real cost — a rebooking fee, a damaged relationship, an incident bridge at 2 a.m.

Here is the uncomfortable part: most AI products have no answer for who that somebody is. The question never comes up in the design review. It surfaces later, one ticket at a time, in a support queue where a rep improvised a $40 credit because the customer sounded angry and there was no policy to point at. Multiply that by a few thousand tickets a month and the unit economics quietly rot, not from a dramatic failure but from a slow leak nobody scoped.

"The model made a mistake" is not a support escalation. It is a billing event. And the products that survive the agentic era will be the ones that designed for that event before the first angry ticket, not the ones that improvised refunds by vibes until the gross margin went negative.

The Liability Model You Should Have Built at MVP

The first thing to internalize is that "the agent was wrong" is not one failure mode. It is at least four, and they have different owners and different costs.

  • User error. The user gave the agent a bad instruction — wrong date, ambiguous name, an account they didn't actually own. The agent did exactly what it was told.
  • Model error. The instruction was clear and the tools were available, but the model reasoned poorly: it hallucinated a policy, picked the wrong tool, or misread a result.
  • Tool error. The model's plan was sound, but a downstream API returned stale data, timed out, or silently changed its contract.
  • Ambiguous instruction. Nobody was wrong, exactly. The request genuinely supported two readings and the agent picked the one the user didn't mean.

A liability model is a written rule that, for each of these four categories, assigns the cost to a party: the user eats it, the company eats it, the tool vendor eats it (via SLA credits), or it gets split. This is not a legal document. It is a product spec. It belongs in the same doc as your pricing tiers, and it should exist before you ship.

Most teams skip it because at MVP the volume is low enough that case-by-case improvisation works. That is exactly the trap. The improvisation feels free because each individual refund is small. What it actually does is train your support org to treat every mistake as a negotiation, which means your refund rate is now a function of how persistent your customers are rather than what actually went wrong. You cannot price that. You cannot forecast it. And you cannot tell an investor what your real margin is.

The classification also has to be recoverability-aware, not just fault-aware. A wrong-flight booking caught within the 24-hour cancellation window costs almost nothing. The same mistake caught two days later costs the full fare. Same fault, two orders of magnitude apart in cost. Your liability model needs a recoverability axis or it will misprice half its cases.
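One way to write that rule down is as a lookup table keyed by fault and recoverability. The sketch below is illustrative only: the four fault names mirror the list above, but the two `Recoverability` levels, the party names, and every split are hypothetical placeholders, not a recommended policy.

```python
from enum import Enum


class Fault(Enum):
    USER = "user_error"
    MODEL = "model_error"
    TOOL = "tool_error"
    AMBIGUOUS = "ambiguous_instruction"


class Recoverability(Enum):
    IN_WINDOW = "in_window"      # caught while a cheap undo path is still open
    PAST_WINDOW = "past_window"  # caught after the cheap undo path has closed


# Hypothetical policy: share of the cost each party absorbs.
# Shares in every row sum to 1.0. All numbers are placeholders.
LIABILITY = {
    (Fault.USER, Recoverability.IN_WINDOW): {"user": 1.0},
    (Fault.USER, Recoverability.PAST_WINDOW): {"user": 1.0},
    (Fault.MODEL, Recoverability.IN_WINDOW): {"company": 1.0},
    (Fault.MODEL, Recoverability.PAST_WINDOW): {"company": 1.0},
    (Fault.TOOL, Recoverability.IN_WINDOW): {"company": 1.0},
    (Fault.TOOL, Recoverability.PAST_WINDOW): {"vendor": 0.5, "company": 0.5},
    (Fault.AMBIGUOUS, Recoverability.IN_WINDOW): {"company": 1.0},
    (Fault.AMBIGUOUS, Recoverability.PAST_WINDOW): {"user": 0.5, "company": 0.5},
}


def allocate(fault, recoverability, cost):
    """Split a mistake's dollar cost across parties according to the policy table."""
    shares = LIABILITY[(fault, recoverability)]
    return {party: round(cost * share, 2) for party, share in shares.items()}


print(allocate(Fault.AMBIGUOUS, Recoverability.PAST_WINDOW, 400.0))
# {'user': 200.0, 'company': 200.0}
```

The point of the table form is that support looks up an answer instead of negotiating one, and finance can read the same table when forecasting.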

The Product Surfaces That Make Liability Tractable

A liability model is a policy. It only becomes operable if the product produces the evidence to apply it. Three surfaces do most of the work.

Action-level provenance trails. When a dispute lands, somebody has to reconstruct what happened — and "the agent did something" is not a reconstruction. You need a timestamped, structured record of every action the agent took, grouped by task, with the inputs it saw, the tool it called, the result it got, and the reasoning trace that connected them. Without this, every dispute defaults to a refund, because the cheapest way to close an un-reconstructable ticket is to pay it. Provenance is what lets you say "the user supplied this account number" and decline. It converts arguments into lookups.
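A minimal sketch of what such a record might look like, assuming an in-memory list of records. The `ActionRecord` fields, the `supplied_by` map, and the helper names are all hypothetical; a real system would persist these records and likely capture more context.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now():
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class ActionRecord:
    task_id: str       # groups actions that belong to one user task
    tool: str          # the tool the agent called
    inputs: dict       # what the agent passed to the tool
    supplied_by: dict  # input name -> "user", "model", or "tool" (hypothetical labels)
    result: dict       # what the tool returned
    reasoning: str     # the trace connecting inputs to the action
    timestamp: str = field(default_factory=_now)


def trail(records, task_id):
    """All actions for one task, oldest first.

    Sorting ISO-8601 strings is only safe when every timestamp uses
    the same format and UTC offset, as assumed here.
    """
    return sorted((r for r in records if r.task_id == task_id),
                  key=lambda r: r.timestamp)


def who_supplied(records, task_id, input_name):
    """Return who supplied an input the first time it appears in a task, or None."""
    for record in trail(records, task_id):
        if input_name in record.supplied_by:
            return record.supplied_by[input_name]
    return None


records = [
    ActionRecord(
        task_id="task-1",
        tool="lookup_account",
        inputs={"account_number": "12345"},
        supplied_by={"account_number": "user"},
        result={"status": "found"},
        reasoning="User gave the account number in their request.",
        timestamp="2025-01-01T10:00:00+00:00",
    ),
]
print(who_supplied(records, "task-1", "account_number"))  # user
```

With a record like this, "the user supplied this account number" is a one-line query rather than an argument.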

Reversibility tiers. Not every action should be equally easy for the agent to take. Classify each tool the agent can call by how hard it is to undo: trivially reversible (draft a message, stage a change), reversible with effort (cancel a booking inside a window, issue a compensating transaction), and effectively irreversible (send an external email, move money, run a destructive migration). The tier should change the product's behavior — irreversible actions get a confirmation step, a human approver, or a hard cap. The mistake you can roll back is an inconvenience. The mistake you can't is a liability. Pricing them the same is how a $0.02 inference call turns into a $400 chargeback.
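The tiering can be sketched as a small gate that runs before every tool call. The tool names, the `SPEND_CAP` value, and the three return strings are assumptions made for illustration; the only idea taken from the text is that the tier changes what the agent may do alone.

```python
from enum import IntEnum


class Tier(IntEnum):
    TRIVIAL = 1       # draft a message, stage a change
    EFFORTFUL = 2     # cancel inside a window, compensating transaction
    IRREVERSIBLE = 3  # external email, money movement, destructive migration


# Hypothetical tool registry.
TOOL_TIERS = {
    "draft_message": Tier.TRIVIAL,
    "cancel_booking": Tier.EFFORTFUL,
    "send_external_email": Tier.IRREVERSIBLE,
    "move_money": Tier.IRREVERSIBLE,
}

SPEND_CAP = 100.00  # hypothetical hard cap, in dollars, on irreversible actions


def gate(tool, amount=0.0, approved_by=None):
    """Decide whether the agent may run a tool: 'allow', 'needs_approval', or 'block'."""
    # Unregistered tools are treated as irreversible, the most restrictive tier.
    tier = TOOL_TIERS.get(tool, Tier.IRREVERSIBLE)
    if tier is Tier.IRREVERSIBLE:
        if amount > SPEND_CAP:
            return "block"
        if approved_by is None:
            return "needs_approval"
    return "allow"


print(gate("draft_message"))                             # allow
print(gate("move_money", amount=50.0))                   # needs_approval
print(gate("move_money", amount=50.0, approved_by="u1")) # allow
print(gate("move_money", amount=500.0, approved_by="u1"))# block
```

Defaulting unknown tools to the strictest tier is a deliberate choice in this sketch: a newly added tool that nobody classified should fail safe rather than run unattended.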

Insurance-style buffers. Once you can estimate your expected mistake cost — see the next section — you can price it in. A small per-seat or per-action buffer that funds a "mistake pool" turns the refund line from an unpredictable margin leak into a budgeted, forecastable cost of goods sold. This is just what payment processors and shipping carriers already do. The buffer also gives support a real instrument: they are spending from a defined pool against a defined policy, not negotiating against the company's gross margin.
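As a rough sketch of the mechanics, a mistake pool is a balance that every billed action pays into and every approved refund draws from. The `MistakePool` class and the five-cent buffer are hypothetical; amounts are kept in integer cents to avoid floating-point drift.

```python
class MistakePool:
    """Tracks a refund budget funded by a fixed per-action buffer (in cents)."""

    def __init__(self, buffer_cents_per_action):
        self.buffer_cents_per_action = buffer_cents_per_action
        self.balance_cents = 0
        self.paid_out_cents = 0

    def record_actions(self, count=1):
        # Each billed action contributes its buffer to the pool.
        self.balance_cents += self.buffer_cents_per_action * count

    def pay(self, amount_cents):
        """Pay a refund from the pool. Returns False if the pool cannot cover it."""
        if amount_cents > self.balance_cents:
            return False
        self.balance_cents -= amount_cents
        self.paid_out_cents += amount_cents
        return True


pool = MistakePool(buffer_cents_per_action=5)
pool.record_actions(1000)      # 1000 actions * 5 cents = 5000 cents
print(pool.pay(4000))          # True
print(pool.balance_cents)      # 1000
print(pool.pay(2000))          # False
```

A refused payment is the useful signal here: it means either the buffer is underpriced or the refund falls outside policy, and both are questions for a pricing review rather than for an individual support rep.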

Estimating the Expected Cost of Being Wrong

Your eval suite probably reports accuracy. Accuracy is the wrong number for this problem, because it treats every error as equal. A 2% error rate where the errors are typos is fine. A 2% error rate where the errors wire money to the wrong account is an existential threat. The two are indistinguishable on an accuracy dashboard.

The number you actually want is expected mistake cost per task class: for each kind of task the agent performs, the probability it fails times the dollar cost when it does. This reframes evaluation from a pass/fail exercise into a risk-pricing exercise, and it changes what you do with the results.

It tells you where to spend your reliability budget — not on the task class with the most errors, but on the one with the highest expected cost, which is often a low-frequency class that nobody was watching. It tells you which task classes should not be fully autonomous yet, because their expected cost exceeds what any buffer can absorb. And it gives pricing a real input: if a task class carries forty cents of expected mistake cost, the price has to clear that before anything else.
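The arithmetic is simple enough to sketch. The task classes and all figures below are made up for illustration; the only formula used is the one stated above, failure probability times dollar cost per failure.

```python
from dataclasses import dataclass


@dataclass
class TaskClass:
    name: str
    tasks: int                   # tasks observed
    failures: int                # tasks that failed
    avg_cost_per_failure: float  # dollars lost per failure

    @property
    def failure_rate(self):
        return self.failures / self.tasks

    @property
    def expected_cost_per_task(self):
        return self.failure_rate * self.avg_cost_per_failure


# Invented numbers, chosen so the two rankings disagree.
classes = [
    TaskClass("draft_reply", tasks=10_000, failures=200, avg_cost_per_failure=0.50),
    TaskClass("book_flight", tasks=1_000, failures=10, avg_cost_per_failure=400.0),
    TaskClass("wire_transfer", tasks=200, failures=1, avg_cost_per_failure=5_000.0),
]

most_errors = max(classes, key=lambda c: c.failures)
highest_risk = max(classes, key=lambda c: c.expected_cost_per_task)

print(most_errors.name)   # draft_reply
print(highest_risk.name)  # wire_transfer
```

In this invented data, the class with 200 failures carries about a cent of expected cost per task, while the class with a single failure carries about twenty-five dollars. An accuracy dashboard would point the reliability budget at the wrong one.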

This is also where the eval-to-production gap bites hardest. Mistake costs are not stationary. A pricing change, a new integration, a shift in your customer mix can move the cost of a given failure by an order of magnitude without changing the failure rate at all. The expected-cost number has to be recomputed against live traffic on a schedule, or it silently becomes fiction — a frozen estimate of a risk that has already moved.

Internal cost-allocation is the common case. But once an agent's output influences money or contracts, the question stops being "which budget eats this" and becomes "are we legally on the hook," and the answer has been settled more firmly than most builders realize.

In Moffatt v. Air Canada, decided by British Columbia's Civil Resolution Tribunal in 2024, an airline chatbot told a grieving customer he could claim a bereavement discount retroactively, a policy that did not exist. Air Canada argued the chatbot was a separate entity responsible for its own statements. The tribunal rejected that outright: the chatbot was part of Air Canada's website, and the company was liable for everything it said. The damages were small. The precedent was not. You do not get to disclaim the agent. If it speaks for you, it is you.

The practical consequence for builders: any agent action that touches a price, a refund, a contract term, or a regulated claim needs a paper trail strong enough to survive external review — not just internal debugging. That is a higher bar than an observability log. It means the provenance record is durable, tamper-evident, and complete enough that a third party can reconstruct the decision without trusting your word for it. Treat it the way a payments company treats a transaction ledger, because functionally that is what it has become.
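One common way to make a log tamper-evident is a hash chain, where each entry commits to the hash of the one before it. The sketch below uses Python's standard `hashlib` and `json`; the entry layout and function names are assumptions, and this is a swapped-in illustration of the property, not a description of any particular product's ledger.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, payload):
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log, payload):
    """Append a payload, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, payload),
    })


def verify(log):
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append(log, {"action": "quote_price", "amount": 120})
append(log, {"action": "issue_refund", "amount": 40})
print(verify(log))  # True

log[0]["payload"]["amount"] = 12  # simulate after-the-fact editing
print(verify(log))  # False
```

A chain like this only detects tampering by someone who cannot also rewrite every later hash. For a third party to trust it without trusting you, the latest hash would need to be anchored somewhere outside your control, which this sketch does not attempt.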

This also reframes the human-in-the-loop decision. A confirmation step is not just UX friction or a safety net. It is the moment liability transfers. When a user explicitly approves an irreversible action, the audit trail records who accepted the risk. Skip that step to make the demo smoother and you have not removed the liability; you have silently assigned all of it to yourself.

A Mistake Is a Billing Event

The shift in mindset is small to state and large to implement: stop treating agent errors as support escalations and start treating them as billing events with a defined owner, a recorded cost, and a budgeted source of funds.

Concretely, that means four things go into the product before launch, not after the first bad month. A written liability model that classifies failures by fault and recoverability. Provenance trails detailed enough to reconstruct any disputed action. Reversibility tiers that make irreversible actions structurally harder for the agent to take alone. And an expected-mistake-cost estimate, refreshed against live traffic, that feeds both pricing and the buffer that funds refunds.

None of this is glamorous. It will not appear in a launch demo. But it is the difference between a product whose margins you can forecast and one whose margins are quietly set by how angry your customers are willing to get. The agentic products that last will be the ones that answered "who pays for the model's mistake" on a whiteboard — before the queue answered it for them, one improvised credit at a time.
