
The Contestability Gap: Engineering AI Decisions Your Users Can Actually Appeal

11 min read
Tian Pan
Software Engineer

A user opens a chat, asks for a refund, gets "I'm sorry, this purchase is not eligible for a refund," closes the tab, and never comes back. Internally, the agent emitted a beautiful trace: tool calls, intermediate reasoning, the policy bundle it consulted, the model version it ran on. Every span landed in the observability platform. None of it landed anywhere the user could reach. There is no button labeled "ask a human to look at this again," and even if there were, there is no service behind it. The decision is final by default, not by design.

This is the contestability gap, and it is the next thing regulators, lawyers, and angry users are going to rip open. It is also one of the cleanest examples of a problem that looks like policy from the outside and turns out to be plumbing on the inside.

The technical reason the gap exists is that production AI pipelines were optimized for the forward path. The agent reads a request, fetches some context, picks a tool, generates an output, and returns. The reasoning trace exists, but it was logged for the on-call engineer, not for the user. The input snapshot the model actually saw lives in one store; the policy bundle that gated the decision lives in another; the model version is a tag on the deployment, not a field on the decision. Asking "why was this user denied?" three weeks later usually means joining four logs and hoping the retention windows lined up. Asking "re-evaluate this case under different assumptions" almost always means running the same prompt against the same context and getting, predictably, the same answer.

What "Final" Looks Like When It Wasn't Supposed To Be

Walk the user-visible decision boundary in any AI-mediated product and you will find a long list of outputs that the team would, on reflection, classify as appealable — and a short list of UI affordances that actually let the user appeal. An agent declines a refund. A moderation pipeline removes a post. A content ranker buries a creator's video. An identity service flags an account as suspicious and forces a re-verification loop. A hiring tool quietly down-scores a resume. A recommender stops showing a merchant's products to the buyers who used to buy them.

Every one of those is "the model said so." Every one of them is also a decision the regulator now wants you to be able to explain and, in many cases, allow the user to contest. The EU AI Act's right to explanation under Article 86 entitles people affected by high-risk AI decisions to "clear and meaningful explanations of the role of the AI system in the decision-making procedure," and GDPR Article 22 has long required, for solely automated decisions with legal or similarly significant effects, that data subjects be able to obtain human intervention, express their point of view, and contest the decision. The wording is older than the latest model generation, but the obligation is unchanged: a real path back to a human, with a real chance of a different outcome.

Engineering teams tend to discover this requirement in the wrong order. First, they ship the agent. Then someone asks "what's the appeal path?" Then someone realizes there isn't one. Then someone proposes "we'll route to support." Then support points out that they have no input snapshot, no policy version, no record of what the agent actually saw — only a customer transcript that says "AI said no" and a confused human asked to overrule "the system" without knowing what "the system" decided. That conversation usually ends with the appeal getting upheld for the wrong reason or denied for the wrong reason, neither of which is contestability. It is just a coin flip with a friendlier voice.

The Three Things You Need Before You Need Them

Contestability is not a feature you can bolt onto a launched agent in an afternoon. It is three pieces of infrastructure, and skipping any one of them turns the appeal flow into theater.

The first is a per-decision durable record. Not a span, not a log line — a record. For every decision that crosses the threshold of "could affect a user's interests," you need a row that captures the full input snapshot the model saw (canonicalized so two runs on identical inputs hash identically), the model version and provider, the policy bundle or rule version that gated the output, the tool calls that were attempted and their results, and the final output as it was returned to the user. This record needs its own retention policy, decoupled from your hot observability storage. Twelve months is not enough; three to seven years is closer to where regulators are landing for high-risk decisions, and the AI Act's high-risk system audit trail expectations push that number up further. This record is the thing the auditor asks for when they show up. It is also the thing your second-look pipeline reads from when a user appeals.
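To make the shape of that record concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `DecisionRecord` and `ToolCall` names, the field names, and the choice of sorted-key JSON plus SHA-256 as the canonicalization scheme. It assumes the input snapshot is JSON-serializable; a real system would also need to decide how to treat floats, timestamps, and binary attachments before hashing.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


def canonical_hash(snapshot: dict[str, Any]) -> str:
    """Hash a JSON-serializable snapshot so identical inputs hash identically."""
    # Sorted keys and fixed separators remove key-order and whitespace noise.
    canonical = json.dumps(
        snapshot, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ToolCall:
    """One attempted tool call and what it returned (hypothetical shape)."""
    name: str
    arguments: dict[str, Any]
    result: Any


@dataclass(frozen=True)
class DecisionRecord:
    """Hypothetical per-decision row; field names are illustrative only."""
    decision_id: str
    input_snapshot: dict[str, Any]   # exactly what the model saw
    model_provider: str
    model_version: str               # a field on the decision, not a deploy tag
    policy_version: str              # policy bundle or rule version that gated it
    tool_calls: tuple[ToolCall, ...]
    final_output: str                # as returned to the user
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    @property
    def input_hash(self) -> str:
        # Lets a later re-evaluation prove it ran on the same inputs.
        return canonical_hash(self.input_snapshot)
```

The useful property is that two snapshots with the same content but different key order produce the same `input_hash`, so a second-look pipeline can check whether it is re-running the original inputs or a changed set.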

The second is a user-facing appeal endpoint with an SLA. Not a contact form that lands in a help-desk queue with no decision identifier; a real endpoint, with a real schema, where a user (or their support agent on their behalf) can submit "I'd like decision <id> reviewed" along with new context the original decision didn't have. The endpoint creates an appeal case, links it to the durable decision record, and starts a clock. The clock matters. An appeal that has no SLA is an appeal that quietly dies in a backlog, and "we'll get back to you when we can" is the same outcome as "no" for any user who needs a fast resolution. Cove's appeal API is one example of this shape in the moderation space; the same idea generalizes to any class of decision your users care about. A POST to your appeal endpoint creates a case; a POST back to your callback (or your support tool) records the resolution; both are linkable to the original decision id forever.
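The core of that endpoint, stripped of any web framework, might look like the following sketch. The `AppealService` and `AppealCase` names, the in-memory dictionaries, and the 72-hour SLA in the usage note are all assumptions for illustration; this is not Cove's API or any particular product's schema. A real service would persist cases durably and sit behind authenticated HTTP handlers.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class AppealCase:
    """Hypothetical appeal case, always linked to one decision id."""
    appeal_id: str
    decision_id: str
    new_context: str                 # information the original decision lacked
    opened_at: datetime
    due_at: datetime                 # opened_at + SLA: the clock
    resolution: Optional[str] = None
    resolved_at: Optional[datetime] = None

    def is_overdue(self, now: datetime) -> bool:
        # Only unresolved cases can breach the SLA.
        return self.resolution is None and now > self.due_at


class AppealService:
    """In-memory stand-in for the appeal endpoint's business logic."""

    def __init__(self, known_decision_ids: set[str], sla: timedelta) -> None:
        self._known_decision_ids = known_decision_ids
        self._sla = sla
        self._cases: dict[str, AppealCase] = {}

    def open_appeal(
        self, decision_id: str, new_context: str, now: Optional[datetime] = None
    ) -> AppealCase:
        # Refuse appeals that cannot be linked to a durable decision record.
        if decision_id not in self._known_decision_ids:
            raise KeyError(f"unknown decision id: {decision_id}")
        opened = now or datetime.now(timezone.utc)
        case = AppealCase(
            appeal_id=str(uuid.uuid4()),
            decision_id=decision_id,
            new_context=new_context,
            opened_at=opened,
            due_at=opened + self._sla,
        )
        self._cases[case.appeal_id] = case
        return case

    def resolve(
        self, appeal_id: str, resolution: str, now: Optional[datetime] = None
    ) -> AppealCase:
        case = self._cases[appeal_id]  # KeyError if the appeal does not exist
        if case.resolution is not None:
            raise ValueError("appeal already resolved")
        case.resolution = resolution
        case.resolved_at = now or datetime.now(timezone.utc)
        return case
```

A caller would construct the service with something like `AppealService(known_decision_ids={"dec-123"}, sla=timedelta(hours=72))`, call `open_appeal` when the user submits, and call `resolve` from the reviewer's callback. Because `due_at` is fixed when the case is created, a periodic job can query `is_overdue` to surface cases that are about to die in the backlog.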
