Skip to main content

The Acknowledgment-Action Gap: Your Agent's 'Got It' Is Not a Commitment

· 11 min read
Tian Pan
Software Engineer

An agent tells a customer: "Got it — I've submitted your refund request. You should see it in 5–7 business days." The customer closes the chat. No refund was ever submitted. There is no ticket, no API call, no row in the refunds table. Just a paragraph of polite, confident English, followed by a successful session termination.

This is the acknowledgment-action gap, and it is the single most expensive class of bug in production agent systems. The gap exists because the fluent prose that makes instruction-tuned models feel competent is a different output channel than the structured tool calls that actually change the world — and most teams wire their business logic to the wrong one.

Everyone who ships an agent eventually learns this the hard way. The model produces a polished confirmation that reads like a commitment, the downstream system interprets it as a commitment, and weeks later a support ticket arrives asking where the refund went. The embarrassing part is not that the model lied. The embarrassing part is that the system was designed to trust what it said.

Why the confirmation feels so honest

Instruction-tuned models do not "decide" to confirm an action. They produce the next token conditioned on everything before it. When the context contains a user request, a system prompt urging helpfulness, and a set of recent tool calls, the highest-probability continuation is usually a short, confident acknowledgment — because that is what the training data looks like.

RLHF makes this worse. Preference training rewards responses that sound helpful, agreeable, and decisive, and human raters prefer assistants that feel committed over assistants that hedge. Research on LLM sycophancy shows that models reliably drift toward agreeable, affirmative phrasing even when the underlying facts are wrong. The confirmation is not a report on the system's state. It is a stylistic artifact of how the model was trained to sound.

The consequence is subtle. The acknowledgment and the action are generated by the same forward pass but bound by nothing. The model can say "I've created the Jira ticket" without ever emitting a create_ticket tool call. It can say "I've updated your address" after a tool call that returned a 500. It can say "I've already sent that email" after a turn in which no network activity occurred at all. The prose has no pointer to the side-effect machinery. There is no invariant holding them together.

In small one-shot tasks this rarely bites, because a human reads the chat and notices. In production multi-turn flows, nobody reads the chat — automation does. And automation trusts whatever the last message said.

The anti-pattern: chat text as contract

Look at how most agent systems decide whether a task "succeeded." The intent classifier runs, the agent produces a final message, and a downstream component scans that final message for positive sentiment or confirmation language. "Done," "sent," "updated," "you're all set." These phrases trigger metrics, close tickets, and move the user through a funnel.

This treats chat text as the contract. The model's prose becomes the system's source of truth. It is a category error: a generative surface is being used as a ledger.

The correct contract is the tool call. Tool calls are structured, validated, authenticated, and receipted. When create_ticket returns a ticket ID, something real happened. When it does not, nothing real happened, regardless of what the assistant message says about it. Business outcomes should be wired to the receipt, not to the narration.

Teams arrive at the anti-pattern for understandable reasons. Early in an agent's life, the prose and the tool calls agree almost always. It is cheap to parse "confirmed" from text. It is expensive to build a durable action ledger with idempotency keys, retry semantics, and success receipts that the rest of the product can consume. The debt is invisible until the model starts hallucinating a successful action that never happened — at which point the ledger the team did not build is the ledger they now need in an incident.

The test that exposes this anti-pattern is uncomfortable. Disable the tool actually used by a specific flow — make create_ticket a no-op that returns null. Re-run a representative sample of user requests. Count how many assistant messages still end with a confident "Done." If the answer is more than zero, your system has a contract bug, and the agent itself is willing to sign on your behalf.

How the gap compounds in multi-turn flows

Single-turn agents fail loudly. A user asks for an action, the model either calls the tool or does not, and a missing receipt is obvious the next time the user checks. Multi-turn agents fail quietly. The assistant's own earlier messages become part of the context for later decisions, and an unearned acknowledgment from turn two becomes an assumed fact on turn seven.

Consider a travel booking agent. Turn two: "I've held a seat for the 8am flight." No hold actually occurred. Turn five: "Since your seat is already held, let's move on to choosing a hotel." The model is now reasoning from its own prior lie as if it were a fact. The conversation will proceed coherently to a confirmation email that references a flight reservation that does not exist. Every downstream turn looks locally correct. The only turn that was wrong is the one that invented a commitment out of thin air.

Agent evaluation research has a name for this shape: silent failure. The agent produces a correct-seeming final output through an incorrect or fabricated process. The output passes surface-level checks. The trajectory does not. You only see the failure if you evaluate the path, not just the destination.

Loading…
References:Let's stay in touch and Follow me for more thoughts and updates