
The Idempotency Key Your Agent Never Sent

Tian Pan
Software Engineer

A customer once got refunded three times for a single return. Not because the model hallucinated a policy, not because a human fat-fingered a form — because the refund tool timed out twice, the agent retried both times, and every retry carried a fresh request with no way for the payment processor to know it had seen this work before. Three clean HTTP 200s. Three real movements of money. The agent did exactly what it was told: when a call fails, try again.

The bug was not in the model. The bug was in a header that was never sent.

Retrying is the single most natural thing an agent does. A tool call returns an error, or worse, returns nothing at all, and the loop's instinct — encoded in the framework, the prompt, or the model's own training — is to try the action again. That instinct is correct for reads and catastrophic for writes. The difference between a resilient agent and one that double-charges customers is not intelligence. It is whether every state-changing tool call carries an idempotency key, and whether the system on the other end actually honors it.

A Timeout Tells You Nothing

Start with the fact that breaks everyone's mental model: when a network call times out, you do not know whether the work happened.

It is tempting to read a timeout as failure. It is not. A timeout means the response did not arrive in time. The request may have never reached the server. It may have reached the server, executed fully, and then the response packet got lost on the way back. It may be executing right now, slowly, and will finish thirty seconds after your client gave up. From the client's side, all three look identical. There is no field in the error you can inspect to disambiguate them.

This is the Two Generals Problem wearing an HTTP costume, and it is not a bug you can fix with a better library. It is an impossibility result. You cannot build a channel over an unreliable network that guarantees the sender knows, with certainty, whether the receiver acted. Every distributed system lives with this. Exactly-once delivery — the thing everyone wants — is provably impossible. What the industry actually ships is at-least-once delivery paired with idempotent processing, which together produce an effect indistinguishable from exactly-once. The "exactly-once" you have heard about is a property of the receiver, not the channel.

An agent that retries on timeout is therefore doing something dangerous by default. It is taking an event that means "I don't know" and treating it as an event that means "it didn't work, do it again." When the underlying action was a read, no harm done — you fetched the same record twice. When the action was place_order, send_email, issue_refund, or transfer_funds, the retry is a second, real, independent side effect. The model's reasoning was sound. The world now has two orders in it.
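
To make the failure concrete, here is a minimal sketch of the lost-response case. `LostResponseRefundService` and `call_with_naive_retry` are hypothetical names invented for illustration; the fake service performs the refund first and only then "loses" the response, which is the scenario a timeout cannot rule out.

```python
class LostResponseRefundService:
    """Hypothetical fake: the refund executes, but the first few responses are lost."""

    def __init__(self, lost_responses):
        self.lost_responses = lost_responses
        self.ledger = []  # every real movement of money lands here

    def issue_refund(self, order_id, amount_cents):
        self.ledger.append((order_id, amount_cents))  # side effect happens first
        if self.lost_responses > 0:
            self.lost_responses -= 1
            raise TimeoutError("response did not arrive in time")
        return f"refund-{len(self.ledger)}"


def call_with_naive_retry(fn, *args, attempts=3):
    """Treats every timeout as "it didn't work" and simply calls again."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn(*args)
        except TimeoutError as exc:
            last_error = exc
    raise last_error


service = LostResponseRefundService(lost_responses=2)
result = call_with_naive_retry(service.issue_refund, "order-42", 1999)
print(result, len(service.ledger))  # prints: refund-3 3
```

Two timeouts, two retries, one clean result, and three entries in the ledger. Nothing in the caller's view distinguishes this from a run that refunded once.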

Retries Are the Default; Duplicate Side Effects Are the Consequence

Look at how agents are actually built and you find retries everywhere, often three or four layers deep, none of them aware of the others.

The HTTP client retries on connection errors. The agent framework retries tool calls that return errors. The orchestration layer retries a whole step if the step "fails." And the model itself, seeing a tool result that looks like an error, will frequently just call the tool again as its next action — a retry that no infrastructure setting controls, because it is a token-level decision. A single timed-out charge_card call can fan out into four real charges before anyone notices, each layer retrying the layer below it.
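
The fan-out is easy to reproduce. The sketch below stacks two hypothetical retry layers, each allowing two attempts, on top of a fake `charge_card` that always does the work and always loses the response. The helper `with_retries` and the layer names are assumptions made for this example, not any framework's API.

```python
def with_retries(fn, attempts):
    """Wrap fn so that a TimeoutError triggers another attempt."""
    def wrapped():
        last_error = None
        for _ in range(attempts):
            try:
                return fn()
            except TimeoutError as exc:
                last_error = exc
        raise last_error
    return wrapped


charges = []


def charge_card():
    charges.append("charge")              # the card really is charged
    raise TimeoutError("lost response")   # but the caller never hears back


# Each layer is unaware of the other and retries whatever sits below it.
http_layer = with_retries(charge_card, attempts=2)
framework_layer = with_retries(http_layer, attempts=2)

try:
    framework_layer()
except TimeoutError:
    pass

print(len(charges))  # prints: 4
```

Attempts multiply across layers rather than add: two layers of two attempts yield four charges, and a third layer of two would yield eight.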

This is not a misconfiguration. It is the emergent behavior of a stack where every layer was independently designed to be "resilient," and resilience was defined as "try again." Each layer is locally correct. The composition is a duplicate-side-effect generator.

The reason this survives code review is that the happy path and the failure path produce visually identical output. A successful send_email returns a 200. A retried-after-timeout send_email that sends a second copy also returns a 200. The agent transcript looks clean. The eval suite passes. The duplicate only shows up in the customer's inbox, the billing dashboard, or the support ticket three days later. You cannot catch this by reading agent logs, because the logs of a correct run and a double-execution run are the same logs.

So the consequence has to be designed out, not tested out. And the place to design it out is the boundary between the agent and the systems it mutates.

Exactly-Once Is a Tool-Contract Problem, Not a Model Problem

Here is the framing that matters: you will never make the model stop retrying, and you should not try.

Retrying is correct behavior in the face of ambiguity. The model cannot tell a lost-response timeout from a real failure any better than the network can — it is on the wrong side of the same impossibility. Prompting the model to "be careful about retries" is asking it to solve the Two Generals Problem with better intentions. It cannot. Neither can you.

What you can do is make the retry harmless. That is what an idempotency key does. The client — your agent's tool layer — generates a unique identifier for a unit of intended work and attaches it to the request. The server records that identifier the first time it processes the request, along with the result. When a request arrives bearing an identifier the server has already seen, it does not execute again. It returns the stored result of the first execution. The second issue_refund call comes back with the same refund ID, the same amount, the same 200 — and zero additional money moved.
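
The server half of that description can be sketched in a few lines. `RefundServer` is a hypothetical in-memory stand-in, not any real processor's implementation; a production service would keep the record in durable storage and would typically also check that a reused key arrives with the same parameters.

```python
class RefundServer:
    """Hypothetical server that honors an idempotency key (in-memory sketch)."""

    def __init__(self):
        self._responses = {}  # idempotency key -> stored response
        self.refunds = []     # the real side effects

    def issue_refund(self, idempotency_key, order_id, amount_cents):
        # Seen this key before: replay the stored result, move no money.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]

        refund_id = f"re_{len(self.refunds) + 1}"
        self.refunds.append((refund_id, order_id, amount_cents))
        response = {"status": 200, "refund_id": refund_id, "amount_cents": amount_cents}
        self._responses[idempotency_key] = response
        return response


server = RefundServer()
first = server.issue_refund("key-abc", "order-42", 1999)
retry = server.issue_refund("key-abc", "order-42", 1999)
print(first == retry, len(server.refunds))  # prints: True 1
```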

Notice what this does to the timeout problem. The agent retries because it does not know whether the first call worked. With an idempotency key, it no longer needs to know. If the first call succeeded, the retry returns that success. If the first call never landed, the retry executes for real. Either way the outcome is one refund. The ambiguity is not resolved — it is rendered irrelevant. That is the whole trick. You do not eliminate uncertainty; you build a system where uncertainty does not matter.

This is why it is a tool-contract problem. The guarantee lives in the agreement between caller and callee: this endpoint accepts an Idempotency-Key, and it promises that two requests with the same key produce one effect. The model is not party to that contract and does not need to be. The tool wrapper is.
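
On the caller's side, the contract might look like the following sketch. `call_write_tool`, `FakeRefundApi`, and the `send(payload, headers)` transport signature are all assumptions made for illustration; only the `Idempotency-Key` header name comes from the discussion above. The point is that the key is passed in once and reused unchanged on every attempt.

```python
def call_write_tool(send, payload, idempotency_key, max_attempts=3):
    """Retry a write, attaching the same Idempotency-Key to every attempt."""
    headers = {"Idempotency-Key": idempotency_key}  # built once, outside the loop
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(payload, headers)
        except TimeoutError as exc:
            last_error = exc
    raise last_error


class FakeRefundApi:
    """Hypothetical endpoint: dedupes by key, loses the first response."""

    def __init__(self):
        self.results = {}
        self.executions = 0
        self.lose_next_response = True

    def send(self, payload, headers):
        key = headers["Idempotency-Key"]
        if key not in self.results:
            self.executions += 1
            self.results[key] = {"status": 200, "refund_id": f"re_{self.executions}"}
        if self.lose_next_response:
            self.lose_next_response = False
            raise TimeoutError("response lost on the way back")
        return self.results[key]


api = FakeRefundApi()
response = call_write_tool(api.send, {"order_id": "order-42"}, "run-7:step-3")
print(response["refund_id"], api.executions)  # prints: re_1 1
```

The wrapper still retries after the timeout, exactly as before. The difference is that the retry now lands on a stored result instead of a second execution.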

Generating a Key That Survives a Retry

The mechanics are where teams quietly get it wrong, so be precise about three things.

The key must be stable across the retries it is meant to dedupe. If your tool wrapper generates a fresh UUID every time it builds the request, you have a unique key per attempt, which deduplicates nothing — each retry looks new to the server. The key must be generated once, when the agent first decides to take the action, and reused for every retry of that decision. A good source is a deterministic hash of the things that do not change on retry: the workflow run ID, the step index, the action type, and the salient arguments. Same decision, same key. New decision, new key.

The key must be distinct across actions that are genuinely different. This is the opposite failure. If two real, intended refunds to the same customer for the same amount hash to the same key, the second one is silently swallowed as a "duplicate" and the customer is shortchanged. The key has to capture intent, not just parameters. A step index or a monotonically increasing decision counter inside the run is what separates "the same refund, retried" from "a second refund that happens to look alike."
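
One way to satisfy both constraints is a deterministic hash over the inputs named above. This is a minimal sketch, assuming the run ID and step index are available to the tool layer; `derive_idempotency_key` and the field names are hypothetical.

```python
import hashlib
import json


def derive_idempotency_key(run_id, step_index, action, args):
    """Hash only what stays fixed across retries of one decision."""
    canonical = json.dumps(
        {"run_id": run_id, "step": step_index, "action": action, "args": args},
        sort_keys=True,              # argument order must not change the key
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same decision, retried (arguments happen to be built in a different order).
first_attempt = derive_idempotency_key(
    "run-7", 3, "issue_refund", {"customer_id": "cus_1", "amount_cents": 1999}
)
retry_attempt = derive_idempotency_key(
    "run-7", 3, "issue_refund", {"amount_cents": 1999, "customer_id": "cus_1"}
)

# A second, genuinely intended refund with identical arguments at a later step.
second_refund = derive_idempotency_key(
    "run-7", 5, "issue_refund", {"customer_id": "cus_1", "amount_cents": 1999}
)

print(first_attempt == retry_attempt)  # prints: True
print(first_attempt == second_refund)  # prints: False
```

Leaving the step index out of the hash would make the two refunds collide, which is the "silently swallowed" failure described above. Note that this only helps if the step index itself is stable when the run is replayed.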

The dedupe scope must match the blast radius. A key stored in your agent's process memory protects you against retries within one run. It does nothing if the whole agent run itself is retried by an orchestrator after a crash, because the new process starts with an empty memory. For anything that moves money or contacts a human, the idempotency record has to live in durable storage the surviving retry can see — the downstream API's own dedupe store if it offers one, or a database table you own if it does not. The question to ask of every key is: which retries does this actually catch, and which sail straight past it?
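
For the "database table you own" option, a sketch using Python's built-in `sqlite3` module is below. The table layout and the `open_store` and `run_once` helpers are assumptions for illustration. Closing and reopening the connection stands in for a process that crashed and was restarted by an orchestrator.

```python
import os
import sqlite3
import tempfile


def open_store(path):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS idempotency (key TEXT PRIMARY KEY, result TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def run_once(conn, key, action):
    """Run action unless a completed result for this key is already on disk."""
    row = conn.execute("SELECT result FROM idempotency WHERE key = ?", (key,)).fetchone()
    if row is not None:
        return row[0]
    result = action()
    conn.execute("INSERT INTO idempotency (key, result) VALUES (?, ?)", (key, result))
    conn.commit()
    return result


calls = []


def send_refund():
    calls.append("refund")
    return "re_1"


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "idempotency.db")

    first_process = open_store(path)
    first_result = run_once(first_process, "run-7:step-3", send_refund)
    first_process.close()           # simulate the process going away

    second_process = open_store(path)  # a fresh process with empty memory
    second_result = run_once(second_process, "run-7:step-3", send_refund)
    second_process.close()

print(first_result, second_result, len(calls))  # prints: re_1 re_1 1
```

This sketch still has a gap: a crash after `action()` returns but before the row is committed leaves no record, so it narrows the blast radius without closing it. That residue is part of what reconciliation is for.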

One more constraint, easy to miss: do not cache failures the way you cache successes. If a request fails with a transient 500 or a timeout, the next retry should be allowed to genuinely execute — that is the entire point. Idempotency stores should record completed, successful outcomes and let retries through for everything else. Cache a 500 against its key and you have built a system that permanently refuses to ever complete the work.
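
A store that follows this rule records a result only after the action returns normally. The sketch below uses a hypothetical `SuccessOnlyStore` and a fake action that fails once with a transient error before succeeding.

```python
class SuccessOnlyStore:
    """Hypothetical dedupe store that remembers completed successes only."""

    def __init__(self):
        self._completed = {}

    def execute(self, key, action):
        if key in self._completed:
            return self._completed[key]
        result = action()              # if this raises, nothing is recorded
        self._completed[key] = result
        return result


attempts = []


def flaky_refund():
    attempts.append(len(attempts) + 1)
    if len(attempts) == 1:
        raise ConnectionError("transient 500")
    return "re_1"


store = SuccessOnlyStore()
try:
    store.execute("run-7:step-3", flaky_refund)   # fails, leaves no record
except ConnectionError:
    pass
second = store.execute("run-7:step-3", flaky_refund)  # genuinely executes
third = store.execute("run-7:step-3", flaky_refund)   # replayed from the store

print(second, third, attempts)  # prints: re_1 re_1 [1, 2]
```

Letting the retry through is only safe on its own when the failure means the work did not happen. For an ambiguous timeout, the downstream system still has to dedupe on the same key.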

Reconciliation for What Slips Through

Idempotency keys are necessary and they are not sufficient. Some side effects will still escape, and a mature design plans for that instead of pretending it won't.

The gaps are real. A tool you do not own (a partner API, a legacy internal service, an email provider) may simply not support an idempotency header, in which case your key is a string nobody reads. The dedupe window may expire: many providers honor keys only for a limited period, such as 24 hours or 7 days, and a retry that arrives after the window is a fresh request again. And there is the race: two retries land on the server within the same millisecond, before the first has finished writing its idempotency record, and both execute. A server that claims the key atomically before doing the work closes that window, but you cannot assume that of a service you do not control.

For these, you need reconciliation: a process, running on its own schedule, that compares intended effects against actual effects and flags the divergence. Did this workflow run intend one refund and produce two? Reconciliation is how you find out, hours later, without a customer having to tell you. It is the audit layer underneath the idempotency layer, and the two are not redundant — idempotency prevents the common case cheaply, reconciliation catches the long tail that idempotency structurally cannot.

This also argues for a design discipline: make state-changing tools report back what they did in a form you can reconcile against. A place_order tool should return the order ID it created. A send_email should return the message ID. If a tool's only output is "success," you have nothing to reconcile with — you cannot tell one order from two. Provenance on the way out is what makes the audit on the way back possible.
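
A reconciliation pass can be as simple as grouping actual effects by the intent that should have produced them. The sketch below assumes each downstream effect can be tagged with, or joined back to, the idempotency key of the intended action; that tagging and the `reconcile` function are hypothetical.

```python
from collections import defaultdict


def reconcile(intended_keys, actual_effects):
    """Compare intended writes with (intent_key, effect_id) pairs seen downstream."""
    effects_by_key = defaultdict(list)
    for key, effect_id in actual_effects:
        effects_by_key[key].append(effect_id)

    findings = []
    for key in intended_keys:
        effect_ids = effects_by_key.get(key, [])
        if len(effect_ids) == 0:
            findings.append(("missing", key, effect_ids))
        elif len(effect_ids) > 1:
            findings.append(("duplicate", key, effect_ids))

    # Effects that no recorded intent accounts for are also worth a look.
    intended = set(intended_keys)
    for key, effect_ids in effects_by_key.items():
        if key not in intended:
            findings.append(("unexpected", key, effect_ids))
    return findings


intended = ["run-7:step-3", "run-7:step-5"]
actual = [("run-7:step-3", "re_1"), ("run-7:step-3", "re_2")]
findings = reconcile(intended, actual)
for finding in findings:
    print(finding)
# prints: ('duplicate', 'run-7:step-3', ['re_1', 're_2'])
# prints: ('missing', 'run-7:step-5', [])
```

None of this works if the tool only returns "success". The effect IDs are what let the job tell one refund from two.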

What to Actually Do

If you operate an agent that touches real systems, the checklist is short and not optional.

  • Classify every tool as read or write. Reads can retry freely. Writes cannot retry safely without an idempotency key. This classification belongs in the tool definition, not in a wiki.
  • Generate the key at decision time, not request time. One key per intended action, reused across every retry of that action, derived from inputs that are stable on retry and distinct across genuinely different actions.
  • Store the dedupe record where the surviving retry can see it. In-process memory only covers in-process retries. Anything that can be retried after a crash needs a durable, shared idempotency store.
  • Never cache a failure as if it were a success. Transient errors must leave the door open for a real retry.
  • Have every write tool return an identifier, and run a reconciliation job that compares intended writes to actual writes.
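
The first item on the checklist can be enforced in code rather than by convention. A minimal sketch, assuming a hypothetical `ToolSpec` record and `invoke` dispatcher (neither is from any particular agent framework): the read/write classification lives on the tool definition, and a write without a key is rejected before it reaches the network.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolSpec:
    name: str
    mutates_state: bool           # the read/write classification
    func: Callable[..., Any]


def invoke(tool, args, idempotency_key=None):
    """Dispatch a tool call; writes must carry an idempotency key."""
    if not tool.mutates_state:
        return tool.func(**args)  # reads can be retried freely
    if idempotency_key is None:
        raise ValueError(f"{tool.name} is a write tool and requires an idempotency key")
    return tool.func(idempotency_key=idempotency_key, **args)


def get_order(order_id):
    return {"order_id": order_id, "status": "shipped"}


def issue_refund(order_id, amount_cents, idempotency_key):
    return {"refund_for": order_id, "amount_cents": amount_cents, "key": idempotency_key}


GET_ORDER = ToolSpec("get_order", mutates_state=False, func=get_order)
ISSUE_REFUND = ToolSpec("issue_refund", mutates_state=True, func=issue_refund)

order = invoke(GET_ORDER, {"order_id": "order-42"})
refund = invoke(
    ISSUE_REFUND,
    {"order_id": "order-42", "amount_cents": 1999},
    idempotency_key="run-7:step-3",
)
```

Calling `invoke(ISSUE_REFUND, {...})` without a key raises `ValueError`, so a missing key becomes a loud failure in development instead of a duplicate refund in production.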

The uncomfortable truth is that none of this is new. Payment systems, message queues, and cloud control planes have run on idempotency keys for two decades, because they learned the hard way that a network gives you at-least-once and the rest is your problem. Agents did not invent the duplicate-side-effect bug. They just retry faster, in more places, with less human in the loop to notice the second charge — and they will keep doing it, correctly, until the tool on the other end is built to forgive them.
