The Idempotency Problem in Agentic Tool Calling
The scenario plays out the same way every time. Your agent is booking a hotel room, and a network timeout occurs right after the payment API call returns 200 but before the confirmation is stored. The agent framework retries. The payment runs again. The customer is charged twice, support escalates, and someone senior says the AI "hallucinated a double charge" — which is wrong but feels right because nobody wants to say their retry logic was broken from the start.
This isn't an AI problem. It's a distributed systems problem that the AI layer imported wholesale, without the decades of hard-won patterns that distributed systems engineers developed to handle it. Standard agent retry logic assumes operations are idempotent. Most tool calls are not.
Why Agent Retries Are Structurally Broken
Every major agent framework — LangChain, LlamaIndex, OpenAI's Agents SDK, Anthropic's Claude Agent SDK — includes automatic retry behavior for transient failures. That's correct. Transient failures happen constantly in distributed systems, and silently dropping a request is worse than retrying.
The problem is where the retry logic lives. Agent frameworks retry at the LLM layer: the model sees a tool call failed, decides to try again, and calls the tool again. Nothing in this loop tracks whether the tool already executed its side effects. The framework sees a timeout and reasons: "I didn't get a result, so I should try again." The tool may have already written to the database, charged the card, sent the email, or created the ticket.
This failure mode appears across industries. Agents managing CRM systems create duplicate support tickets from a single customer complaint. Inventory management agents double-deduct stock from the same order. Financial agents send duplicate refunds. Each case follows the same pattern: a timeout or transient error triggers a retry, the tool executes again, and the system ends up in a state the agent never intended.
The "fix the prompt" instinct kicks in at this point. Engineers add instructions like "only call the payment tool once" or "check if the order exists before creating it." This doesn't work. The agent that generated the duplicate charge was following instructions correctly — it genuinely didn't know the first call succeeded. The problem isn't the model's reasoning; it's the absence of external state that would let the tool report "I already did this."
What Idempotency Actually Means for Tool Calls
Idempotency means calling an operation multiple times produces the same result as calling it once. A GET request is naturally idempotent: reading a record never changes it, so reading it ten times is safe. A DELETE is idempotent in practice: deleting a record that doesn't exist returns the same logical result as deleting one that does. A POST that creates a record or charges a card is not idempotent by default — every call creates a new thing.
The pattern for making non-idempotent operations safe is well established in payment APIs. When a client sends a request, it includes an idempotency key — a unique identifier the client generates and owns. The server stores the key alongside the operation result. On a retry with the same key, the server checks its store and returns the cached result without re-executing. The client gets the same response on the third retry as on the first.
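The server side of this pattern can be sketched in a few lines of Python. The names (`handle_charge`, `charge_card`) are illustrative, and an in-memory dict stands in for the durable key store a real service would use:

```python
import itertools

_charge_counter = itertools.count(1)
_by_key: dict[str, dict] = {}  # idempotency key -> cached response

def charge_card(amount_cents: int) -> dict:
    """Non-idempotent side effect: every call creates a new charge."""
    return {"charge_id": f"ch_{next(_charge_counter)}", "amount": amount_cents}

def handle_charge(idempotency_key: str, amount_cents: int) -> dict:
    # A retry with a known key returns the cached result without re-executing.
    if idempotency_key in _by_key:
        return _by_key[idempotency_key]
    result = charge_card(amount_cents)
    _by_key[idempotency_key] = result
    return result
```

Two calls with the same key produce one charge and one cached response; only a new key triggers a new charge.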
For agent tool calls, this pattern requires explicit design decisions across three layers:
The agent runtime layer must generate and maintain idempotency keys per workflow step. The right key is derived from durable state: {workflowRunId}:{stepId} works well in practice. This ensures keys survive restarts and are deterministic — the same key is regenerated on resume, not a new one generated on retry.
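As a sketch, key derivation might look like the following. The hashing step is an optional refinement for downstream APIs that cap key length; the raw `{workflowRunId}:{stepId}` string works equally well:

```python
import hashlib

def idempotency_key(workflow_run_id: str, step_id: str) -> str:
    """Deterministic key derived from durable workflow state:
    the same run and step always regenerate the same key."""
    raw = f"{workflow_run_id}:{step_id}"
    # Hashing keeps the key fixed-width for APIs that limit key length.
    return hashlib.sha256(raw.encode()).hexdigest()[:32]
```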
The tool execution layer must pass the idempotency key to the downstream service, check the deduplication store before executing, and cache results with sufficient TTL. If the key exists and the previous call succeeded, return the cached response. If the key exists and the previous call failed with a permanent error, return that error without re-executing.
The tool interface itself must accept idempotency keys and implement the deduplication logic. Tools that call external APIs should pass the key through to those APIs. Tools that write to internal databases should use the key as part of a unique constraint.
When all three layers cooperate, the agent framework can retry as aggressively as it wants. The net result — one charge, one record, one email — matches the agent's intent.
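The tool execution layer described above can be sketched as a wrapper. The class and error names are illustrative, and the dict is a stand-in for a durable deduplication store such as Redis or a database table:

```python
import time
from dataclasses import dataclass
from typing import Any, Callable

class PermanentToolError(Exception):
    """Non-retryable failure, e.g. a declined card."""

@dataclass
class CachedOutcome:
    ok: bool        # True: cached success; False: cached permanent error
    value: Any
    stored_at: float

class ToolExecutor:
    def __init__(self, ttl_seconds: float = 24 * 3600):
        self._store: dict[str, CachedOutcome] = {}
        self._ttl = ttl_seconds

    def execute(self, key: str, tool: Callable[[], Any]) -> Any:
        cached = self._store.get(key)
        if cached is not None and time.time() - cached.stored_at < self._ttl:
            if cached.ok:
                return cached.value                 # replay the prior success
            raise PermanentToolError(cached.value)  # replay the prior permanent error
        try:
            result = tool()
        except PermanentToolError as e:
            # Cache permanent errors too: retrying won't change the answer.
            self._store[key] = CachedOutcome(False, str(e), time.time())
            raise
        self._store[key] = CachedOutcome(True, result, time.time())
        return result
```

With this wrapper in place, a framework-level retry with the same `{workflowRunId}:{stepId}` key replays the cached outcome instead of re-executing the side effect.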
Saga Patterns for Multi-Step Workflows
Single-tool idempotency is the simpler problem. The harder case is multi-step workflows where an agent calls several tools in sequence and something fails partway through.
Consider an agent that processes an order: reserve inventory, charge the customer, send a confirmation email. Each step is idempotent in isolation. But if the payment succeeds and the confirmation fails, the customer has been charged without confirmation. If the agent retries the whole sequence, the payment call repeats harmlessly (assuming idempotency at that layer deduplicates the charge), but the inventory was already reserved — a second reservation attempt might fail if stock is now zero.
This is the problem the saga pattern solves. A saga is a sequence of steps where each step has a corresponding compensating action that reverses its effects if a later step fails. Rather than atomic rollback (which requires distributed transactions and their associated costs), sagas implement eventual consistency through explicit compensation.
For the order processing workflow, the saga looks like this:
- Reserve inventory → compensating action: release reservation
- Charge payment → compensating action: issue refund
- Send confirmation → compensating action: send cancellation notice
If the confirmation step fails permanently, the saga executor runs the compensating actions in reverse order: issue refund, release inventory. The customer sees a failed order, not a charged order with no confirmation.
The critical implementation detail is that compensating actions must themselves be idempotent. Issuing a refund on retry should not issue a second refund. This sounds obvious until you're debugging a refund loop at 2am.
Two implementation patterns exist for saga execution. The orchestration model uses a central coordinator — often a durable workflow engine or a dedicated orchestrator agent — that directs each step and triggers compensation on failure. This gives clear visibility into workflow state but creates a single point of coordination. The choreography model has each step emit events that trigger the next step, with compensation triggered by failure events. This is more loosely coupled but significantly harder to observe and debug.
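A minimal orchestration-style executor for the saga above can be sketched as follows; the `SagaStep` type and `run_saga` function are illustrative, not any particular framework's API:

```python
from typing import Callable, List

class SagaStep:
    """One saga step: a forward action plus its compensating action."""
    def __init__(self, name: str, action: Callable[[], None],
                 compensate: Callable[[], None]):
        self.name = name
        self.action = action
        self.compensate = compensate

def run_saga(steps: List[SagaStep]) -> None:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[SagaStep] = []
    try:
        for step in steps:
            step.action()
            completed.append(step)
    except Exception:
        # Compensations must themselves be idempotent: re-running one after
        # a crash mid-compensation must not double-refund.
        for step in reversed(completed):
            step.compensate()
        raise
```

For the order workflow, a permanent failure at the confirmation step runs `refund` and then `release`, leaving the customer with a cleanly failed order.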
