Agent Idempotency: Why Your AI Agent Sends That Email Twice
Your agent processed a refund, but the response timed out. The framework retried. The customer got refunded twice. Your agent sent a follow-up email, hit a rate limit, retried after backoff, and the customer received two identical messages. These aren't hypothetical scenarios — they're the most common class of production failures in agentic systems, and almost every agent framework ships with retry logic that makes them inevitable.
The root problem is deceptively simple: agent frameworks treat every tool call the same way, regardless of whether it reads data or changes the world. A get_user_profile() call is safe to retry a hundred times. A send_payment() call is not. Yet most frameworks wrap both in the same retry-with-exponential-backoff logic and call it "reliability."
The Uncertain Completion Problem
Traditional software handles retries with a straightforward mental model: if the request failed, retry it. If it succeeded, don't. But agents operate in a world where the third state — uncertain — is the most common and most dangerous.
Consider what happens when an agent calls a payment API and the connection drops after 30 seconds. Three things could have happened: the request never reached the server, the server processed it but the response was lost, or the server is still processing it. The agent has no way to distinguish these cases. Nearly every framework defaults to treating timeouts as failures, which means retrying — and potentially duplicating the side effect.
This uncertain completion problem is amplified in multi-step agent workflows. If an agent is executing step 7 of a 14-step workflow and crashes, the recovery logic needs to determine which of the first seven steps had side effects, which are safe to re-execute, and which need human review. Traditional checkpoint-and-resume patterns don't capture this nuance because they track where execution stopped, not what effects have already propagated into the real world.
Read vs. Write: The Distinction Most Frameworks Ignore
The single most impactful design decision for agent reliability is one that almost no framework makes explicit: classifying tool calls as read-only or write operations.
Read-only operations — fetching user data, querying a database, checking an account balance — are naturally idempotent. You can retry them freely without consequence. Write operations — sending emails, processing payments, creating resources, updating records — are where every retry carries risk.
Yet look at how most agent frameworks define tools: a name, a description, a set of parameters, and a function to call. There's no metadata indicating whether the tool mutates state. The framework's retry logic wraps both equally, treating search_documents() and delete_account() with the same error-handling strategy.
The fix starts with making this distinction explicit in your tool definitions. Every tool should declare whether it's read-only or has side effects. Read-only tools get aggressive retry policies — retry five times with exponential backoff and jitter. Write tools get conservative treatment: retry at most once, and only after checking whether the original call succeeded.
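A minimal sketch of what this looks like in practice. The `Tool` dataclass, `read_only` field, and `retry_policy` function are illustrative names, not any particular framework's API — the point is that the classification lives in the tool definition and the retry policy is derived from it:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Tool:
    name: str
    fn: Callable[..., Any]
    read_only: bool  # the one field most frameworks omit

def retry_policy(tool: Tool) -> dict:
    """Choose a retry policy based on whether the tool mutates state."""
    if tool.read_only:
        # No side effects to duplicate: retry aggressively.
        return {"max_retries": 5, "backoff": "exponential", "jitter": True}
    # Write tool: retry at most once, and only after verifying the
    # original call did not already succeed.
    return {"max_retries": 1, "verify_before_retry": True}

search = Tool("search_documents", lambda query: [], read_only=True)
delete = Tool("delete_account", lambda user_id: None, read_only=False)

assert retry_policy(search)["max_retries"] == 5
assert retry_policy(delete)["max_retries"] == 1
```

One boolean field is enough to break the symmetry that makes blanket retry logic dangerous; richer classifications (idempotent-write, destructive, reversible) can layer on later.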
Idempotency Keys Don't Map Cleanly to Agent Workflows
Payment APIs solved the retry problem years ago with idempotency keys. The client generates a unique identifier, sends it with the request, and the server ensures that duplicate requests with the same key return the cached result instead of re-executing the operation. Stripe, AWS, and virtually every serious payment processor supports this pattern.
But applying idempotency keys to agent workflows introduces complications that payment APIs never faced.
Granularity mismatch. In a payment API, one idempotency key maps to one logical operation: charge this card this amount. In an agent workflow, a single user intent ("book me a flight to Tokyo next Tuesday") might decompose into a dozen tool calls across multiple services. Should each tool call get its own idempotency key? Should the entire workflow share one? If each step has its own key, you can get a correct-but-partial state where steps 1 through 5 completed but step 6 failed — and a retry of the whole workflow would skip the first five steps but re-execute with potentially stale context. If the workflow shares one key, you can't retry individual steps without re-running everything.
Non-deterministic decomposition. The same user request might decompose into different tool calls on retry because the LLM is non-deterministic. "Book a flight to Tokyo" might first try airline A's API and, on retry, try airline B. The idempotency key from the first attempt doesn't protect the second attempt because it's a fundamentally different operation. Traditional idempotency assumes the same key means the same request — an assumption that breaks when an LLM is deciding what to do.
Temporal coupling. Idempotency keys have TTLs, typically 24 hours for payment APIs. But agent workflows can span much longer — a travel-booking agent might hold a reservation for days before confirming. If the idempotency key expires before the workflow completes, retries lose their protection precisely when the workflow is most likely to need them.
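One partial mitigation for the granularity mismatch is deriving a deterministic per-step key from the workflow ID, the step's position, and its canonicalized arguments. The sketch below (names are illustrative) makes the key stable across retries of the same step — but it also demonstrates the non-determinism problem: if the LLM picks a different airline on retry, the arguments change and the old key protects nothing:

```python
import hashlib
import json

def step_key(workflow_id: str, step_index: int, tool: str, args: dict) -> str:
    """Stable key for one step: same step + same arguments -> same key."""
    canonical = json.dumps(args, sort_keys=True)   # canonicalize argument order
    payload = f"{workflow_id}:{step_index}:{tool}:{canonical}"
    return hashlib.sha256(payload.encode()).hexdigest()

k1 = step_key("wf-42", 3, "book_flight", {"airline": "A", "dest": "TYO"})
k2 = step_key("wf-42", 3, "book_flight", {"dest": "TYO", "airline": "A"})
k3 = step_key("wf-42", 3, "book_flight", {"airline": "B", "dest": "TYO"})
assert k1 == k2   # same step, same arguments: key is stable across retries
assert k1 != k3   # the LLM chose a different airline: a different operation
```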
Design Patterns That Actually Work
Given these constraints, production agent systems need patterns that go beyond simple idempotency keys. Here are four that have proven effective.
Operation Journals
Instead of relying on idempotency keys at the tool level, maintain a journal of completed operations at the workflow level. Before executing any write operation, the agent checks the journal. After successful execution, it records the result. On retry, it replays the journal to reconstruct the workflow state without re-executing side effects.
This is the pattern behind Temporal's event history and similar durable execution engines. The key insight is that the journal records effects, not intents. It doesn't say "the agent wanted to send an email" — it says "email XYZ was sent at timestamp T with message ID M." This makes retries deterministic even when the LLM's decision-making is not.
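A toy version of the journal check-then-record flow, with an in-memory dict standing in for durable storage and `send_email_once` as an illustrative wrapper (a real implementation would persist the journal transactionally):

```python
journal: dict[str, dict] = {}   # step_id -> recorded effect
emails_sent: list[str] = []

def send_email_once(step_id: str, to: str, body: str) -> dict:
    if step_id in journal:          # already done: replay the recorded effect
        return journal[step_id]
    emails_sent.append(to)          # the real side effect
    effect = {"message_id": f"msg-{len(emails_sent)}", "to": to}
    journal[step_id] = effect       # record what happened, not what was intended
    return effect

first = send_email_once("wf-42/step-7", "a@example.com", "hi")
retry = send_email_once("wf-42/step-7", "a@example.com", "hi")  # crash-and-retry
assert retry == first and emails_sent == ["a@example.com"]
```

Note that the journal entry carries the concrete message ID — the effect — so a replayed workflow sees exactly what the original run produced.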
Two-Phase Tool Calls
Split every write operation into a preview phase and a commit phase. The preview phase is read-only: it returns what would happen without executing it. The commit phase requires a token from the preview, with a short expiration window.
This pattern serves double duty. It makes destructive actions explicitly confirmable, and it creates a natural checkpoint where the agent can verify its intent before committing. If the agent crashes between preview and commit, no side effect occurred. If it crashes after commit, the operation completed and the token prevents duplicate execution.
Effect Tracking with Compensation
For multi-step workflows, maintain a list of completed effects alongside compensation actions — the inverse operations that would undo each effect. If step 5 of a 7-step workflow fails, you have two options: retry step 5 with its idempotency key, or compensate steps 1 through 4 and restart. This is the saga pattern from distributed systems, adapted for agents.
The critical addition for agents is that compensation actions must also be idempotent. If the compensation for "send email" is "send retraction email," you need to ensure the retraction is only sent once even if the compensation logic itself is retried.
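A compact sketch of a saga-style effect log with guarded compensation. All names are illustrative; the key detail is the `compensated` set, which makes a retried compensation run a no-op rather than a second retraction:

```python
completed = []         # (effect_id, compensation_fn) pairs, in execution order
compensated = set()    # effect_ids whose compensation already ran
retractions_sent: list[str] = []

def run_step(effect_id, action, compensation):
    action()                                      # the forward effect
    completed.append((effect_id, compensation))   # log effect + its inverse

def compensate_all():
    for effect_id, comp in reversed(completed):   # undo in reverse order
        if effect_id in compensated:              # idempotent: already undone
            continue
        comp()
        compensated.add(effect_id)

run_step("email-1", lambda: None, lambda: retractions_sent.append("retract email-1"))
run_step("email-2", lambda: None, lambda: retractions_sent.append("retract email-2"))
compensate_all()
compensate_all()   # compensation logic crashed and was retried
assert retractions_sent == ["retract email-2", "retract email-1"]
```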
Conditional Execution Guards
Before executing a write operation, query the target system to determine if the operation already completed. Don't send a payment if the payment already exists. Don't create a user if the user already exists. Don't send an email if the message ID is already in the sent folder.
This pattern is simple but surprisingly underused in agent systems. It shifts the burden of idempotency from the agent framework to the business logic, which is often the right place for it. The target system knows whether the operation happened — the agent framework is just guessing.
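The guard itself is a one-line check before the write. Here the "sent folder" is an in-memory stand-in for whatever query the target system actually exposes; the function name is illustrative:

```python
sent_folder: set[str] = set()   # message IDs the email system reports as sent
send_count = 0

def send_email_guarded(message_id: str, to: str) -> str:
    global send_count
    if message_id in sent_folder:    # the target system knows; don't guess
        return "skipped: already sent"
    send_count += 1                  # the real side effect
    sent_folder.add(message_id)
    return "sent"

assert send_email_guarded("msg-7", "a@example.com") == "sent"
assert send_email_guarded("msg-7", "a@example.com") == "skipped: already sent"
assert send_count == 1
```

The guard assumes the agent can assign a stable `message_id` before sending — which is why this pattern pairs naturally with deterministic per-step keys.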
The Infrastructure Layer: Durable Execution
These patterns are powerful but tedious to implement from scratch for every agent. This is where durable execution engines like Temporal earn their place in the agent stack.
Temporal's core promise — that workflow code will execute to completion regardless of infrastructure failures — directly addresses the uncertain completion problem. Every tool call becomes an "activity" with configurable retry policies, timeouts, and heartbeats. The workflow engine maintains the event history that serves as the operation journal. If a worker crashes mid-execution, another worker picks up the workflow and replays the history to reconstruct state without re-executing completed activities.
The fit between durable execution and agent orchestration is natural because both deal with the same fundamental challenge: coordinating long-running, multi-step processes where individual steps can fail, time out, or produce uncertain outcomes. The difference is that distributed systems engineers solved this problem a decade ago, and the agent community is slowly rediscovering the same solutions.
But durable execution isn't a silver bullet. The agent's LLM calls themselves introduce non-determinism that workflow engines weren't designed for. If you replay a workflow and the LLM makes a different decision than it did originally, the replay diverges from the history. Production systems handle this by recording LLM decisions as events in the workflow history, making the replay deterministic regardless of what the model would do if called again.
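The record-then-replay trick can be sketched in a few lines. `call_llm` here is a random stand-in for a non-deterministic model call, and the cursor-over-history structure is a simplified version of how durable execution engines replay events:

```python
import random

decision_log: list[str] = []   # decisions recorded as events, in order

def call_llm(prompt: str) -> str:
    """Stand-in for a non-deterministic model call."""
    return random.choice(["airline_a", "airline_b"])

def decide(prompt: str, history: list[str], cursor: int) -> tuple[str, int]:
    if cursor < len(history):           # replaying: use the recorded event
        return history[cursor], cursor + 1
    decision = call_llm(prompt)         # live run: consult the model...
    history.append(decision)            # ...and record the decision as an event
    return decision, cursor + 1

first, _ = decide("pick an airline", decision_log, 0)    # live decision, recorded
replayed, _ = decide("pick an airline", decision_log, 0) # crash-and-replay
assert replayed == first   # replay never re-consults the model
```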
Building Your Idempotency Checklist
If you're shipping an agent to production, audit every tool in your toolkit with these questions:
- Does this tool have side effects? If yes, it needs idempotency protection. If no, retry freely.
- Can I query whether the effect already happened? If your email API returns a message ID, check for that ID before sending again. If your payment API supports idempotency keys, use them.
- What's the blast radius of a duplicate? A duplicate Slack message is annoying. A duplicate wire transfer is a lawsuit. Scale your protection to the consequence.
- What's the compensation action? If you can't undo it (sending an email, posting to social media), the idempotency protection must be bulletproof. If you can undo it (creating a database record), you have a fallback.
- Does the tool call depend on previous steps' results? If yes, you need an operation journal, not just per-call idempotency keys.
The uncomfortable truth is that most agent frameworks optimize for the demo — where everything works on the first try and side effects don't matter — rather than for production, where the retry path is the critical path. The teams shipping reliable agents in 2026 aren't the ones with the most sophisticated prompt engineering. They're the ones who treat every tool call as a potentially duplicated message in a distributed system, because that's exactly what it is.
