4 posts tagged with "idempotency"

When Tools Lie: The False-Success Failure Mode Your Agent Trusts By Default

· 10 min read
Tian Pan
Software Engineer

The agent confidently tells the user, "I've sent the confirmation email and credited the refund to your account." The trace is clean: two tool calls, both returned {"success": true}, the model produced a polished summary, the conversation closed in 3.2 seconds. A week later the customer escalates because the email never arrived and the refund never posted. The audit trail is a sea of green checkmarks. Nothing failed — except the actual job.

This is the failure mode that has no name in most agent stacks: tools that lie. Not lie in the malicious sense — they return the response their contract specifies. The lie is structural. The HTTP layer says "200 OK" because the request was accepted, not because the operation completed. The mail provider says success: true because the message entered the outbound queue, not because it left the building. The database write returned without error because it landed on a replica that never propagated. The model, trained to be helpful and trained on examples where green means done, weaves these signals into a confident summary and moves on.
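
One way to make the gap between "accepted" and "completed" visible is to stop treating a success flag as the final word. The sketch below is a minimal illustration, not a prescribed design: `ToolResult`, `Outcome`, and `run_with_verification` are hypothetical names, and the `verify` callback stands in for whatever independent read-back a real system has available (a delivery-status lookup, a ledger query).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    ACCEPTED = "accepted"      # the tool took the request; nothing was checked
    CONFIRMED = "confirmed"    # an independent read-back showed the effect
    UNVERIFIED = "unverified"  # the read-back did not show the effect


@dataclass
class ToolResult:
    success: bool   # what the tool's contract reports
    reference: str  # hypothetical handle used to look the operation up later


def run_with_verification(
    call: Callable[[], ToolResult],
    verify: Optional[Callable[[str], bool]] = None,
) -> Outcome:
    """Run a tool call and report how much is actually known about its effect."""
    result = call()
    if not result.success:
        raise RuntimeError("tool reported failure")
    if verify is None:
        # No read-back exists, so the honest claim is "accepted", not "done".
        return Outcome.ACCEPTED
    return Outcome.CONFIRMED if verify(result.reference) else Outcome.UNVERIFIED
```

With something like this in place, the summary the model writes can be conditioned on `Outcome` rather than on `success: true`, so "I've sent the email" is only said when the result is `CONFIRMED`.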

Hidden SDK Retries: Why You're Paying Twice and Don't Know It

· 10 min read
Tian Pan
Software Engineer

Open the OpenAI Python SDK source and you will find a quiet line: DEFAULT_MAX_RETRIES = 2. The Anthropic SDK ships the same default. Most TypeScript SDKs match. Two retries, exponential backoff, automatic on connection errors, 408, 409, 429, and any 5xx, all fired before your code ever sees the failure. You did not configure this. You did not opt in. You usually do not know it is happening, because the metric your app records is request_count, not attempt_count, and the only span your tracer ever sees is the outer one the SDK closes after the final attempt.

This is fine, mostly, until it is not. Add an application-level retry decorator on top of that SDK call — the kind every team writes after their first 429 — and you have built a 3×3 storm: the SDK tries three times, your wrapper tries three times around the SDK, and a single user request fans out to nine inference calls during a provider degradation. The provider's bill counts every attempt. Your dashboards count one. The reconciliation, when someone finally runs it, is a quarter-end conversation nobody enjoys.
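
The multiplication is easy to check with a small simulation. The sketch below does not call any real SDK: `with_retries`, `ProviderDown`, and `simulate_storm` are hypothetical stand-ins, backoff sleeps are omitted, and the provider is assumed to fail every attempt, as it might during a degradation.

```python
from typing import Callable


class ProviderDown(Exception):
    """Simulated retryable failure (for example a 503)."""


def with_retries(fn: Callable[[], None], max_retries: int) -> Callable[[], None]:
    """Wrap fn so it runs once plus up to max_retries more times on failure."""
    def wrapped() -> None:
        for attempt in range(max_retries + 1):
            try:
                return fn()
            except ProviderDown:
                if attempt == max_retries:
                    raise  # out of retries: surface the failure to the caller
    return wrapped


def simulate_storm(sdk_max_retries: int, app_max_retries: int) -> int:
    """Count billed attempts for one user request while the provider is down."""
    attempts = 0

    def inference_call() -> None:
        nonlocal attempts
        attempts += 1  # the provider counts every attempt
        raise ProviderDown("simulated 503")

    sdk_call = with_retries(inference_call, sdk_max_retries)  # inner layer
    app_call = with_retries(sdk_call, app_max_retries)        # outer layer
    try:
        app_call()
    except ProviderDown:
        pass
    return attempts


print(simulate_storm(sdk_max_retries=2, app_max_retries=2))  # 9
print(simulate_storm(sdk_max_retries=0, app_max_retries=2))  # 3
```

The second call shows the usual remedy: keep retries in one layer only. The OpenAI Python SDK accepts a `max_retries` argument on the client, so setting it to 0 there and retrying in application code is one option; the reverse also works.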

Agent Idempotency Is an Orchestration Contract, Not a Tool Property

· 10 min read
Tian Pan
Software Engineer

The support ticket arrives at 9:41 a.m.: "I was charged three times." The trace looks clean. One user message, one planner turn, three calls to charge_card — each with a distinct tool-use ID, each returning 200 OK, each writing a different Stripe charge. The tool has an idempotency key. The backend has a dedup table. The payment processor honors Idempotency-Key. Every layer is idempotent. The customer still paid three times.

This is the shape of the bug that will land on your desk if you build agents long enough. It is not a bug in any tool. It is a bug in the contract between the agent loop and the tools, and that contract almost always lives only in a senior engineer's head.
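
One plausible way every layer can be idempotent while the customer still pays three times is that the key is derived from the tool-use ID, which changes on every call. The sketch below assumes that cause, which the excerpt does not spell out. `intent_key` and `FakeProcessor` are hypothetical; the processor only mimics "same key, same charge" behavior.

```python
import hashlib
import json


def intent_key(run_id: str, tool_name: str, arguments: dict) -> str:
    """Derive a key from what the agent is trying to do, not from the call ID."""
    canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
    payload = "\x1f".join([run_id, tool_name, canonical])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class FakeProcessor:
    """Stand-in payment backend that deduplicates on the idempotency key."""

    def __init__(self) -> None:
        self.charges: dict[str, dict] = {}

    def charge(self, key: str, amount_cents: int) -> dict:
        if key not in self.charges:
            charge_id = f"ch_{len(self.charges) + 1}"
            self.charges[key] = {"id": charge_id, "amount": amount_cents}
        return self.charges[key]


args = {"customer": "cus_123", "amount_cents": 4200}
tool_use_ids = ["toolu_a", "toolu_b", "toolu_c"]  # one per planner-issued call

# Keyed by tool-use ID: each call looks new, so the backend charges each time.
by_call = FakeProcessor()
for tool_use_id in tool_use_ids:
    by_call.charge(tool_use_id, args["amount_cents"])

# Keyed by intent: the three calls collapse into one charge.
by_intent = FakeProcessor()
for _ in tool_use_ids:
    by_intent.charge(intent_key("run-1", "charge_card", args), args["amount_cents"])

print(len(by_call.charges), len(by_intent.charges))  # 3 1
```

The tradeoff is that two intentionally identical charges within the same run would also collapse, so the scope of the key (run, turn, or plan step) is a decision the orchestration layer has to make explicitly.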

Agent Idempotency: Why Your AI Agent Sends That Email Twice

· 9 min read
Tian Pan
Software Engineer

Your agent processed a refund, but the response timed out. The framework retried. The customer got refunded twice. Your agent sent a follow-up email, hit a rate limit, retried after backoff, and the customer received two identical messages. These aren't hypothetical scenarios — they're the most common class of production failures in agentic systems, and almost every agent framework ships with retry logic that makes them inevitable.

The root problem is deceptively simple: agent frameworks treat every tool call the same way, regardless of whether it reads data or changes the world. A get_user_profile() call is safe to retry a hundred times. A send_payment() call is not. Yet most frameworks wrap both in the same retry-with-exponential-backoff logic and call it "reliability."
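
A minimal sketch of the alternative is to make the retry policy depend on whether the tool changes the world. The names here are assumptions for illustration: `READ_ONLY_TOOLS`, `TransientError`, and `call_tool` are hypothetical, and a real system would likely declare the side-effect class alongside each tool definition rather than in a global set.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Simulated retryable failure such as a timeout or rate limit."""


# Hypothetical registry: tools that only read data and are safe to repeat.
READ_ONLY_TOOLS = {"get_user_profile"}


def call_tool(
    name: str,
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.1,
) -> T:
    """Retry read-only tools with exponential backoff; run other tools once."""
    retries = max_retries if name in READ_ONLY_TOOLS else 0
    for attempt in range(retries + 1):
        try:
            return fn()
        except TransientError:
            if attempt == retries:
                # Reads: retries exhausted. Writes: never retried blindly,
                # because the first attempt may already have taken effect.
                raise
            time.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Under this policy a timeout on `send_payment()` surfaces to the caller instead of being retried, which leaves room for a check of whether the first attempt actually went through before anything is sent again.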