The Retry That Changed the Answer: Idempotency Keys for Nondeterministic LLM Calls
Every distributed system you have ever built leans on one quiet assumption: a retry after a timeout is safe. The operation is idempotent, so if the client gives up waiting and re-sends, the worst case is duplicate work that converges to the same state. Two PUTs land the same row. Two DELETEs leave the same absence. The retry is a no-op dressed as a second attempt.
LLM calls break this assumption, and they break it silently. A retry does not re-fetch the same answer — it samples a new one. When a client times out at the network layer because the response was lost in transit, but the provider actually finished the generation, the retry produces a second, different answer. Now two distinct outputs exist for one logical request, and nothing in your stack knows which one is canonical.
This is not a rare edge. Practitioners running models behind timeouts report that 5–10% of requests hit the full timeout-plus-retry cycle even when the underlying call eventually succeeds. Every one of those is a coin flip your system was never designed to adjudicate.
