2 posts tagged with "reproducibility"

The Incident Ticket With No Repro Steps: Reproducibility as Something You Engineer

· 10 min read
Tian Pan
Software Engineer

The incident ticket is specific in the way only real incidents are. At 02:14 the support agent closed a customer account that should have been put on a 30-day grace period. The customer noticed. The ticket lands on your desk with a single line under "Steps to reproduce": unknown.

You open the trace. You can see the agent called close_account instead of set_grace_period. You can see the tool succeeded. What you cannot see is why the model chose that branch — and when you replay the same customer message through the same agent, it does the right thing. Twice. The postmortem now has a paragraph-shaped hole where the root cause should be, and the only honest thing you can write is "could not reproduce."

Deterministic Replay: How to Debug AI Agents That Never Run the Same Way Twice

· 11 min read
Tian Pan
Software Engineer

Your agent failed in production last Tuesday. A customer reported a wrong answer. You pull up the logs, see the final output, maybe a few intermediate print statements — and then you're stuck. You can't re-run the agent and get the same failure because the model won't produce the same tokens, the API your tool called now returns different data, and the timestamp embedded in the prompt has moved forward. The bug is gone, and you're left staring at circumstantial evidence.

This is the fundamental debugging problem for AI agents. Traditional software is mostly deterministic, so you can usually reproduce a bug by recreating its inputs. Agent systems are not: every run combines model sampling, live API responses, and time-dependent state in a way that will not recur on its own. Without specialized tooling, post-mortem debugging becomes forensic guesswork.

Deterministic replay solves this by recording every source of non-determinism during execution and substituting those recordings during replay — turning your unreproducible agent run into something you can step through like a debugger.
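
The record-and-substitute idea can be sketched in a few lines. This is a minimal illustration rather than the implementation the post goes on to describe: `ReplayLog`, `run_agent`, and the `sample_model` / `fetch_account` callables are hypothetical names invented for the example, and it assumes every recorded result is JSON-serializable. Each non-deterministic source (clock, tool call, model sample) goes through one wrapper that either records the live result or returns the recorded one.

```python
import json
import time
from typing import Any, Callable, Optional


class ReplayLog:
    """Records results of non-deterministic calls, or replays them in order."""

    def __init__(self, events: Optional[list] = None) -> None:
        # Passing recorded events switches the log into replay mode.
        self.replaying = events is not None
        self.events: list = list(events) if events is not None else []
        self._cursor = 0

    def call(self, name: str, fn: Callable[[], Any]) -> Any:
        if not self.replaying:
            # Record mode: run the live call and remember what it returned.
            result = fn()
            self.events.append({"name": name, "result": result})
            return result
        # Replay mode: never touch fn, return the next recorded result.
        if self._cursor >= len(self.events):
            raise RuntimeError(f"replay diverged: no recorded event for {name!r}")
        event = self.events[self._cursor]
        self._cursor += 1
        if event["name"] != name:
            raise RuntimeError(
                f"replay diverged: recorded {event['name']!r}, got {name!r}"
            )
        return event["result"]


def run_agent(
    log: ReplayLog,
    message: str,
    sample_model: Callable[[str], str],   # hypothetical model client
    fetch_account: Callable[[], dict],    # hypothetical tool
    now: Callable[[], float] = time.time,
) -> str:
    """Toy agent: every non-deterministic input is routed through the log."""
    ts = log.call("clock", now)
    account = log.call("tool:fetch_account", fetch_account)
    prompt = f"[{ts}] {message} | account={json.dumps(account, sort_keys=True)}"
    return log.call("model", lambda: sample_model(prompt))
```

To replay, persist `log.events` (for example with `json.dumps`) at the end of the production run, then construct `ReplayLog(json.loads(saved))` and call `run_agent` again with the same message. The live callables are never invoked during replay, so the run takes the same branch it took in production, and the name check fails loudly if the agent code has changed enough to request a different sequence of calls.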