
2 posts tagged with "forensics"


Agent Incident Forensics: Capture Before You Need It

· 11 min read
Tian Pan
Software Engineer

The customer sends a screenshot to support on a Tuesday. Their account shows a refund posted six days ago that they never asked for. Your CRO forwards the screenshot with one question: "What produced this?" You know an agent did it — the audit log says actor: refund-agent-v3. But the prompt has been edited four times since. The model id rotated last Thursday when finance switched providers to chase a 12% cost cut. The system prompt is templated from three retrieved documents, and the retrieval index was reindexed Monday. The conversation history was trimmed by the runtime to fit a smaller context window.

You can tell the CRO the agent did it. You cannot tell them why. That gap — between knowing an action happened and being able to reconstruct the inputs that caused it — is the gap most agent teams discover the first time someone outside engineering asks a real forensic question.
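One way to close that gap is to freeze every input at the moment the agent acts. The sketch below is a minimal illustration, not the author's implementation: `DecisionSnapshot` and all of its field names are hypothetical, chosen to mirror the things that drifted in the scenario above (the rendered prompt, the model id, the retrieval index, and the trimmed history).

```python
import hashlib
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


def sha256_hex(text: str) -> str:
    """Return the hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class DecisionSnapshot:
    """Hypothetical record of the inputs behind one agent action."""
    actor: str                      # e.g. the value the audit log already stores
    model_id: str                   # the concrete model that served the call
    rendered_system_prompt: str     # the prompt after templating, not the template
    retrieved_doc_ids: tuple        # which documents were templated in
    retrieval_index_version: str    # assumed label for the index build in use
    full_history: tuple             # conversation before the runtime trimmed it
    history_sent_to_model: tuple    # conversation after trimming
    captured_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialize the snapshot and add a digest of the rendered prompt."""
        record = asdict(self)
        record["system_prompt_sha256"] = sha256_hex(self.rendered_system_prompt)
        return json.dumps(record, sort_keys=True)
```

Keeping both `full_history` and `history_sent_to_model` is a deliberate choice in this sketch: the difference between the two is exactly what the runtime's trimming removed, which is otherwise unrecoverable after the fact.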

The Agent Flight Recorder: Capture These Fields Before Your First Incident

· 12 min read
Tian Pan
Software Engineer

The first time an agent goes sideways in production — it deletes the wrong row, emails the wrong customer, burns $400 of inference on a single task, or tells a regulated user something legally exposed — the team opens the logs and discovers what they actually have: a CloudWatch stream of tool-call names with truncated arguments, a "user prompt" field that captured only the latest turn, and no record of which model version actually ran. The provider rolled the alias forward two weeks ago. The system prompt lives in a config service that wasn't snapshotted. Temperature wasn't logged because the framework default was 0.7 and "everyone knows that." The tool result that triggered the bad action exceeded the log line size and got truncated to "...".

You cannot reconstruct the decision. You can only guess. Six months later you have a pile of "why did it do that" reports with no answers, and the team starts treating the agent like weather — something that happens to you, not something you debug.

The flight recorder discipline is the cheapest thing you will ever ship that prevents this, and the most expensive thing you will ever ship if you wait until the first incident to start. The fields below are the bare minimum, the storage shape is non-negotiable, and the sampling and privacy boundaries have to be designed alongside the recorder, not retrofitted.
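As a rough illustration of the failure modes described above (an unresolved model alias, an unlogged default temperature, an unsnapshotted system prompt, and a tool result truncated by the log line limit), here is a minimal sketch. It is not the article's field list: `record_step`, `put_blob`, the in-memory `store`, and the `MAX_INLINE_BYTES` budget are all hypothetical, and a real system would use durable storage rather than a dict.

```python
import hashlib
import json

MAX_INLINE_BYTES = 1024  # hypothetical log-line budget for inlined payloads


def put_blob(store: dict, payload: str) -> str:
    """Store a payload under its SHA-256 digest and return the digest."""
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    store[digest] = payload
    return digest


def record_step(store: dict, *, model_alias: str, resolved_model_version: str,
                temperature: float, system_prompt: str, tool_name: str,
                tool_args: dict, tool_result: str) -> str:
    """Build one log line that references full payloads instead of truncating."""
    result_bytes = len(tool_result.encode("utf-8"))
    entry = {
        "model_alias": model_alias,
        # The version the alias pointed to when the call ran, not the alias.
        "resolved_model_version": resolved_model_version,
        # Logged explicitly even when it equals the framework default.
        "temperature": temperature,
        # Snapshot of the prompt as used, addressed by content hash.
        "system_prompt_ref": put_blob(store, system_prompt),
        "tool_name": tool_name,
        "tool_args": tool_args,
        # The full result always goes to the blob store, whatever its size.
        "tool_result_ref": put_blob(store, tool_result),
        "tool_result_bytes": result_bytes,
    }
    # Inline small results for convenience; large ones stay reference-only.
    if result_bytes <= MAX_INLINE_BYTES:
        entry["tool_result_inline"] = tool_result
    return json.dumps(entry, sort_keys=True)
```

The point of the content-addressed reference is that the log line stays small while the evidence stays complete: an oversized tool result is never reduced to "...", and identical prompts are stored once.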