
The AI Incident Runbook: When Your Agent Causes Real-World Harm

· 11 min read
Tian Pan
Software Engineer

Your agent just did something it shouldn't have. Maybe it sent emails to the wrong people. Maybe it executed a database write that should have been a read. Maybe it gave medical advice that sent a user to the hospital. You are now in an AI incident — and the playbook you've been using for software outages will not help you.

Traditional incident runbooks are built on a foundational assumption: given the same input, the system produces the same output. That assumption lets you reproduce the failure, bisect toward the cause, and verify the fix. None of that applies to a stochastic system operating on natural language. The same prompt through the same pipeline can produce different results across runs, providers, regions, and time. Documented AI incidents surged 56% from 2023 to 2024, yet most organizations still route these events through software incident processes designed for a fundamentally different class of problem.

This is the runbook they should have written.

Why Your Existing Runbook Will Mislead You

A standard software runbook works by asking: what changed? Find the deployment, the config change, the dependency update. Roll it back. Verify.

AI incidents resist this framing for several reasons.

First, the system may not have changed at all. Your retrieval index drifted because upstream data was updated. Your model provider silently updated the model behind a stable API version. Your context window grew past the threshold where your system prompt gets truncated — not because of a code change, but because a conversation grew longer than usual. The "what changed" question often has no answer you can point to.

Second, the failure may not be reproducible. LLM outputs are sampled from probability distributions. The harmful completion you're investigating may be a low-probability event that occurred once and may never occur again in testing. Running the same prompt returns normal output. Your test suite passes. This is not exculpatory — it means your evaluation methodology needs to change, not that the system is safe.
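One concrete consequence: a single clean rerun of the offending prompt tells you almost nothing. What you want instead is a frequency estimate over many samples at production settings. A minimal sketch, where `generate` and `is_harmful` are hypothetical stand-ins for your model call and your harm classifier (the toy model below is purely illustrative):

```python
import random

def estimate_failure_rate(generate, is_harmful, prompt, n_samples=500):
    """Estimate how often a prompt yields a harmful completion.

    For a sampled system, reproduction is a rate, not a yes/no.
    `generate` and `is_harmful` are placeholders for your own
    model call and harm detector.
    """
    failures = sum(1 for _ in range(n_samples) if is_harmful(generate(prompt)))
    return failures / n_samples

# Toy stand-in: a "model" that emits the harmful completion ~2% of the time.
def toy_generate(prompt):
    return "harmful" if random.random() < 0.02 else "normal"

rate = estimate_failure_rate(
    toy_generate, lambda out: out == "harmful", "the incident prompt"
)
```

A rate of zero over 500 samples still only bounds the failure probability at roughly 0.6% with 95% confidence; it does not prove the completion cannot recur.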

Third, the blast radius is harder to bound. In a deterministic system, you can enumerate every execution path that touched the bug. In an AI system, every interaction with a misbehaving agent is a unique event. You don't know which users received bad outputs unless you logged every completion — and many teams don't.

Step One: Stop the Bleeding Before You Understand the Cause

In a traditional incident, you might tolerate investigation time before acting because you can bound the ongoing damage. In an AI incident where the system is actively taking actions — sending messages, writing records, making API calls — every minute of continued operation potentially expands the harm.

The first decision is binary: does this system need to come down right now, or can you narrow the blast radius without a full shutdown?

To make that call without a stack trace, reason from signals you do have:

Access scope: What data, systems, and users can this agent touch? A customer-facing chatbot that only reads from a FAQ database has a narrow scope. An agent with write access to production records and outbound communication has a wide one.

Operating velocity: How many operations per minute is the system executing? An agent processing 10 requests per day in a low-stakes workflow gives you time to investigate. An agent handling 10,000 requests per hour cannot wait.

Detection window: How long might this have been happening before you noticed? If monitoring caught this within minutes, the damage is probably contained. If an anomaly that started last week only surfaced today, assume the worst and audit backward.

Blast radius estimation in this context is a product: operating velocity × detection window, capped by access scope. A clinical documentation agent with access to two million patient records, processing a thousand records per day and undetected for thirty days, has a theoretical blast radius of thirty thousand affected interactions. That estimate tells you whether to take the system down immediately or proceed more cautiously.
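The arithmetic is simple enough to codify so it gets applied consistently under pressure. A minimal sketch, treating access scope as an upper cap on velocity × detection window (field names are illustrative, not a standard):

```python
def blast_radius(ops_per_day: int, detection_window_days: int,
                 records_in_scope: int) -> int:
    """Upper-bound estimate of affected interactions.

    Velocity times detection window bounds how many interactions may
    have been touched; access scope caps it, since the agent cannot
    affect more records than it can reach.
    """
    return min(ops_per_day * detection_window_days, records_in_scope)

# The clinical documentation example from the text:
# 1,000 records/day, undetected for 30 days, 2M records in scope.
affected = blast_radius(1_000, 30, 2_000_000)
```

This is a triage number, not a forensic count; the audit in the evidence-preservation step replaces it with actual logs.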

If in doubt, take it down. The reputational and legal cost of continued harm exceeds the cost of an unnecessary outage by a wide margin in most domains.
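Making that binary call fast requires that the shutdown path already exists. One hedged sketch of how the "narrow without a full shutdown" option can be wired in: gate every tool invocation through a single mode flag, so flipping one value drops the agent to read-only or takes it down entirely. The mode names and tool names here are hypothetical, not from the article:

```python
from enum import Enum

class AgentMode(Enum):
    FULL = "full"            # all tools enabled
    READ_ONLY = "read_only"  # retrieval and answering only; no writes or sends
    DOWN = "down"            # refuse all requests

# Hypothetical example: tools with side effects outside the agent.
WRITE_TOOLS = {"send_email", "update_record", "call_external_api"}

def tool_allowed(tool_name: str, mode: AgentMode) -> bool:
    """Single choke point every tool call must pass through."""
    if mode is AgentMode.DOWN:
        return False
    if mode is AgentMode.READ_ONLY:
        return tool_name not in WRITE_TOOLS
    return True
```

The design point is that the check is centralized: an incident responder changes one flag, not a dozen call sites.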

Step Two: Preserve Evidence Before It Disappears

The biggest mistake teams make in AI incidents is letting evidence age out before they've captured it. Logs get rotated. Model provider trace data expires. Completions that seemed reproducible stop being so once the underlying model updates.

Within the first fifteen minutes of declaring an incident, freeze the following:

Full prompt and completion logs with timestamps, not just inputs and outputs. The system prompt matters. The conversation history matters. Anything that was in context when the harmful output was generated is potential evidence.

Model version metadata: What model were you calling, at what temperature, with what sampling parameters? If you were calling a provider API with a mutable model alias (like "gpt-4" instead of a specific version string), you may not be able to recover what model version was actually serving requests. This is a critical gap — prefer pinned version strings in production systems.

Tool invocation traces: If the agent called external tools, preserve every call and return value. Which tool was called, in what order, with what arguments, and what was returned? This is often where you find the actual failure — not in the model's reasoning, but in what it was given to reason about.

Identity and delegation chains: Who or what authorized this action? If your agent operates on behalf of users, which user triggered the chain of events? This matters for both technical remediation and legal disclosure.

Retrieval context: If you use RAG, what documents were retrieved and ranked? A harmful output often traces back to what the model was given, not what it invented. Preserve the retrieval inputs, query, and ranked results.
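The five categories above can be frozen as one immutable record at incident-declaration time. A minimal sketch; the field names are illustrative, not a standard schema:

```python
import dataclasses
import json
import time

@dataclasses.dataclass(frozen=True)
class IncidentSnapshot:
    """One frozen record per incident, capturing everything in context."""
    captured_at: float        # epoch time the snapshot was taken
    system_prompt: str        # the system prompt actually in effect
    conversation: list        # full message history with timestamps
    model: str                # pinned version string, not a mutable alias
    temperature: float
    sampling_params: dict     # top_p, max_tokens, and so on
    tool_calls: list          # ordered (name, args, return_value) records
    delegation_chain: list    # user/service identities that authorized this
    retrieval: dict           # query, retrieved docs, ranking scores

    def dump(self) -> str:
        """Serialize for write-once storage outside normal log rotation."""
        return json.dumps(dataclasses.asdict(self), default=str)
```

Writing the snapshot to storage that sits outside your normal log-rotation policy is the point: the fifteen-minute window exists precisely because the default pipelines will discard this data.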
