Halted Is Not a Status: Why Agents Need a Typed Terminal-Reason Protocol
Open the dashboard for an agent fleet and you will see a clean number: completion rate, 94%. Below it, a list of runs, each tagged with one of two states: running or not running. The runs that are "not running" all look identical. Some of them finished the task perfectly. Some of them hit a step limit two actions short of done. Some of them caught a tool error and gave up. Some of them decided, correctly, that the task was impossible. And some of them simply lost the thread and stopped emitting tokens.
Your monitoring cannot tell these apart. It knows the process is no longer running. It does not know why, and "why" is the only thing that matters when you are deciding whether to page someone.
