The Agent That Wouldn't Stop: Scope Creep as a Runtime Failure Mode
You asked the agent to fix a flaky test. At minute three, the test passes. At minute four, the agent is reading neighbouring files. At minute nine, it has "improved" a helper that the test never touched, renamed an unrelated parameter for clarity, and started a refactor of the fixture builder. The diff that lands is twelve files and four hundred lines. The original bug is fixed. So is some other code that wasn't broken.
This is not a model getting confused. This is a model doing exactly what its instructions left room for. The task said "fix the bug." It did not say "stop after the bug is fixed." Most agent loops have a defined start and a defined success criterion, and a very fuzzy answer to the third question: when are you done? In a chat session, "done" is whatever the user accepts. In an autonomous loop, "done" is whatever the stopping condition says, and if you didn't write one, the stopping condition is "the model lost interest." That isn't a failure mode you can debug. It's a failure mode you have to design out.
