The Undo Button Your Agent Assumes Exists
Watch an agent reason through a multi-step task and you will notice something familiar: it plans the way you debug. Try an approach, look at the result, and if it is wrong, back out and try another. The agent talks about its plan as a tree of options it can explore, prune, and revisit. That mental model is correct inside a code sandbox, where every action has an implicit undo. It is dangerously wrong the moment the agent touches the world.
A sent email does not unsend. A charged card does not uncharge without a refund flow, a fee, and a customer who already saw the notification. A deleted row is gone unless someone wired up soft deletes. A posted Slack message has already been read. The agent's planning model has no native concept of the one-way door — the action that, once taken, removes the option of pretending it never happened.
This is not a model intelligence problem. A smarter model still does not know which of your tools is reversible, because reversibility is not a property of the action. It is a property of the system the action lands in. You have to tell it.
