The Expensive-to-Undo Tool Taxonomy: One Approval Gate Per Risk Class
The "send email" tool and the "delete account" tool are sitting behind the same modal. Your user has clicked "Approve" forty times today, none of those clicks involved reading the diff, and the next click, the one that ships an irreversible mutation to a production database, will look identical to the forty before it. This is the failure mode of binary tool approval, and it is the default in almost every agent framework shipped today.
The framing problem is that "needs human approval" is treated as a single boolean attached to a tool, when it is actually a five-or-six-class taxonomy that depends on what kind of damage the tool can do and how recoverable the damage is. Teams that ship safe agents stop asking "does this tool need a confirm dialog" and start asking "what risk class does this tool belong to, and what gate corresponds to that class." The right number of approval gates is not one and not many. It is one per risk class, and you have to enumerate the classes before you can build the gates.
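
To make the shift from a boolean to a class-based lookup concrete, here is a minimal sketch in Python. The risk class names, gate names, and the `Tool` and `gate_for` helpers are all hypothetical placeholders chosen for illustration. They are not the taxonomy itself and not the API of any particular agent framework. The only point is the shape: a tool declares a risk class, and the gate is derived from the class rather than stored as a per-tool flag.

```python
from dataclasses import dataclass
from enum import Enum


class RiskClass(Enum):
    # Hypothetical class names, for illustration only.
    READ_ONLY = "read_only"
    REVERSIBLE_WRITE = "reversible_write"
    EXTERNAL_SEND = "external_send"
    IRREVERSIBLE_MUTATION = "irreversible_mutation"


class Gate(Enum):
    # Hypothetical gate names, for illustration only.
    AUTO_ALLOW = "auto_allow"
    ALLOW_WITH_UNDO = "allow_with_undo"
    CONFIRM_WITH_PREVIEW = "confirm_with_preview"
    TYPED_CONFIRMATION = "typed_confirmation"


# Exactly one gate per risk class. The pairing below is an assumption.
GATE_FOR_CLASS = {
    RiskClass.READ_ONLY: Gate.AUTO_ALLOW,
    RiskClass.REVERSIBLE_WRITE: Gate.ALLOW_WITH_UNDO,
    RiskClass.EXTERNAL_SEND: Gate.CONFIRM_WITH_PREVIEW,
    RiskClass.IRREVERSIBLE_MUTATION: Gate.TYPED_CONFIRMATION,
}


@dataclass(frozen=True)
class Tool:
    name: str
    risk_class: RiskClass  # replaces a needs_approval: bool flag


def gate_for(tool: Tool) -> Gate:
    """Look up the gate from the tool's risk class, not from the tool itself."""
    return GATE_FOR_CLASS[tool.risk_class]


send_email = Tool("send_email", RiskClass.EXTERNAL_SEND)
delete_account = Tool("delete_account", RiskClass.IRREVERSIBLE_MUTATION)
```

Under these assumptions, `gate_for(send_email)` and `gate_for(delete_account)` return different gates, so the two tools no longer sit behind the same modal. Adding a new tool means choosing a class for it, and adding a new class means adding exactly one entry to the mapping.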
