The HITL Rubber Stamp Problem: Why Human-in-the-Loop Often Means Neither
There's a paradox sitting at the center of responsible AI deployment: the more you try to involve humans in reviewing AI decisions, the less meaningful that review becomes.
A 2024 Harvard Business School study gave 228 evaluators AI recommendations with clear explanations of the AI's reasoning. Human reviewers were 19 percentage points more likely to align with the AI's recommendations than the control group. When the AI also provided narrative rationales justifying each decision, deference increased by another 5 points. Better explainability produced worse oversight. The human in the loop had become a rubber stamp on a form.
