The Helpful-But-Wrong Problem: Operational Hallucination in Production AI Agents
Your AI agent just completed a complex database migration task. It called the right tool, used proper terminology, referenced the correct library, and returned output that looks completely reasonable. Then your DBA runs it against a 50M-row production table — and the backup flag was wrong. The flag exists in a neighboring library version, it's syntactically valid, but it silently no-ops the backup step.
The agent wasn't hallucinating wildly. It was confident, fluent, and directionally correct. It was also operationally wrong in exactly the way that causes data loss.
This is the hallucination category the field underinvests in, the one that your evals are almost certainly not catching.
