The Redaction Layer Your Agent Cannot Reason Through
A privacy review approves your redaction layer. Names, emails, account numbers, phone numbers — all scrubbed before the prompt reaches the model. Your single-turn classifier still hits 94% accuracy. Six weeks later your multi-step agent starts giving confidently wrong answers to questions like "is the email Sarah used to log in the same as the one on her billing record?" and nobody can reproduce it in dev.
The redaction layer did exactly what infosec asked it to do. It also quietly destroyed the property your agent's reasoning depended on: that two mentions of the same entity in different turns refer to the same thing. The agent isn't hallucinating. It's reading a transcript where Sarah has become three different people and the "same" email address has become two distinct placeholders.
