The Chain-of-Thought You Stripped to Save Tokens That Hid an Evidence Requirement
A platform team shipped a prompt refactor that cut average response cost by thirty-two percent. The change was simple: strip the "explain your reasoning" preamble, ask the model to return only the JSON object, and drop the post-processing step that parsed the rationale out of the model's prose. The dashboard turned green. The unit economics page in the quarterly review went from yellow to gold. Nobody on the platform team thought to consult the risk team, because no part of the change touched the answer the customer received.
Two quarters later, a regulated customer's auditor requested the decision rationale for a denied-loan letter from a date six months prior. The team pulled the trace. The input was there. The output was there. The reasoning was gone — not because anyone deleted it, but because it had stopped being produced the day the refactor shipped. The customer's compliance program had been operating on the assumption that the rationale was somewhere in the trace store; the platform team had been operating on the assumption that the rationale was nobody's problem because the customer-facing answer was unchanged. Both assumptions were correct in isolation. Together they cost the customer a regulatory finding and the platform team a contract renewal.
