The Policy File: Why Your Refusal Rules Don't Belong in Your System Prompt
A safety reviewer at a fintech startup pushed a four-line addition to the system prompt last quarter. The change: a refusal rule preventing the assistant from giving specific tax advice for a jurisdiction the company didn't have a license to operate in. Reasonable, narrow, audit-clean. The rule landed on Tuesday. By Friday the eval suite was showing a 7-point drop on a customer-onboarding flow that had nothing to do with tax — the model had started hedging on every question that mentioned a country, including "what currency does this account hold." The product team backed out the change. The safety team re-shipped it the following week with slightly different wording. Three weeks later, the same regression appeared in a different shape, and the next safety edit broke a different unrelated flow.
The bug here isn't the wording. The bug is that the refusal rule is in the wrong place. It's wedged inside a 2,400-token artifact that also contains the assistant's conversational voice, its formatting contract, its task instructions, and a half-dozen other policy clauses — and every edit to any of those concerns is a behavioral edit to all of them, because the model can't tell which sentence is policy and which is style. Production system prompts grow into a tangled monolith because three orthogonal concerns are pretending to be one. The teams who haven't factored them out are paying the integration tax on every edit.
