When Your Forbidden List Becomes a Recipe: The Hidden Cost of Negative Examples in Prompts
Open a mature production system prompt and search for the word "not." On a feature that has shipped through three quarters and survived a handful of incidents, you will almost always find a section that reads like a list of regrets: "do not give medical advice, do not generate code matching these patterns, do not produce content matching this regex, do not impersonate these competitors, do not use these phrases." Each line traces back to a specific incident. Each line was added with confidence by an engineer who said "this will fix it." And the list grows every quarter, the way a museum acquires exhibits.
What very few teams will admit out loud is that this list, the prompt's most defensive and most carefully reviewed section, is also the most useful artifact in the entire feature for exactly the wrong reader. A determined user who extracts the system prompt now holds a curated, organized, model-readable inventory of every behavior the team is afraid of. The forbidden list is a recipe. The team wrote the cookbook.
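To make "model-readable inventory" concrete, here is a minimal sketch of how little effort it takes to pull the forbidden list out of a leaked prompt. The prompt text, the `PROHIBITION` pattern, and the `extract_prohibitions` helper are all hypothetical and written for illustration; a real prompt would phrase its prohibitions in more varied ways than this heuristic catches.

```python
import re

# Hypothetical system prompt; the rules are illustrative, not from a real product.
SYSTEM_PROMPT = """\
You are a support assistant for an online store.
Answer questions about orders and shipping.
Do not give medical advice.
Do not impersonate competitors.
Never use the phrase "guaranteed results".
"""

# Matches lines that open with a prohibition. A rough heuristic, not a parser.
PROHIBITION = re.compile(r"^\s*(?:do not|don't|never)\b", re.IGNORECASE)


def extract_prohibitions(prompt: str) -> list[str]:
    """Return every line of the prompt that starts with a prohibition."""
    return [line.strip() for line in prompt.splitlines() if PROHIBITION.match(line)]


forbidden = extract_prohibitions(SYSTEM_PROMPT)
print(f"{len(forbidden)} of {len(SYSTEM_PROMPT.splitlines())} lines are prohibitions")
for rule in forbidden:
    print("-", rule)
```

The same few lines work in either direction: a team can run them to measure how much of its own prompt is a forbidden list, and anyone holding an extracted prompt can run them to get that list as a clean, ordered set of targets.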
