Stale Few-Shot Examples and the Half-Life Your Prompt Repo Ignores
Open the system prompt of any AI feature that has been in production for more than nine months. Scroll past the role description, past the formatting rules, past the safety guardrails. Stop at the block titled <examples> or ## Examples or whatever your team called it the day someone copied the first three good Slack threads into a code block. Read them. There is a 60% chance at least one of them references a feature that has been renamed, a button that no longer exists, or a workflow the product manager quietly killed two quarters ago.
The decay is not visible from the eval dashboard. The eval scores are green. They have been green for months. They are green because the eval set was authored against the same product surface the few-shots reference, and the two have aged together in lockstep. The model is performing a flawless impression of last year's product, on a test set that grades it for being faithful to last year's product, while real users interact with this year's product and quietly tolerate the resulting confabulations. This is the half-life nobody puts in the LLMOps roadmap.
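One way to see the lockstep problem concretely is to scan the few-shots and the eval set against the same list of retired product terms. The sketch below is a minimal illustration, not a prescribed tool: the retired-terms list, the example strings, the eval-case shape (`input` / `expected`), and the function names `stale_terms` and `audit` are all hypothetical, and a real product surface would need a better source of truth than a hand-maintained set.

```python
import re

# Hypothetical data for illustration only; none of these names come from a real product.
RETIRED_TERMS = {"Smart Inbox", "Legacy Export"}

FEW_SHOTS = [
    "User: How do I triage mail?\nAssistant: Open Smart Inbox and pick a filter.",
    "User: How do I share a report?\nAssistant: Click Share in the top bar.",
]

EVAL_CASES = [
    {"input": "How do I triage mail?",
     "expected": "Open Smart Inbox and pick a filter."},
    {"input": "How do I share a report?",
     "expected": "Click Share in the top bar."},
]


def stale_terms(text, retired):
    """Return the retired terms that appear in text (case-insensitive, whole phrase)."""
    found = set()
    for term in retired:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(term)
    return found


def audit(few_shots, eval_cases, retired):
    """Collect retired terms in the few-shots, in the eval expectations, and in both."""
    shot_terms = set().union(*(stale_terms(s, retired) for s in few_shots))
    eval_terms = set().union(
        *(stale_terms(c["expected"], retired) for c in eval_cases)
    )
    return {
        "few_shots": shot_terms,
        "evals": eval_terms,
        # Terms in both sets are the dangerous ones: the eval rewards
        # the same stale behavior the few-shots teach.
        "shared": shot_terms & eval_terms,
    }


report = audit(FEW_SHOTS, EVAL_CASES, RETIRED_TERMS)
print(report["shared"])
```

A non-empty `shared` set is the case described above: the dashboard stays green because the eval's expected answers cite the same retired feature the prompt's examples do.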
