The Prompt Cache Your Personalization Layer Quietly Killed
The product team ships personalization. The agent now greets the user by name, tunes its response length to their stated preference, knows the user works in healthcare, and respects the user's timezone for any date it mentions. The satisfaction lift is real and measurable — the A/B is a four-point win on thumbs-up rate and the rollout goes to one hundred percent. Three weeks later, finance flags that inference spend has roughly tripled, and nobody on the AI team can immediately explain why.
The explanation is one line of code change buried in the system-prompt builder. Per-user context — name, preferred response length, industry, timezone — got prepended to the system prompt so the model would see it on every turn. That made every user's prompt unique from the first token. Your provider's prompt cache, which had been serving roughly ninety percent of your input tokens at one-tenth the standard price, stopped hitting. Latency barely moved, so the perf dashboard stayed green. The billing dashboard caught up at month-end.
