Prompt Cache Break-Even: The Exact Math on When Provider-Side Prefix Caching Actually Pays Off
Prompt caching sounds like a clear win: Anthropic and OpenAI both advertise a 90% discount on cache hits, and the documentation shows impressive cost reduction charts. Teams implement it, monitor the cache hit rate counter going up, and assume they're saving money. Some of them are paying more than if they hadn't cached anything.
The issue is the write premium. Every time you cache a prefix, you pay a surcharge — 1.25× on a 5-minute cache window, 2× for a 1-hour window. If your hit rate is too low, those write premiums accumulate faster than the read discounts recover them. Caching is not free insurance; it's a bet you place against your own traffic patterns.
