
Cold Cache, Hot Cache: Why Your LLM Latency Numbers Lie in Staging

· 9 min read
Tian Pan
Software Engineer

Your staging environment says p50 latency is 400ms. Your production dashboard says 1.8 seconds. You check the code — same model, same prompt, same provider. Nothing changed between deploy and release. The numbers shouldn't diverge this much, but they do.

The culprit is almost always cache state. Prompt caching — the single biggest latency optimization most teams rely on — behaves fundamentally differently under staging traffic than under production traffic. And if you don't account for that difference, every latency number you collect before launch is fiction.
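To make the divergence concrete, here is a minimal, illustrative simulation (the latency constants and prompt names are assumptions, not measurements — they just mirror the 400ms/1.8s shape described above). Staging replays the same test prompt, so the cache stays hot after the first request; production sees a long tail of distinct user contexts, so almost every request is a cold miss:

```python
import random

def simulate_p50(prompts, cold_ms=1800, warm_ms=400, trials=1000):
    """Simulate observed p50 latency for a stream of prompt prefixes.

    cold_ms / warm_ms are illustrative stand-ins for cache-miss and
    cache-hit latency; a real prompt cache also has TTLs and eviction,
    which this sketch ignores.
    """
    cache = set()
    latencies = []
    for _ in range(trials):
        prefix = random.choice(prompts)
        if prefix in cache:
            latencies.append(warm_ms)   # cache hit: fast path
        else:
            latencies.append(cold_ms)   # cache miss: full prefill
            cache.add(prefix)
    latencies.sort()
    return latencies[len(latencies) // 2]

# Staging: one test prompt replayed over and over -> cache stays hot.
print(simulate_p50(["system-prompt-v1"]))                       # ~400

# Production: many distinct user contexts -> almost always cold.
print(simulate_p50([f"user-context-{i}" for i in range(100_000)]))  # ~1800
```

Same code, same model, same provider — the only variable is how often the incoming prompt prefix is already in the cache, and that is exactly what differs between the two environments.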