
Cold Cache, Hot Cache: Why Your LLM Latency Numbers Lie in Staging

· 9 min read
Tian Pan
Software Engineer

Your staging environment says p50 latency is 400ms. Your production dashboard says 1.8 seconds. You check the code — same model, same prompt, same provider. Nothing changed between deploy and release. The numbers shouldn't diverge this much, but they do.

The culprit is almost always cache state. Prompt caching — the single biggest latency optimization most teams rely on — behaves fundamentally differently under staging traffic than under production traffic. And if you don't account for that difference, every latency number you collect before launch is fiction.
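To make the divergence concrete, here is a minimal, illustrative simulation (the latency constants and prompt names are assumptions, not measurements — they just mirror the 400ms/1.8s shape described above). Staging replays the same test prompt, so the cache stays hot after the first request; production sees a long tail of distinct user contexts, so almost every request is a cold miss:

```python
import random

def simulate_p50(prompts, cold_ms=1800, warm_ms=400, trials=1000):
    """Simulate observed p50 latency for a stream of prompt prefixes.

    cold_ms / warm_ms are illustrative stand-ins for cache-miss and
    cache-hit latency; a real prompt cache also has TTLs and eviction,
    which this sketch ignores.
    """
    cache = set()
    latencies = []
    for _ in range(trials):
        prefix = random.choice(prompts)
        if prefix in cache:
            latencies.append(warm_ms)   # cache hit: fast path
        else:
            latencies.append(cold_ms)   # cache miss: full prefill
            cache.add(prefix)
    latencies.sort()
    return latencies[len(latencies) // 2]

# Staging: one test prompt replayed over and over -> cache stays hot.
print(simulate_p50(["system-prompt-v1"]))                       # ~400

# Production: many distinct user contexts -> almost always cold.
print(simulate_p50([f"user-context-{i}" for i in range(100_000)]))  # ~1800
```

Same code, same model, same provider — the only variable is how often the incoming prompt prefix is already in the cache, and that is exactly what differs between the two environments.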