LLM Tail Latency: Why Your P99 Is a Disaster When P50 Looks Fine
Your LLM API has a median (P50) latency of 800 milliseconds. Your dashboard is green. Your SLA says "under two seconds." Then a user files a support ticket: "it just spins for thirty seconds and then gives up." You check the logs and see a P99 of 28 seconds.
That gap, a 35x ratio between median and tail latency (28 s against 0.8 s), is not a fluke. It is a structural property of how LLMs work, and tuning your timeouts will not make it go away.
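To make the arithmetic concrete, here is a minimal sketch of how a handful of slow requests produces that ratio. The latency values are synthetic and chosen only to reproduce the 800 ms and 28 s figures above, and `nearest_rank_percentile` is a hypothetical helper, not the method your monitoring stack necessarily uses (different tools interpolate percentiles differently).

```python
import math


def nearest_rank_percentile(samples, pct):
    """Return the pct-th percentile (0 < pct <= 100) by the nearest-rank method."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # 1-based rank of the smallest value that covers pct percent of the samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Synthetic data (an assumption for illustration): 98 "normal" requests
# spread between 604 ms and 992 ms, plus 2 requests that stall badly.
fast_ms = [604 + 4 * i for i in range(98)]
slow_ms = [28_000, 30_000]
latencies_ms = fast_ms + slow_ms

p50 = nearest_rank_percentile(latencies_ms, 50)
p99 = nearest_rank_percentile(latencies_ms, 99)
ratio = p99 / p50

print(f"P50 = {p50} ms, P99 = {p99} ms, ratio = {ratio:.0f}x")
# prints "P50 = 800 ms, P99 = 28000 ms, ratio = 35x"
```

Only two requests in a hundred are slow here, yet they set the P99 entirely, while the median does not move at all. That is why a green P50 dashboard says nothing about the experience of the user who filed the ticket.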
