Your Load Tests Are Lying: LLM Provider Capacity Contention in Production
You ran a load test. Your p95 latency was 450ms. You felt good about it, shipped the feature, and then your on-call rotation lit up two weeks later because users were seeing 25-second response times at 9 AM on a Tuesday.
Nothing changed in your code. No deployment, no config change. The provider's status page said "operational." And yet your app was unusable for 20 minutes during peak business hours.
This is the LLM capacity contention problem, and it's one of the most common failure modes engineers don't see coming until they've already been burned.
