3 posts tagged with "api"

Your Load Tests Are Lying: LLM Provider Capacity Contention in Production

May 7, 2026 · 11 min read

Software Engineer

You ran a load test. Your p95 latency was 450ms. You felt good about it, shipped the feature, and then your on-call rotation lit up two weeks later because users were seeing 25-second response times at 9 AM on a Tuesday.

Nothing changed in your code. No deployment, no config change. The provider's status page said "operational." And yet your app was unusable for 20 minutes during peak business hours.

This is the LLM capacity contention problem, and it's one of the most common failure modes engineers don't see coming until they've already been burned.

API Contracts for Non-Deterministic Services: Versioning When Output Shape Is Stochastic

April 17, 2026 · 9 min read

Tian Pan

Software Engineer

Your content moderation service returns {"severity": "MEDIUM", "confidence": 0.85}. The downstream billing system parses severity as an enum with values ["low", "medium", "high"]. A model update causes the service to occasionally return "Medium" with a capital M. No deployment happened. No schema changed. The integration breaks in production, and nobody catches it for six days because the HTTP status codes are all 200.

This is the foundational problem with API contracts for LLM-backed services: the surface looks like a REST API, but the behavior underneath is probabilistic. Standard contract tooling assumes determinism. When that assumption breaks, it breaks silently.

LLM API Resilience in Production: Rate Limits, Failover, and the Hidden Costs of Naive Retry Logic

March 11, 2026 · 10 min read

Tian Pan

Software Engineer

In mid-2025, a team building a multi-agent financial assistant discovered their API spend had climbed from $127/week to $47,000/week. An agent loop — Agent A asked Agent B for clarification, Agent B asked Agent A back, and so on — had been running recursively for eleven days. No circuit breaker caught it. No spend alert fired in time. The retry logic dutifully kept retrying each timeout, compounding the runaway cost at every step.

This is not a story about model quality. It is a story about distributed systems engineering — specifically, about the parts of it that most LLM application developers skip because they assume the provider handles it.

They do not.

About Tian Pan