3 posts tagged with "api"

Your Load Tests Are Lying: LLM Provider Capacity Contention in Production

· 11 min read
Tian Pan
Software Engineer

You ran a load test. Your p95 latency was 450ms. You felt good about it, shipped the feature, and then your on-call rotation lit up two weeks later because users were seeing 25-second response times at 9 AM on a Tuesday.

Nothing changed in your code. No deployment, no config change. The provider's status page said "operational." And yet your app was unusable for 20 minutes during peak business hours.

This is the LLM capacity contention problem, and it's one of the most common failure modes engineers don't see coming until they've already been burned.

API Contracts for Non-Deterministic Services: Versioning When Output Shape Is Stochastic

· 9 min read
Tian Pan
Software Engineer

Your content moderation service returns {"severity": "medium", "confidence": 0.85}. The downstream billing system parses severity as an enum with values ["low", "medium", "high"]. A model update causes the service to occasionally return "Medium" with a capital M. No deployment happened. No schema changed. The integration breaks in production, and nobody catches it for six days because the HTTP status codes are all 200.

This is the foundational problem with API contracts for LLM-backed services: the surface looks like a REST API, but the behavior underneath is probabilistic. Standard contract tooling assumes determinism. When that assumption breaks, it breaks silently.
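One way to make that silent failure loud is to validate every response against the contract at the boundary, before anything downstream sees it. The sketch below is a minimal illustration, not the service's actual code: the `Severity` enum, the `ContractViolation` exception, and the `parse_moderation` function are hypothetical names, and the payload shape is assumed from the example above. It tolerates case drift such as "Medium" by normalizing, and rejects anything it cannot map to the enum.

```python
from enum import Enum


class Severity(Enum):
    """Hypothetical enum mirroring the downstream contract."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class ContractViolation(ValueError):
    """Raised when a response does not satisfy the assumed contract."""


def parse_moderation(payload: dict) -> tuple[Severity, float]:
    """Validate a moderation payload and return (severity, confidence)."""
    raw = payload.get("severity")
    if not isinstance(raw, str):
        raise ContractViolation(f"severity missing or not a string: {raw!r}")
    try:
        # Normalize case and whitespace so "Medium" or " MEDIUM " still parse.
        severity = Severity(raw.strip().lower())
    except ValueError as exc:
        raise ContractViolation(f"unknown severity: {raw!r}") from exc

    confidence = payload.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ContractViolation(f"confidence is not a number: {confidence!r}")
    if not 0.0 <= confidence <= 1.0:
        raise ContractViolation(f"confidence out of range: {confidence!r}")
    return severity, float(confidence)
```

With this in place, a value like "Medium" parses cleanly, while an unexpected value such as "moderate" raises instead of flowing through behind a 200 status code. Whether to normalize or reject case drift is a design choice; normalizing hides the drift from consumers, so it is worth logging when the raw value differs from the canonical one.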

LLM API Resilience in Production: Rate Limits, Failover, and the Hidden Costs of Naive Retry Logic

· 10 min read
Tian Pan
Software Engineer

In mid-2025, a team building a multi-agent financial assistant discovered their API spend had climbed from $127/week to $47,000/week. An agent loop — Agent A asked Agent B for clarification, Agent B asked Agent A back, and so on — had been running recursively for eleven days. No circuit breaker caught it. No spend alert fired in time. The retry logic dutifully kept retrying each timeout, compounding the runaway cost at every step.

This is not a story about model quality. It is a story about distributed systems engineering, and specifically about the parts of it that most LLM application developers skip because they assume providers handle them.

They do not.
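As a rough illustration of what handling it yourself can look like, here is a minimal sketch of bounded retries combined with a hard spend cap. Everything in it is an assumption made for the example: `SpendGuard`, `BudgetExceeded`, and `call_with_retries` are hypothetical names, the per-attempt cost is a flat estimate supplied by the caller, and only `TimeoutError` is treated as retryable. A production version would also need per-agent loop detection and persistent accounting.

```python
import time


class BudgetExceeded(RuntimeError):
    """Raised when another attempt would push spend past the cap."""


class SpendGuard:
    """Hypothetical in-memory spend tracker that acts as a circuit breaker."""

    def __init__(self, max_spend_usd: float):
        self.max_spend_usd = max_spend_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        if self.spent_usd + cost_usd > self.max_spend_usd:
            raise BudgetExceeded(
                f"cap {self.max_spend_usd:.2f} USD would be exceeded "
                f"(spent {self.spent_usd:.2f} USD)"
            )
        self.spent_usd += cost_usd


def call_with_retries(call, guard, cost_per_attempt_usd,
                      max_attempts=3, base_delay_s=0.5, sleep=time.sleep):
    """Call `call()` with capped attempts, exponential backoff, and a spend cap."""
    last_error = None
    for attempt in range(max_attempts):
        # Every attempt is charged, so retries cannot spend without limit.
        guard.charge(cost_per_attempt_usd)
        try:
            return call()
        except TimeoutError as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(base_delay_s * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise last_error
```

The point of the design is that the budget check sits in front of every attempt, including retries, so a runaway loop stops when the cap is reached instead of when someone reads the invoice.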