The Provider Reliability Trap: Your LLM Vendor's SLA Is Now Your Users' SLA

· 9 min read
Tian Pan
Software Engineer

In a formal incident report, Zendesk stated that from June 10 through June 11, 2025, customers lost access to all Zendesk AI features for more than 33 consecutive hours. The engineering team's remediation section was empty — there was nothing to do. The outage was caused entirely by their upstream LLM provider going down, and Zendesk had no architectural path to restore service without it.

This is the provider reliability trap in its clearest form: you ship a feature, make it part of your users' workflows, promise availability through implicit or explicit SLA commitments, and then discover that your entire reliability posture is bounded by a dependency you don't control, can't fix, and may not have formally evaluated before launch.

The trap is structural. You can't engineer your way out of it at incident time. The decisions that matter happen months before any specific outage, during architecture and product design. Most teams make those decisions badly — not because they're careless, but because the math is invisible until it bites.

The SLA Arithmetic Nobody Does Before Launch

The fundamental property of serial dependencies is that their availabilities multiply. If your own infrastructure runs at 99.9% and your LLM provider delivers 99.5%, your theoretical maximum availability is 0.999 × 0.995 = 99.4%. You cannot offer 99.9% to your customers if a serial dependency in your critical path only guarantees 99.5%.

This math gets worse as you add dependencies:

  • 3 services each at 99.9%: 0.999³ ≈ 99.7%
  • 5 services each at 99.9%: 0.999⁵ ≈ 99.5%
  • Your infra (99.9%) + LLM at 99.5% + vector DB at 99.9%: ≈ 99.3%

What do those numbers feel like in practice? A 99.5% SLA allows 43.8 hours of downtime per year — roughly 3.6 hours per month. A 99.9% SLA allows 8.76 hours per year. If you're offering 99.9% to enterprise customers and your LLM provider is capped at 99.5%, you have a 5x gap before a single line of application code fails.
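The serial-dependency math above is easy to script into a launch checklist. A minimal sketch (the figures are the ones quoted in this article; `composite_availability` and `downtime_hours_per_year` are illustrative helper names, not from any particular tool):

```python
# Serial dependencies multiply: the chain is only as available as the
# product of its parts. Downtime budget is the complement of that.

HOURS_PER_YEAR = 24 * 365  # 8760

def composite_availability(*availabilities: float) -> float:
    """Theoretical maximum availability of a serial dependency chain."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

def downtime_hours_per_year(availability: float) -> float:
    """How many hours of downtime per year a given SLA permits."""
    return (1.0 - availability) * HOURS_PER_YEAR

# Your infra (99.9%) in series with an LLM provider at 99.5%:
chain = composite_availability(0.999, 0.995)   # ≈ 0.9940 — below a 99.9% promise
budget = downtime_hours_per_year(0.995)        # ≈ 43.8 hours/year
```

Running the same helpers over the other figures reproduces the bullet list: three serial 99.9% services land at roughly 99.7%, five at roughly 99.5%.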

The problem is compounded by the current state of provider SLAs. Anthropic's standard tier has no published SLA — it's best-effort availability. OpenAI's direct API for standard customers also offers no contractual guarantee. Google's Vertex AI publishes a 99.5% target. Azure OpenAI offers 99.9%, which is why enterprises route through it specifically. Even the best publicly available LLM SLA maxes out at 99.9%, and most standard-tier users have nothing in writing.

Why Incidents Cluster at the Worst Possible Time

A common mental model treats provider outages as independent events uniformly distributed across time. This model is wrong, and acting on it underestimates your real risk.

LLM inference is GPU-constrained. Unlike CPU-based services that can spin up thousands of additional instances in minutes, GPU provisioning has lead times measured in months. When a new model launches or a viral use case drives sudden demand, providers can't elastically scale to absorb it. The result is that capacity pressure and infrastructure stress co-occur with the usage spikes you care most about.

The OpenAI December 2024 outage was triggered by a new telemetry service that overwhelmed the Kubernetes API servers — an infrastructure change that accompanied rapid platform evolution. The March 2025 image generation launch coincided with infrastructure strain and subsequent outages. Multiple documented incidents show the same pattern: new capability launches drive usage surges, infrastructure changes are deployed to support growth, and outages follow.

Academic analysis of LLM service telemetry from 2023–2024 found that ChatGPT was fully accessible on only 88.85% of days during that period. The key implication: if you're running a B2B product where customers use your AI features during business hours, and your provider's outages correlate with high-traffic periods, your user-visible reliability is lower than the headline availability numbers suggest.

The Product Decisions That Must Happen Before an Outage

The most consequential reliability work isn't incident response — it's the categorization exercise you should do before launch. Every AI feature in your product falls into one of three categories:

Features that can serve from cache. If the LLM output is deterministic or slowly varying given similar inputs — product descriptions, FAQ responses, content summaries, recommendation explanations — you can cache responses and serve them during provider outages. One documented production case improved effective uptime from 99.2% to 99.87% by adding circuit breakers with cached response fallback. This category often represents more of your feature surface than you expect.

Features that can degrade to a lighter model or rule-based fallback. Simple intent classification, basic routing, keyword matching — for many tasks, a cheaper or locally-hosted model provides enough functionality to maintain the feature at reduced capability. A customer support bot that can't access Claude might still route tickets to the right queue using a rule-based classifier. Users experience degraded quality, not outage.

Features that hard-fail. Novel document analysis, multi-step agentic workflows, real-time code generation — these have no reasonable pre-computable fallback. When your provider goes down, these features go down. The critical question isn't how to prevent this but whether you've been honest with users about the dependency and designed the failure message accordingly.

Skipping this categorization means every feature hard-fails by default. Doing it upfront means you can preserve a meaningful fraction of your product's value during incidents.
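One way to make the categorization executable is to encode it as a per-feature failure mode that your request handler consults during an outage. This is a hypothetical sketch — the feature names, `FailureMode` enum, and the `"routed_by_rules"` stand-in for a rule-based classifier are all illustrative, not from any specific codebase:

```python
from enum import Enum

class FailureMode(Enum):
    SERVE_FROM_CACHE = "cache"    # deterministic / slowly varying output
    DEGRADE_TO_RULES = "degrade"  # lighter model or rule-based fallback
    HARD_FAIL = "hard_fail"       # no reasonable pre-computable fallback

# The categorization exercise, written down where code can see it.
FEATURE_MODES = {
    "product_descriptions": FailureMode.SERVE_FROM_CACHE,
    "ticket_routing": FailureMode.DEGRADE_TO_RULES,
    "document_analysis": FailureMode.HARD_FAIL,
}

def handle_provider_outage(feature: str, cache: dict) -> str:
    """What the user sees when the LLM provider is down."""
    # Uncategorized features hard-fail by default — exactly the trap.
    mode = FEATURE_MODES.get(feature, FailureMode.HARD_FAIL)
    if mode is FailureMode.SERVE_FROM_CACHE and feature in cache:
        return cache[feature]
    if mode is FailureMode.DEGRADE_TO_RULES:
        return "routed_by_rules"  # stand-in for a rule-based classifier
    return "AI features are temporarily unavailable"
```

The default branch makes the article's point concrete: any feature you never classified behaves as hard-fail.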

The Multi-Provider Architecture

For features in the "must stay up" category, the engineering answer is multi-provider fallback. The pattern is straightforward in principle:

Sequential failover triggers on 429 (rate limit), 503 (service unavailable), and 5xx errors. Critically, it should not trigger on 400-level user errors — those indicate bad requests that will fail at any provider, and cascading them wastes credits and latency. The router detects failure from provider A and re-routes to provider B within a single request lifecycle.
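A minimal sketch of that status-code discipline, assuming each provider is a callable that either returns text or raises an error carrying an HTTP status (`ProviderError` and the provider callables are illustrative):

```python
class ProviderError(Exception):
    """Illustrative error type carrying the provider's HTTP status."""
    def __init__(self, status: int):
        super().__init__(f"provider returned {status}")
        self.status = status

def retriable(status: int) -> bool:
    # 429 (rate limit) and 5xx cascade to the next provider;
    # 4xx user errors would fail everywhere, so they fail fast.
    return status == 429 or 500 <= status <= 599

def call_with_failover(prompt, providers):
    """Try providers in order within a single request lifecycle."""
    last_error = None
    for provider in providers:
        try:
            return provider(prompt)
        except ProviderError as e:
            if not retriable(e.status):
                raise           # bad request: don't waste credits cascading
            last_error = e      # 429/5xx: re-route to the next provider
    raise last_error            # every provider failed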

Hedged requests fire a request at provider B if provider A hasn't responded within a threshold latency. Return whichever responds first. This handles degraded (slow) providers, not just failed ones.
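In Python, the hedging pattern can be sketched with `concurrent.futures` — fire the primary, and if it hasn't answered within the threshold, race it against the secondary (the providers here are stand-in callables, and the 300 ms default threshold is an arbitrary illustration):

```python
import concurrent.futures

def hedged_call(prompt, primary, secondary, hedge_after_s=0.3):
    """Return the primary's answer if it's fast; otherwise race both."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(primary, prompt)
        try:
            # Fast path: primary responds within the hedge threshold.
            return first.result(timeout=hedge_after_s)
        except concurrent.futures.TimeoutError:
            # Primary is slow (degraded, not necessarily failed):
            # hedge with the secondary and take whichever finishes first.
            second = pool.submit(secondary, prompt)
            done, _ = concurrent.futures.wait(
                {first, second},
                return_when=concurrent.futures.FIRST_COMPLETED,
            )
            return next(iter(done)).result()
```

Note the cost trade-off: a hedged request sometimes pays for two inference calls to buy one fast response.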

Circuit breakers track rolling error rates across a time window. When the rate crosses a threshold — say, 20% errors in 60 seconds — the circuit opens and all traffic routes to the secondary provider for a cooldown period. This prevents thundering-herd retries against a failing provider.
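A rolling-window circuit breaker can be sketched in a few dozen lines. This version takes the clock as an argument for testability; the 60-second window, 20% threshold, and cooldown mirror the numbers above, while `min_samples` is an added assumption to avoid tripping on a single early error:

```python
from collections import deque

class CircuitBreaker:
    def __init__(self, window_s=60.0, error_threshold=0.2,
                 min_samples=10, cooldown_s=30.0):
        self.window_s = window_s
        self.error_threshold = error_threshold
        self.min_samples = min_samples
        self.cooldown_s = cooldown_s
        self.events = deque()    # (timestamp, was_error) pairs
        self.open_until = None   # while open, route traffic to secondary

    def record(self, now: float, was_error: bool) -> None:
        self.events.append((now, was_error))
        # Evict events that fell out of the rolling window.
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()
        errors = sum(1 for _, e in self.events if e)
        if (len(self.events) >= self.min_samples
                and errors / len(self.events) >= self.error_threshold):
            self.open_until = now + self.cooldown_s  # trip the breaker

    def is_open(self, now: float) -> bool:
        return self.open_until is not None and now < self.open_until
```

While `is_open` returns true, the router sends everything to the secondary instead of hammering the failing provider — which is what prevents the thundering herd.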

Load splitting runs traffic across providers continuously at something like 70/30 rather than treating provider B purely as a failover target. This keeps your secondary path warm, validates its behavior with real traffic, and reduces the blast radius when provider A fails, since 30% of users are already being served from B.
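The 70/30 split itself can be as simple as hashing a stable request key into a bucket, so a given user consistently lands on the same provider (a sketch under that stickiness assumption; the provider names are placeholders):

```python
import hashlib

def pick_provider(user_id: str, primary_weight: float = 0.7) -> str:
    """Deterministically map a user to a provider at ~70/30 weights."""
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "provider_a" if bucket < primary_weight else "provider_b"
```

Because the split is keyed rather than random per request, a user's conversation doesn't bounce between providers mid-session, and when provider A fails you already know the ~30% cohort being served from B.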

LiteLLM's Router class implements these patterns in open source. Portkey and similar AI gateways offer managed versions with governance and observability layers. Building it yourself is straightforward but requires more operational discipline around failure testing.

One production configuration reported by an open source developer building AI-powered tools: Anthropic Claude as primary → OpenAI as secondary → OpenRouter as a catchall. The pattern is becoming the enterprise default: 2025 enterprise spending data shows multi-provider architectures spanning Anthropic (40% of enterprise AI spend) and OpenAI (27%) as the dominant arrangement — a revealed preference for not putting all workloads on one vendor.

Prompt Compatibility Is the Hidden Tax

Provider fallback sounds clean until you encounter the practical problem: prompt syntax, tool calling schemas, tokenizer behavior, and model response patterns differ meaningfully across providers. A prompt tuned for Claude 3.7 Sonnet may produce different outputs on GPT-4o — different in ways that may or may not matter for your use case, but that you won't know until you test.

The operational discipline required: treat your fallback provider as a first-class production path, not a cold spare. Run the same eval suite against it. Monitor its outputs. When you tune a prompt against your primary provider, check whether the changes hold on your fallback. This isn't free work, and it's the reason many teams add a fallback provider to their config but never verify it actually works.

The minimum viable test: run your 50 most common request types against provider B under synthetic load once per week. Surface behavioral divergence in your eval pipeline. It won't catch everything, but it will catch the worst regressions before users encounter them mid-incident.
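That weekly check reduces to a small harness: run each case through both providers and flag the ones where your per-case check passes on the primary but fails on the fallback. A minimal sketch, with illustrative stand-in providers and checks:

```python
def divergences(cases, primary, fallback):
    """cases: list of (prompt, check) where check(output) -> bool.

    Returns the prompts that regress only on the fallback path —
    the behavioral divergence worth surfacing in an eval pipeline.
    """
    flagged = []
    for prompt, check in cases:
        ok_primary = check(primary(prompt))
        ok_fallback = check(fallback(prompt))
        if ok_primary and not ok_fallback:
            flagged.append(prompt)
    return flagged
```

In practice `primary` and `fallback` would be real API calls against your 50 most common request types, and the checks would come from your existing eval suite; the scheduling (weekly, under synthetic load) lives outside this function.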

The Dependency Risk Calculation for Your Launch Checklist

The question isn't whether your team has built graceful degradation for every LLM feature. That's often the wrong level of investment for a product in its first year. The question is whether you've been explicit about the calculation.

For each AI feature before launch:

  1. Classify it. Cache-eligible, lighter-model degradable, or hard-fail. This shapes the honest SLA you can offer.
  2. Do the serial dependency math. Multiply your own infra availability by your provider's published SLA. If you're on a standard tier with no published SLA, treat it as 99.5% for conservative planning.
  3. Compare to your commitment. If you're offering customers 99.9% and the math gives you 99.4%, you need either fallback architecture or a conversation with your enterprise customers about what "uptime" includes.
  4. Build the failure mode. For hard-fail features, write the failure message now. "AI features are temporarily unavailable" — with a fallback to static help content or human routing — is better than a broken UI with no explanation.

The teams that get hurt by provider outages are usually not the ones who couldn't build fallback — they're the ones who never did the math that would have told them fallback was necessary. The provider reliability trap is a trap precisely because the gap between what you promise users and what your dependencies can deliver is invisible until an incident makes it concrete.

The SLA You're Actually Offering

When your LLM provider goes down, it doesn't matter what's in your service agreement with your customers. What matters is whether they can do their jobs. The Zendesk engineers who watched 33 hours of AI feature downtime roll by had no contractual obligation that could restore service — the math had been wrong before any incident occurred.

The correct mental model: your product's reliability is the minimum of your own infra and every serial dependency in your critical path. Every LLM provider you add without fallback architecture is a dependency that sets your ceiling. The ceiling is lower than most teams think, and it drops further during the product launches and high-traffic moments when reliability matters most.

Design for the gap before the gap finds you.
