
3 posts tagged with "sla"


Your Provider's 99.9% SLA Is Measured at the Wrong Boundary for Your Agent

11 min read
Tian Pan
Software Engineer

A model provider publishes a 99.9% availability SLA. The procurement team frames it as "three nines, roughly nine hours of downtime per year, acceptable for a non-tier-zero workload." Six months later the agent feature ships and the on-call dashboard shows a user-perceived task-success rate around 98% — a number nobody wrote into a contract, nobody can find on the provider's status page, and nobody owns. The provider is meeting their SLA. The product is missing its SLO. Both are true at the same time, and the gap is not a bug — it is arithmetic.

The arithmetic is the part most teams skip. A provider's 99.9% is measured against a synchronous-request workload — one user, one prompt, one response, one billing event. An agent does not generate that workload. A single user-perceived task fans out into 8 to 20 inference calls, retries on transient errors, hedges on slow ones, and aggregates partial outputs. Each of those calls is an independent draw against the provider's failure distribution, and the task fails if any essential call fails. The boundary the SLA covers and the boundary the user feels are not the same boundary.
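A back-of-the-envelope sketch of that compounding (the per-call availability and the fan-out sizes below are assumptions taken from the figures above, not measured numbers):

```python
# Back-of-the-envelope compounding: each call is an independent draw against the
# provider's failure distribution, and the task fails if any essential call fails.

def task_success_rate(per_call_availability: float, calls_per_task: int) -> float:
    """Probability that every call in one user-perceived task succeeds."""
    return per_call_availability ** calls_per_task

per_call = 0.999  # the provider's three-nines SLA, measured per request

for fan_out in (1, 8, 15, 20):
    print(f"{fan_out:>2} calls/task -> {task_success_rate(per_call, fan_out):.3%} task success")

# Prints:
#  1 calls/task -> 99.900% task success
#  8 calls/task -> 99.203% task success
# 15 calls/task -> 98.510% task success
# 20 calls/task -> 98.019% task success
```

At 15 to 20 calls per task, three nines per call lands right at the roughly 98% the on-call dashboard shows.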

What 99.9% Uptime Means When Your Model Is Occasionally Wrong

10 min read
Tian Pan
Software Engineer

A telecom company ships an AI support chatbot with 99.99% availability and sub-200ms response times — every traditional SLA metric is green. It is also wrong on 35% of billing inquiries. No contract clause covers that. No alert fires. The customer just churns.

This is the watermelon effect for AI: systems that look healthy on the outside while quietly rotting inside. Traditional reliability SLAs — uptime, error rate, latency — were built for deterministic systems. They measure whether your service answered, not whether the answer was any good. Shipping an AI feature under a traditional SLA is like guaranteeing that every email your support team sends will be delivered, without any commitment that the replies make sense.
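A minimal sketch of that gap, assuming a request log with two hypothetical fields: one recording whether the service answered at all, one recording whether a later grading step judged the answer correct:

```python
from dataclasses import dataclass

@dataclass
class Request:
    answered: bool        # the service returned a response within the latency budget
    answer_correct: bool  # a grading step judged the content of the reply correct

def availability_sli(log: list[Request]) -> float:
    """Fraction of requests that got any answer at all -- the traditional SLA view."""
    return sum(r.answered for r in log) / len(log)

def correctness_sli(log: list[Request]) -> float:
    """Fraction of answered requests whose answer was actually right."""
    answered = [r for r in log if r.answered]
    return sum(r.answer_correct for r in answered) / len(answered)

# Every request is answered, but 35% of the answers are wrong, as in the billing example.
log = [Request(answered=True, answer_correct=(i % 100 >= 35)) for i in range(1000)]

print(f"availability SLI: {availability_sli(log):.2%}")  # 100.00% -- every dashboard is green
print(f"correctness SLI:  {correctness_sli(log):.2%}")   # 65.00% -- and no alert fires on this
```

Populating the second field is the hard part: it needs human review, an eval harness, or a model-graded check, which is exactly the machinery a traditional SLA never asks for.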

The Provider Reliability Trap: Your LLM Vendor's SLA Is Now Your Users' SLA

9 min read
Tian Pan
Software Engineer

In a formal incident report, Zendesk stated that from June 10 through June 11, 2025, customers lost access to all Zendesk AI features for more than 33 consecutive hours. The engineering team's remediation steps were empty — there was nothing to do. The outage was caused entirely by their upstream LLM provider going down, and Zendesk had no architectural path to restore service without it.

This is the provider reliability trap in its clearest form: you ship a feature, make it part of your users' workflows, promise availability through implicit or explicit SLA commitments, and then discover that your entire reliability posture is bounded by a dependency you don't control, can't fix, and may not have formally evaluated before launch.
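One way to see the bound: with a hard serial dependency and no fallback path, the feature is up only when every dependency is up, so its availability cannot exceed the product of the individual availabilities. A rough sketch with purely illustrative numbers:

```python
def composite_availability(dependencies: dict[str, float]) -> float:
    """Availability of a feature that needs every listed dependency up at once."""
    result = 1.0
    for availability in dependencies.values():
        result *= availability
    return result

# Purely illustrative figures, not any vendor's real numbers.
feature_dependencies = {
    "our_api_layer": 0.9995,
    "llm_provider":  0.999,   # the provider's published SLA, measured at their boundary
    "vector_store":  0.9995,
}

a = composite_availability(feature_dependencies)
hours_down_per_year = (1 - a) * 365.25 * 24

print(f"composite availability: {a:.4%}")                           # ~99.80%
print(f"expected downtime/year: {hours_down_per_year:.1f} hours")   # ~17.5 hours

# When the llm_provider term goes to zero, the whole product goes to zero --
# which is the 33-hour outage above.
```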