SLOs for Non-Deterministic Systems: Defining Reliability When Every Response Is Different
Your AI feature returns HTTP 200, completes in 180ms, and produces valid JSON. By every traditional SLI, the request succeeded. But the answer is wrong — a hallucinated product spec, a fabricated legal citation, a subtly incorrect calculation. Your monitoring is green. Your users are furious.
This is the fundamental disconnect that breaks SRE for AI systems. Traditional reliability engineering assumes that a successful execution produces a correct result. Non-deterministic systems can violate that assumption on any request: the same prompt, same context, and same model version can produce a different, and differently wrong, answer each time.
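To make the gap concrete, here is a minimal sketch in Python that computes two SLIs over the same set of requests: a traditional one (status, latency, valid JSON) and one that also requires the answer to be correct. Everything in it is an assumption for illustration: the `RequestRecord` type, the `judged_correct` field (which presumes some grader, evaluation set, or human review exists to label answers), and the 500 ms latency threshold are hypothetical and not taken from any particular monitoring stack.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Assumed latency objective for illustration only.
LATENCY_THRESHOLD_MS = 500.0


@dataclass
class RequestRecord:
    status_code: int
    latency_ms: float
    valid_json: bool
    # Hypothetical label: whether a grader or reviewer judged the answer correct.
    judged_correct: bool


def traditional_good(r: RequestRecord) -> bool:
    """Good event by traditional criteria: HTTP 200, fast enough, parseable."""
    return (
        r.status_code == 200
        and r.latency_ms <= LATENCY_THRESHOLD_MS
        and r.valid_json
    )


def correctness_good(r: RequestRecord) -> bool:
    """Good event only if the traditional criteria hold AND the answer is correct."""
    return traditional_good(r) and r.judged_correct


def sli(records: Sequence[RequestRecord],
        is_good: Callable[[RequestRecord], bool]) -> float:
    """Fraction of requests that count as good events under the given predicate."""
    if not records:
        raise ValueError("cannot compute an SLI over zero requests")
    return sum(1 for r in records if is_good(r)) / len(records)


# Four illustrative requests; the second matches the scenario above:
# HTTP 200, 180 ms, valid JSON, but a wrong answer.
sample = [
    RequestRecord(200, 180.0, True, True),
    RequestRecord(200, 180.0, True, False),
    RequestRecord(200, 210.0, True, True),
    RequestRecord(500, 950.0, False, False),
]

traditional_sli = sli(sample, traditional_good)   # 0.75
correctness_sli = sli(sample, correctness_good)   # 0.5
```

On this made-up sample, the traditional SLI reports 75% while the correctness-aware SLI reports 50%. The difference between the two is exactly the set of requests that monitoring would show as green while the user received a wrong answer.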
