The Streaming Response That Returns 200 Then Fails: How Mid-Stream Errors Break Your SLOs
Your availability dashboard says 99.95%. Your users say the answer stopped mid-sentence. Both are correct, and that is the problem.
The HTTP-era reliability stack was built on a single assumption: the status code arrives at the end of a request and summarizes its fate. A 200 means success. A 5xx means retry. The load balancer counts the ratio, the SLO dashboard aggregates it, the alerting fires on the burn rate. Every layer of that stack reads the header and trusts it.
Streaming inverts the assumption. The moment your server flushes the first token, it has already committed to a 200. Everything that goes wrong after that — a provider timeout at token 400, a content filter trip mid-paragraph, a dropped TCP connection, a malformed tool-call fragment — happens after the verdict has been rendered and cannot be retracted. The request failed. The status code says it succeeded. And nothing in your reliability tooling is built to notice the difference.
