The Deterministic Seed Your Provider Treated as a Hint, Not a Contract
The CI test was a single assertion: same model, same temperature, same prompt, same seed, same output string. It passed on every developer's laptop, passed on the first hundred CI runs, and then flaked once every fifty runs for three weeks before anyone admitted the pattern was real. The first hypothesis was the obvious one — a non-deterministic dependency somewhere in the test harness — and three days of investigation found nothing. The actual cause was sitting in a footnote on the provider's API reference: "seed provides best-effort determinism." The team had read the parameter name and assumed a contract. The provider had documented a hint.
This is a specific failure mode of hosted inference that catches teams who design test infrastructure around a single mental model: the model is a pure function of its inputs, and the seed is what makes the function reproducible. Both halves of that model are wrong in production, and the gap between the API surface and the underlying physics is wide enough that teams build entire eval and regression-test stacks on top of an assumption their provider explicitly disclaimed.
