The On-Call Rotation Your Agent Platform Forgot to Staff
The AI platform team has four engineers. The internal agent they shipped seven months ago is now answering questions for 200 employees a day. For the first month the founding engineer answered every Slack ping personally — Tuesday at 11pm, Sunday morning, the night of the company offsite. Then she got promoted to staff engineer for the impact she had on adoption, and three weeks later she stopped checking the channel after 6pm because that is what staff engineers do. The on-call rotation that was supposed to replace her was never formalized, because the operating model was always going to be figured out "after the pilot."
The day the agent silently degrades for a quarter of users — a retrieval index that quietly fell behind, or a model version flip that shifted refusal behavior, or a tool whose schema rotated and is now returning empty arrays — the complaints do not land on the platform team's pager. They land in the help desk queue, staffed by people who do not have access to the agent's traces, do not know what a system prompt is, and have been told by IT that the agent is "owned by the AI team." Sixteen hours pass between the first user complaint and the first engineer who looks at a trace. Nobody on the platform team is asleep at the wheel; there is no wheel.
