The Latency Budget Your Agent Loop Stole from the Search Box
The launch metrics looked clean. Answer quality up, citation rate up, the eval suite green. The team that replaced the old keyword search with an agent-backed retriever shipped, took the win, and moved on. Six weeks later somebody noticed the weekly active number on that surface had drifted down twelve percent and nobody could find the regression. There was no regression. The agent worked. The users left because the box that used to answer in two hundred milliseconds now took four seconds, and nothing in the launch retro had a budget for that.
This is the latency-budget transfer problem, and almost nobody draws the org chart that catches it. A search box is not just a function call. It is a thirty-year contract with the user's nervous system: type, see results, scan, click. The 200-millisecond response is not a performance metric on a dashboard somewhere — it is the reason the user's attention is still on the screen when the results arrive. When the team underneath the box replaces a keyword index with an agent loop, the function-call surface looks identical and the SLA on the new call lives in a completely different regime. The latency budget moved from the team that owned the index to the team that owns the agent, and from the team that owns the agent to the user, and the only one who showed up to the meeting was the user.
