Deep Research Agents: Why Most Implementations Loop Forever or Stop Too Early
Standard LLMs without iterative retrieval score below 10% on multi-step web research benchmarks. Deep research agents, which search, read, synthesize, and re-query in a loop, score above 50%. That jump of at least fivefold explains why every serious AI product team is building one. What it doesn't explain is why most of those implementations either run up a $15 bill chasing irrelevant tangents or declare victory after two shallow searches.
The core problem isn't building the loop. It's knowing when the loop should stop. That turns out to be a surprisingly deep systems design challenge, one that touches convergence detection, cost economics, source reliability, and multi-agent coordination.
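
To make the two failure modes concrete, here is a minimal sketch of a research loop with explicit stop conditions. Everything in it is an illustrative assumption rather than a reference design: the names `run_research_loop` and `LoopState`, the flat per-round cost, and the use of "no new facts for `patience` consecutive rounds" as a stand-in for convergence detection. The `search_round` callable is a hypothetical placeholder for whatever search-read-synthesize step a real agent performs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class LoopState:
    facts: Set[str] = field(default_factory=set)  # distinct findings so far
    spent_usd: float = 0.0                        # running (assumed flat) cost
    rounds: int = 0                               # completed iterations
    stop_reason: str = ""                         # why the loop ended


def run_research_loop(
    search_round: Callable[[int], List[str]],  # hypothetical: one search/read/synthesize pass
    cost_per_round_usd: float = 0.50,          # assumption: every round costs the same
    budget_usd: float = 5.00,                  # hard spending cap
    min_new_facts: int = 1,                    # below this, a round counts as stale
    patience: int = 2,                         # stale rounds tolerated before stopping
    max_rounds: int = 20,                      # backstop against looping forever
) -> LoopState:
    state = LoopState()
    stale_rounds = 0
    while True:
        # Guard 1: never loop forever.
        if state.rounds >= max_rounds:
            state.stop_reason = "max_rounds"
            break
        # Guard 2: never start a round the budget cannot cover.
        if state.spent_usd + cost_per_round_usd > budget_usd:
            state.stop_reason = "budget"
            break

        found = set(search_round(state.rounds))
        new_facts = found - state.facts
        state.facts |= new_facts
        state.spent_usd += cost_per_round_usd
        state.rounds += 1

        # Guard 3: stop once several rounds in a row add nothing new.
        stale_rounds = stale_rounds + 1 if len(new_facts) < min_new_facts else 0
        if stale_rounds >= patience:
            state.stop_reason = "converged"
            break
    return state


if __name__ == "__main__":
    # Toy search that runs out of new material after three rounds.
    result = run_research_loop(lambda i: [f"fact-{min(i, 2)}"])
    print(result.stop_reason, result.rounds, result.spent_usd)
```

Setting `patience` too low reproduces the "declare victory after two shallow searches" failure, while dropping the budget and round caps reproduces the runaway bill. The sketch ignores source reliability and multi-agent coordination entirely.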
