The Over-Tooled Agent Problem: Why More Tools Make Your LLM Dumber
When a team at Writer instrumented their RAG-MCP benchmark, they found that baseline tool selection accuracy — with no special handling — was 13.62% when the agent had access to a large set of tools. Not 80%. Not 60%. Thirteen percent. The same agent, with retrieval-augmented tool selection exposing only the most relevant subset, reached 43%. The tools didn't change. The model didn't change. Only the number of tool definitions visible at reasoning time changed.
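The retrieval step can be sketched in a few lines. This is a toy illustration, not the benchmark's actual implementation: instead of dense embeddings, it ranks tool descriptions against the query with a stdlib-only bag-of-words cosine similarity, and the `select_tools` function and sample tool names are hypothetical. The point it demonstrates is the shape of the fix: the model only ever sees the top-k matching tool definitions, not the full catalog.

```python
# Toy retrieval-augmented tool selection: rank every tool description
# against the user query, then expose only the top-k to the model.
# Real systems would use dense embeddings; this sketch uses a
# bag-of-words cosine similarity so it runs with the stdlib alone.
import math
from collections import Counter

def _vec(text: str) -> Counter:
    # Crude tokenizer: lowercase, split on whitespace.
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_tools(query: str, tools: dict[str, str], k: int = 3) -> list[str]:
    """Return the names of the k tools whose descriptions best match the query."""
    q = _vec(query)
    ranked = sorted(tools, key=lambda name: _cosine(q, _vec(tools[name])), reverse=True)
    return ranked[:k]

# Hypothetical tool catalog for illustration.
tools = {
    "get_weather": "fetch current weather forecast for a city",
    "send_email": "send an email message to a recipient",
    "search_flights": "search airline flights between two airports",
    "create_invoice": "create a billing invoice for a customer",
}
print(select_tools("what's the weather forecast in Paris", tools, k=2))
```

Only the two selected tool definitions would then be serialized into the prompt; the other tools still exist, the agent just doesn't pay a reasoning cost for them on this turn.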
This is the over-tooled agent problem, and it's quietly wrecking production AI systems at scale.
