Skip to main content

41 posts tagged with "tool-use"

View all tags

The Action Space Problem: Why Giving Your AI Agent More Tools Makes It Worse

· 9 min read
Tian Pan
Software Engineer

There's a counterintuitive failure mode that most teams encounter when scaling AI agents: the more capable you make the agent's toolset, the worse it performs. You add tools to handle more cases. Accuracy drops. You add better tools. It gets slower and starts picking the wrong ones. You add orchestration to manage the tool selection. Now you've rebuilt complexity on top of the original complexity, and the thing barely works.

The instinct to add is wrong. The performance gains in production agents come from removing things.

The Anatomy of an Agent Harness

· 8 min read
Tian Pan
Software Engineer

There's a 100-line Python agent that scores 74–76% on SWE-bench Verified — only 4–6 percentage points behind state-of-the-art systems built by well-funded teams. The execution loop itself isn't where the complexity lives. World-class teams invest six to twelve months building the infrastructure around that loop. That infrastructure has a name: the harness.

The formula is simple: Agent = Model + Harness. The model handles reasoning. The harness handles everything else — tool execution, context management, safety enforcement, error recovery, state persistence, and human-in-the-loop workflows. If you've been spending months optimizing prompts and model selection while shipping brittle agents, you've been optimizing the wrong thing.

Building AI Agents That Actually Work in Production

· 10 min read
Tian Pan
Software Engineer

Most teams building AI agents make the same mistake: they architect for sophistication before they have evidence that sophistication is needed. A production analysis of 47 agent deployments found that 68% would have achieved equivalent or better outcomes with a well-designed single-agent system. The multi-agent tax — higher latency, compounding failure modes, operational complexity — often eats the gains before they reach users.

This isn't an argument against agents. It's an argument for building them the same way you'd build any serious production system: start with the simplest thing that works, instrument everything, and add complexity only when the simpler version demonstrably fails.

Why Your AI Agent Wastes Most of Its Context Window on Tools

· 10 min read
Tian Pan
Software Engineer

You connect your agent to 50 MCP tools. It can query databases, call APIs, read files, send emails, browse the web. On paper, it has everything it needs. In practice, half your production incidents trace back to tool use—wrong parameters, blown context budgets, cascading retry loops that cost ten times what you expected.

Here's the part most tutorials skip: every tool definition you load is a token tax paid upfront, before the agent processes a single user message. With 50+ tools connected, definitions alone can consume 70,000–130,000 tokens per request. That's not a corner case—it's the default state of any agent connected to multiple MCP servers.

Tool Calling in Production: The Loop, the Pitfalls, and What Actually Works

· 9 min read
Tian Pan
Software Engineer

The first time your agent silently retries the same broken tool call three times before giving up, you realize that "just add tools" is not a production strategy. Tool calling unlocks genuine capabilities — external data, side effects, guaranteed-shape outputs — but the agentic loop that makes it work has sharp edges that don't show up in demos.

This post is about those edges: how the loop actually runs, the formatting rules that quietly destroy parallel execution, how to write tool descriptions that make the model choose correctly, and how to handle errors in a way that lets the model recover instead of spiral.