Skip to main content

3 posts tagged with "tooling"

View all tags

Silent Tool Truncation: The Default Cap Your Agent Reasons Over Without Knowing

· 11 min read
Tian Pan
Software Engineer

A tool call returns a 142 KB JSON blob. Your agent framework drops everything past byte 8,192, hands the prefix to the model, and the model writes a confident answer based on a fragment it never knew was a fragment. Three weeks later a customer escalates. You scroll the trace, see "tool returned successfully," and the post-mortem turns into a hunt for which step "ignored" the evidence — except no step ignored it. The evidence was clipped before it ever reached the reasoner.

This isn't a hypothetical. Codex hardcodes tool output truncation at 10 KiB or 256 lines. Claude Code defaults to 25,000 tokens for tool results, with a separate display-layer cap that briefly clipped MCP responses at around 700 characters in 2025. OpenAI's tool-output submission caps at 512 KB. Each framework picked a number that seemed safe, and for short tool calls it is. The failure mode arrives when a single step's output crosses the line — quietly, without an exception, without a flag the model can see.

Tool Reentrancy Is the Bug Class Your Function-Calling Layer Doesn't Know Exists

· 11 min read
Tian Pan
Software Engineer

The agent took four hundred milliseconds to answer a simple question, then crashed with a recursion-limit error. The trace showed twenty-five tool calls. Reading the trace top-to-bottom, an engineer would conclude the agent was confused — calling the same handful of tools in slightly different orders, never converging. That conclusion is wrong. The agent wasn't confused. It was stuck in a cycle: tool A invoked the model, the model picked tool B, tool B's implementation invoked the model again to format its output, and the formatter chose tool A. The trace UI rendered four nested calls as four sibling calls in a flat list, and the cycle was invisible to the only human who could have caught it.

This is tool reentrancy, and it's a bug class your function-calling layer almost certainly doesn't model. Concurrency-safe code has decades of primitives for it: reentrant mutexes that count nested acquisitions by the same thread, recursion limits at the language level, stack inspection APIs, and a cultural understanding that any function which calls back into the runtime needs a clear contract about what re-entry is allowed. Tool-calling layers default to fire-and-forget. There is no call stack the runtime can inspect, no cycle detector before dispatch, no reentrancy attribute on the tool definition, and the trace UI is shaped like a log, not a graph. The result is that every tool catalog past about a dozen entries silently becomes a recursion the framework can't see.

The Dual-Writer Race: When Your Agent and Your User Edit the Same Calendar Event

· 12 min read
Tian Pan
Software Engineer

The agent confidently reports "I rescheduled the meeting to Thursday at 3pm." The user is staring at the original Tuesday 10am slot, because between the agent's plan and its commit they edited the event themselves. Last-write-wins overwrote a human change with an automated one, and the user's trust in the assistant collapses on a single incident. This is the dual-writer race, and it is the bug class that agent toolchains were never designed for.

Most agent platforms inherit this category by accident. The tool layer treats update_event as a single function call: take an ID, take new fields, return success. The provider API underneath has offered optimistic concurrency primitives — ETags, version tokens, If-Match preconditions — for a decade, and almost nobody plumbs them through. The model has no way to know that the world it reasoned about a minute ago is no longer current, because the abstraction it was given silently throws that information away.