Skip to main content

5 posts tagged with "function-calling"

View all tags

Your Tool Descriptions Are Prompts, Not API Docs

· 10 min read
Tian Pan
Software Engineer

The tool description is not documentation. It is the prompt the model reads, every single turn, to decide whether this tool fires and how. You are not writing for the developer integrating against the tool — the developer already has the schema, the types, the examples in the PR. You are writing for a stochastic reader that has never seen this codebase, is holding twenty other tool descriptions in the same context window, and has to pick one in the next forward pass.

Most teams don't. They paste the OpenAPI summary into the description field, stick the JSON Schema under it, and ship. Then the agent undercalls the tool, confidently calls the wrong adjacent tool, or fires the right tool with parameters that were "obviously" wrong to any human reading the schema. The team blames the model. The model was reading exactly what you wrote.

Tool Manifest Lies: When Your Agent Trusts a Schema Your Backend No Longer Honors

· 10 min read
Tian Pan
Software Engineer

The most dangerous bug in a production agent isn't the one that throws. It's the one where a tool description says returns user_id and the backend quietly started returning account_id two sprints ago, and the model is still happily inventing user_id in downstream reasoning — because the manifest said so, and the few-shot history reinforced it, and nothing in the loop ever fetched ground truth.

This is manifest drift: the slow, silent divergence between what your tool descriptions claim and what your endpoints actually do. It rarely produces stack traces. It produces bad decisions with clean audit trails — the worst class of bug in agent systems.

Tool Docstring Archaeology: The Description Field Is Your Highest-Leverage Prompt

· 11 min read
Tian Pan
Software Engineer

The highest-leverage prompt in your agent is not in your system prompt. It is the one-sentence description you wrote under a tool definition six months ago, committed alongside the implementation, and never touched again. The model reads it on every turn to decide whether to invoke the tool, which arguments to bind, and how to recover when the response doesn't match expectations. Engineers treat it as API documentation for humans. The model treats it as a prompt.

The gap between those two framings is where the worst kind of tool-use bugs live: the model invokes the right function name with the right arguments, and the right API call goes out — but for the wrong reasons, in the wrong situation, or in preference over a better tool sitting next to it. No exception fires. Your eval suite still passes. The regression only shows up as a slow degradation in whatever metric you use to measure whether the agent is actually helping.

Writing Tools for Agents: The ACI Is as Important as the API

· 9 min read
Tian Pan
Software Engineer

Most engineers approach agent tools the same way they approach writing a REST endpoint or a library function: expose the capability cleanly, document the parameters, handle errors. That's the right instinct for humans. For AI agents, it's exactly wrong.

A tool used by an agent is consumed non-deterministically, parsed token by token, and selected by a model that has no persistent memory of which tool it used last Tuesday. The tool schema you write is not documentation — it is a runtime prompt, injected into the model's context at inference time, shaping every decision the agent makes. Every field name, every description, every return value shape is a design decision with measurable performance consequences. This is the agent-computer interface (ACI), and it deserves the same engineering investment you'd put into any critical user-facing interface.

Tool Use in Production: Function Calling Patterns That Actually Work

· 9 min read
Tian Pan
Software Engineer

The most surprising thing about LLM function calling failures in production is where they come from. Not hallucinated reasoning. Not the model picking the wrong tool. The number one cause of agent flakiness is argument construction: wrong types, missing required fields, malformed JSON, hallucinated extra fields. The model is fine. Your schema is the problem.

This is good news, because schemas are cheap to fix.