Skip to main content

5 posts tagged with "agent-engineering"

View all tags

The Tool Description That Drifted Out of Sync With the Tool It Described

· 12 min read
Tian Pan
Software Engineer

A backend engineer renames a parameter from user_id to account_id because the two stopped being the same thing six months ago, and a support ticket finally made the ambiguity intolerable. The JSON schema for the tool gets updated in the pull request that ships the rename. The tool's prose description — the one paragraph the model actually reads to decide whether to call the tool and how — lives in a different repository, owned by a different team, updated through a ticket queue, and still reads "pass the user_id to look up the account." Nobody flags it. The model dutifully calls the tool with the right schema, fills the right field, and gets the right answer on every single happy-path query. The bug is invisible until the day a user types something where their authenticated user_id and the account_id they were asking about are two different entities, and the agent confidently returns somebody else's data.

The Dependency Bomb in Your Tool Catalog: When Adding One Tool Breaks Five Agents

· 8 min read
Tian Pan
Software Engineer

A team I know shipped a new lookup_customer_v2 tool to their support agent's catalog on a Tuesday. The tool was scoped narrowly, well-tested in isolation, and approved by review. By Thursday, an unrelated workflow — refund processing — was failing on roughly four percent of cases that used to succeed. The refund tool hadn't changed. The refund prompt hadn't changed. The model hadn't changed. What changed was that the planner was now picking lookup_customer_v2 for refund-eligibility queries that had previously routed cleanly to get_account_status, because the new tool's description happened to contain the word "eligibility" and ranked higher under whatever similarity heuristic the model uses internally.

This is the dependency bomb. Teams treat the tool registry as additive — "we're just adding one thing, what could go wrong" — but the planner doesn't see your registry as a list of independent capabilities. It sees a probability distribution over choices, and every entry redistributes the mass. Adding a tool can quietly subtract behavior somewhere else, and your eval suite will probably miss it because nobody wrote a regression test that says "the agent should still pick the old tool for this case."

Your Database Schema Is Your Agent's Mental Model

· 9 min read
Tian Pan
Software Engineer

Most teams building agents treat their database schema as a backend concern. The schema was designed by engineers, for engineers, following decades of relational database best practices: normalize aggressively, avoid redundancy, split reference tables, enforce foreign keys. This approach is correct for OLTP systems. It is often wrong for AI agents.

When an agent reads your schema to figure out how to answer a question, it is not parsing a data structure. It is constructing a mental model of your business. If your schema was built for application code that already understands the domain, the agent will be working against a map drawn for someone else. The result is hallucinated joins, incorrect aggregations, and tool call chains that should take two steps but take eight.

Agent Engineering Is a Discipline, Not a Vibe

· 10 min read
Tian Pan
Software Engineer

Most agent systems fail in production not because the underlying model is incapable. They fail because the engineering around the model is improvised. The model makes a wrong turn at step three and nobody notices until step eight, when the final answer is confidently wrong and there are no guardrails to catch it. This is not a model problem. It is an architecture problem.

Agent engineering has gone through at least two full hype cycles in three years. AutoGPT and BabyAGI generated enormous excitement in spring 2023, then crashed against the reality of GPT-4's unreliable tool use. A second wave arrived with multi-agent frameworks and agentic RAG in 2024. Now, in 2026, more than half of surveyed engineering teams report having agents running in production — and most of them have also discovered that deploying an agent and maintaining a reliable agent are different problems. The teams that are succeeding are treating agent engineering as a structured discipline. The teams that are struggling are still treating it as a vibe.

The Anatomy of an Agent Harness

· 9 min read
Tian Pan
Software Engineer

Most engineers building AI agents spend 80% of their time thinking about which model to use and 20% thinking about everything else. That ratio should be flipped. The model is almost interchangeable at this point — the harness is what determines whether your agent actually works in production.

The equation is simple: Agent = Model + Harness. If you're not the model, you're the harness. And the harness is where nearly all the real engineering lives.