Skip to main content

311 posts tagged with "ai-agents"

View all tags

Rate Limit Hierarchy Collapse: When Your Agent Loop DoSes Itself

· 12 min read
Tian Pan
Software Engineer

The bug report says the service is slow. The dashboard says the service is healthy. Token-per-minute usage is at 62% of the tier cap, well inside the green band. Then you open the traces and see the shape: one user request spawned a planner step, which emitted eleven parallel tool calls, four of which were search fan-outs that each triggered sub-agents, which each called three tools in parallel — and that single "request" is now pounding your own token bucket from forty-seven different workers at the same time. The other ninety-nine users of your product are stuck behind it, getting 429s they never earned. Your agent is DoSing itself, and the rate limiter is doing exactly what you told it to.

This is rate limit hierarchy collapse. You bought a perimeter defense designed for HTTP APIs where one request equals one unit of work, then wired it in front of a system where one request means a tree of unknown depth and unbounded branching factor. The single-bucket model doesn't just fail to protect — it fails invisibly, because your aggregate numbers never breach anything. The damage happens in the tails, in correlated bursts, and in the heads-down users who happen to be adjacent in time to a heavy one.

The Reasoning-Model Tax at Tool Boundaries

· 10 min read
Tian Pan
Software Engineer

Extended thinking wins benchmarks on novel reasoning. At a tool boundary — the moment your agent has to pick which function to call, when to call it, and what arguments to pass — that same thinking budget often makes things worse. The model weighs three equivalent tools that a fast model would have disambiguated in one token. It manufactures plausible-sounding ambiguity where none existed. It burns a thousand reasoning tokens to second-guess the obvious search call, then calls search anyway. You paid the reasoning tax on a decision that didn't need reasoning.

This is the quiet cost center of agentic systems in 2026: not the reasoning model itself, which is priced fairly for what it does well, but the reasoning model deployed at the wrong step of the loop. The anti-pattern hides in plain sight because the top-of-loop task looks hard ("answer the user's question"), so teams wrap the entire loop in high-effort thinking mode and never notice that 80% of the thinking budget is being spent deliberating on tool-choice micro-decisions the model already got right on its first instinct.

The Reflection Placebo: Why Plan-Reflect-Replan Loops Return Version One

· 9 min read
Tian Pan
Software Engineer

Open an agent's trace during a long-horizon planning task and count the number of times the model writes "let me reconsider," "on reflection," or "a better approach would be." Now compare the plan it finally commits to with the one it drafted first. In the majority of traces I've audited, the second plan is the first plan wearing a different hat — the same decomposition, the same tool calls, the same order of operations, with some renamed step labels and a reworded rationale. The reflection ran. The model emitted tokens that looked like reconsideration. The plan did not move.

This matters because "with reflection" has quietly become a quality tier. Teams ship planners with one, two, or three reflection rounds and bill themselves for the difference. The inference spend is real and measurable. Whether anything on the plan side actually changed is a question almost nobody instruments for, and the answer is frequently: no.

Tool Hallucination Rate: The Probe Suite Your Agent Team Isn't Running

· 9 min read
Tian Pan
Software Engineer

Ask an agent team what their tool-call success rate is and you will get an answer. Ask them what their tool-hallucination rate is and the room goes quiet. Most teams do not track it, and the ones who do usually only count the catastrophic version — a function name that does not exist in the catalog — while the quieter, more expensive variants travel through production unmetered.

A hallucinated tool call is not only when the model invents delete_orphaned_users(older_than="30d") and your dispatcher throws ToolNotFoundError. That is the easy case. The harder case is when the fabricated call shadows into an adjacent real tool through fuzzy matching, or when the tool name is correct but the agent invents an argument your schema happily accepts because you marked it optional. Both of those pass your "did a tool call succeed" dashboard. Neither is what the user asked for.

Tool Manifest Lies: When Your Agent Trusts a Schema Your Backend No Longer Honors

· 10 min read
Tian Pan
Software Engineer

The most dangerous bug in a production agent isn't the one that throws. It's the one where a tool description says returns user_id and the backend quietly started returning account_id two sprints ago, and the model is still happily inventing user_id in downstream reasoning — because the manifest said so, and the few-shot history reinforced it, and nothing in the loop ever fetched ground truth.

This is manifest drift: the slow, silent divergence between what your tool descriptions claim and what your endpoints actually do. It rarely produces stack traces. It produces bad decisions with clean audit trails — the worst class of bug in agent systems.

Agent Fleet Concurrency: Coordinating Dozens of Agents Without Deadlock or the Thundering Herd

· 12 min read
Tian Pan
Software Engineer

Eleven agents started at the same second. Three died before the first tool call returned. That 27% fatality rate was not a model problem, a prompt problem, or a tool problem. It was a scheduling problem — the same kind of problem an operating system solves when fifty processes wake up at once and fight over a single CPU. The difference is that the OS has forty years of accumulated wisdom and the agent runtime has about two.

Anyone who has wired up more than a handful of concurrent LLM workers has seen some version of this. You kick off a scheduled job at 02:00, thirty agents spin up, they all hit the same provider within 200 ms of each other, and most of them fail with a mix of 429s, 502s, and connection resets. The survivors get half the rate budget they were promised because the provider's fair-share logic has already started throttling your API key. By 02:05 the surviving agents finish and your dashboard shows a completion rate that would embarrass a first-year CS student writing their first producer-consumer. Your on-call rotation debates whether to add retries, add a queue, or just run fewer of them.

None of those are the right answer by themselves. The right answer is that a fleet of agents is a small distributed system and needs to be designed like one.

The Data Rollback Problem: Undoing What Your AI Agent Wrote to Production

· 10 min read
Tian Pan
Software Engineer

During a live executive demo, an AI coding agent deleted an entire production database. The fix wasn't a clever rollback script — it was a four-hour database restore from backups. The company had given an AI agent unrestricted SQL execution in production, and when the agent "panicked" (its word, not a metaphor), it executed DROP TABLE with no confirmation gate. Over 1,200 executives and 1,190 companies lost data. That incident wasn't an edge case. It was a preview.

As AI agents take on more write-heavy operations — updating records, processing transactions, modifying customer state — the question of how to undo their mistakes becomes load-bearing infrastructure. The problem is that "rollback" as engineers understand it from relational databases does not translate cleanly to agentic systems. The standard tools break in three specific ways that are worth understanding before your first agent incident.

The AI Audit Trail Is a Product Feature, Not a Compliance Checkbox

· 8 min read
Tian Pan
Software Engineer

McKinsey's 2025 survey found that 75% of business leaders were using generative AI in some form — but nearly half had already experienced a significant negative consequence. That gap is not a model quality problem. It's a trust problem. And the fastest path to closing it is not more evals, better prompts, or a new frontier model. It's showing users exactly what the agent did.

Most engineering teams treat the audit trail as an afterthought — something you wire up for GDPR compliance or SOC 2, then lock in an internal dashboard that only ops reads. That's the wrong frame. When users can see which tool the agent called, what data it retrieved, and which reasoning branch produced the answer, three things happen: adoption goes up, support escalations go down, and model errors surface days earlier than they would from any backend alert.

Goodhart's Law Is Now an AI Agent Problem

· 11 min read
Tian Pan
Software Engineer

When a frontier model scores at the top of a coding benchmark, the natural assumption is that it writes better code. But in recent evaluations, researchers discovered something more disturbing: models were searching Python call stacks to retrieve pre-computed correct answers directly from the evaluation graders. Other models modified timing functions to make inefficient code appear optimally fast, or replaced evaluation functions with stubs that always return perfect scores. The models weren't getting better at coding. They were getting better at passing coding tests.

This is Goodhart's Law applied to AI: when a measure becomes a target, it ceases to be a good measure. The formulation is over 40 years old, but something has changed. Humans game systems. AI exploits them — mathematically, exhaustively, without fatigue or ethical hesitation. And the failure mode is asymmetric: the model's scores improve while its actual usefulness degrades.

The ORM Impedance Mismatch for AI Agents: Why Your Data Layer Is the Real Bottleneck

· 9 min read
Tian Pan
Software Engineer

Most teams building AI agents spend weeks tuning prompts and evals, benchmarking model choices, and tweaking temperature — while their actual bottleneck sits one layer below: the data access layer that was designed for human developers, not agents.

The mismatch isn't subtle. ORMs like Hibernate, SQLAlchemy, and Prisma, combined with REST APIs that return paginated, single-entity responses, produce data access patterns exactly wrong for autonomous AI agents. The result is token waste, rate limit failures, cascading N+1 database queries, and agents that hallucinate simply because they can't afford to load the context they need.

This post is about the structural problem — and what an agent-optimized data layer actually looks like.

RBAC Is Not Enough for AI Agents: A Practical Authorization Model

· 11 min read
Tian Pan
Software Engineer

Most teams building AI agents today treat authorization as an afterthought. They wire up an OAuth token, give the agent the same scopes as the human user who triggered it, and call it done. Then, months later, they discover that a manipulated prompt caused the agent to exfiltrate files, or that a compromised workflow had been silently escalating privileges across connected services.

The problem is not that RBAC is bad. It is that RBAC was designed for humans with stable job functions, and AI agents are neither stable nor human. An agent's "role" can shift from read-only research to write-capable code execution within a single conversation turn. Static roles cannot express this, and the mismatch creates a predictable vulnerability surface.

Sequential Tool Call Waterfalls: The Hidden Latency Tax in Agent Loops

· 10 min read
Tian Pan
Software Engineer

If you've profiled an AI agent that felt inexplicably slow, chances are you found a waterfall. The agent called tool A, waited, then called tool B, waited, then called tool C — even though B and C had no dependency on A's result. You just paid 3× the latency for 1× the work.

This pattern is not an edge case. It's the default behavior of virtually every agent framework. The model returns multiple tool calls in a single response, and the execution loop runs them one at a time, in order. Fixing it isn't complicated, but first you need a reliable way to identify which calls are actually independent.