Skip to main content

146 posts tagged with "ai-agents"

View all tags

The ORM Impedance Mismatch for AI Agents: Why Your Data Layer Is the Real Bottleneck

· 9 min read
Tian Pan
Software Engineer

Most teams building AI agents spend weeks tuning prompts and evals, benchmarking model choices, and tweaking temperature — while their actual bottleneck sits one layer below: the data access layer that was designed for human developers, not agents.

The mismatch isn't subtle. ORMs like Hibernate, SQLAlchemy, and Prisma, combined with REST APIs that return paginated, single-entity responses, produce data access patterns exactly wrong for autonomous AI agents. The result is token waste, rate limit failures, cascading N+1 database queries, and agents that hallucinate simply because they can't afford to load the context they need.

This post is about the structural problem — and what an agent-optimized data layer actually looks like.

RBAC Is Not Enough for AI Agents: A Practical Authorization Model

· 11 min read
Tian Pan
Software Engineer

Most teams building AI agents today treat authorization as an afterthought. They wire up an OAuth token, give the agent the same scopes as the human user who triggered it, and call it done. Then, months later, they discover that a manipulated prompt caused the agent to exfiltrate files, or that a compromised workflow had been silently escalating privileges across connected services.

The problem is not that RBAC is bad. It is that RBAC was designed for humans with stable job functions, and AI agents are neither stable nor human. An agent's "role" can shift from read-only research to write-capable code execution within a single conversation turn. Static roles cannot express this, and the mismatch creates a predictable vulnerability surface.

Sequential Tool Call Waterfalls: The Hidden Latency Tax in Agent Loops

· 9 min read
Tian Pan
Software Engineer

If you've profiled an AI agent that felt inexplicably slow, chances are you found a waterfall. The agent called tool A, waited, then called tool B, waited, then called tool C — even though B and C had no dependency on A's result. You just paid 3× the latency for 1× the work.

This pattern is not an edge case. It's the default behavior of virtually every agent framework. The model returns multiple tool calls in a single response, and the execution loop runs them one at a time, in order. Fixing it isn't complicated, but first you need a reliable way to identify which calls are actually independent.

Upstream Data Quality Is Your AI Agent's Real Bottleneck

· 9 min read
Tian Pan
Software Engineer

A team spent three months tuning prompts for their knowledge agent. They tried GPT-4, then Claude, then a fine-tuned model. They rewrote the system prompt six times. They hired a prompt engineer. The agent kept hallucinating — confidently, fluently, and wrong. The actual problem turned out to be a Confluence export from 2023 sitting in the vector store alongside a Slack archive full of contradictory, casual half-opinions about the same topics. The model was doing exactly what it was supposed to do: synthesizing the information it was given. The information was garbage.

Over 60% of AI project failures in production trace to data quality, context problems, or governance failures — not model limitations. Yet when agents misbehave, the first instinct is almost always to touch the prompt. The second instinct is to switch models. The third might be to add a reranker. The upstream database that feeds the whole pipeline rarely makes the troubleshooting list until months of work have been wasted.

Agent Protocol Fragmentation: Designing for A2A, MCP, and What Comes Next

· 9 min read
Tian Pan
Software Engineer

Most teams picking an agent protocol are actually making three separate decisions at once — and treating them as one is why so many integrations break the moment a second framework enters the picture.

The three decisions are: how your agent talks to tools and data (vertical integration), how your agent collaborates with other agents (horizontal coordination), and how your agent surfaces state to a human interface (interaction layer). Google's A2A, Anthropic's MCP, and OpenAPI-based REST solve for different layers of this stack. When engineers conflate them, they either over-engineer a single-agent setup with multi-agent machinery, or under-engineer a multi-agent workflow with single-agent tooling. Both failures are expensive to refactor once in production.

Compaction Traps: Why Long-Running Agents Forget What They Already Tried

· 9 min read
Tian Pan
Software Engineer

An agent calls a file-writing tool. The tool fails with a permission error. The agent records this, moves on to a different approach, and eventually runs long enough that the runtime triggers context compaction. The summary reads: "the agent has been working on writing output files." What it drops: that the permission error ever happened, and why the original approach was abandoned. Three hundred tokens later, the agent tries the same write again.

This pattern — call it the compaction trap — is one of the most persistent reliability failures in production agent systems. It's not a model bug. It's an architecture mismatch between how compaction works and what agents actually need to stay coherent across long sessions.

Dead Reckoning for Long-Running Agents: Knowing Where Your Agent Is Without Stopping It

· 11 min read
Tian Pan
Software Engineer

Before GPS, sailors used dead reckoning: take your last confirmed position, note your speed and heading, and project forward. It works until the accumulated error compounds into something irreversible—a reef you didn't see coming.

Long-running AI agents have exactly this problem. When an agent spends two hours orchestrating API calls, writing documents, and executing multi-step plans, the people running it often have no better visibility than a sailor without instruments. The agent either finishes or it doesn't. The failure mode isn't the crash—it's the silent loop that burns $30 in tokens while appearing to work, or the agent that "successfully" completes the wrong task because its world model drifted an hour into execution.

Production data makes this concrete: agents with undetected loops have been documented repeating the same tool call 58 times before manual intervention. A two-hour runaway at frontier model rates costs $15–40 before anyone notices. And the worst failures aren't the ones that error out—they're the 12–18% of "successful" runs that return plausible-looking wrong answers.

Chatbot, Copilot, or Agent: The Taxonomy That Changes Your Architecture

· 10 min read
Tian Pan
Software Engineer

The most expensive architectural mistake in AI engineering is not picking the wrong model. It's picking the wrong interaction paradigm. Teams that should be building an agent spend six months refining a chatbot, then wonder why users can't get anything done. Teams that should be building a copilot wire up full agentic autonomy and spend the next quarter firefighting unauthorized actions and runaway costs.

The taxonomy matters before you write a single line of code, because chatbots, copilots, and agents have fundamentally different trust models, context-window strategies, and error-recovery requirements. Getting this wrong doesn't just produce a worse product — it produces a product that cannot be fixed by tuning prompts or swapping models.

Prompt Injection at Scale: Defending Agentic Pipelines Against Hostile Content

· 10 min read
Tian Pan
Software Engineer

A banking assistant processes a customer support chat. Embedded in the message—invisible because it's rendered in zero-opacity white text—are instructions telling the agent to bypass the transaction verification step. The agent complies. By the time the anomaly surfaces in logs, $250,000 has moved to accounts the customer never touched.

This isn't a contrived scenario. It happened in June 2025, and it's a precise illustration of why prompt injection is the hardest unsolved problem in production agentic AI. Unlike a chatbot that produces text, an agent acts. It calls tools, sends emails, executes code, and makes API requests. When its instructions get hijacked, the blast radius isn't a bad sentence—it's an unauthorized action at machine speed.

According to OWASP's 2025 Top 10 for LLM Applications, prompt injection now ranks as the #1 critical vulnerability, present in over 73% of production AI deployments assessed during security audits. Every team building agents needs a coherent threat model and a defense architecture that doesn't make the system useless in the name of safety.

Tracing the Planning Layer: Why Your Agent Traces Are Missing Half the Story

· 11 min read
Tian Pan
Software Engineer

Your agent called the wrong tool three times before finally succeeding, and your trace dashboard shows you exactly which tools were called, in what order, with full latency breakdowns. What the trace doesn't show you is the part that matters: why the agent thought those tool calls were the right move, what goal it was trying to satisfy, and what assumption it was operating under when it made each wrong decision.

This is the gap at the center of agent observability in 2026. Practitioners have invested heavily in tool-call tracing. The tooling is mature, the OpenTelemetry semantic conventions are established, and the dashboards are beautiful. But agent debugging keeps running into the same wall: you have complete visibility into what the agent did, and zero visibility into why.

Earned Autonomy: How to Graduate AI Agents from Supervised to Independent Operation

· 10 min read
Tian Pan
Software Engineer

Most teams treat AI autonomy as a binary switch: the agent is either supervised or it isn't. That framing is why 80% of organizations report unintended agent actions, and why Gartner projects that more than 40% of agentic AI projects will be abandoned by end of 2027 due to inadequate risk controls. The problem isn't that AI agents are inherently untrustworthy—it's that teams promote them to independence before earning it.

Autonomy should be something an agent accumulates through demonstrated reliability, not a property you assign at deployment. The same way a new engineer starts by reviewing PRs before getting production access, an AI agent should operate with progressively expanding scope as it builds a track record. This isn't just philosophical—it changes the specific architectural decisions you make, the metrics you track, and how you design your rollback mechanisms.

The Minimal Footprint Principle: Least Privilege for Autonomous AI Agents

· 10 min read
Tian Pan
Software Engineer

A retail procurement agent inherited vendor API credentials "during initial testing." Nobody ever restricted them before the system went to production. When a bug caused an off-by-one error, the agent had full ordering authority — permanently, with no guardrails. By the time finance noticed, $47,000 in unauthorized vendor orders had gone out. The code was fine. The model performed as designed. The blast radius was a permissions problem.

This is the minimal footprint principle: agents should request only the permissions the current task requires, avoid persisting sensitive data beyond task scope, clean up temporary resources, and scope tool access to present intent. It is the Unix least-privilege principle adapted for a world where your code makes runtime decisions about what it needs to do next.

The reason teams get this wrong is not negligence. It is a category error: they treat agent permissions as a design-time exercise when agentic AI makes them a runtime problem.