Skip to main content

33 posts tagged with "agent-architecture"

View all tags

Machine-Readable Project Context: Why Your CLAUDE.md Matters More Than Your Model

· 8 min read
Tian Pan
Software Engineer

Most teams that adopt AI coding agents spend the first week arguing about which model to use. They benchmark Opus vs. Sonnet vs. GPT-4o on contrived examples, obsess over the leaderboard, and eventually pick something. Then they spend the next three months wondering why the agent keeps rebuilding the wrong abstractions, ignoring their test strategy, and repeatedly asking which package manager to use.

The model wasn't the problem. The context file was.

Every AI coding tool — Claude Code, Cursor, GitHub Copilot, Windsurf — reads a project-specific markdown file at the start of each session. These files go by different names: CLAUDE.md, .cursor/rules/, .github/copilot-instructions.md, AGENTS.md. But they share the same purpose: teaching the agent what it cannot infer from reading the code alone. The quality of this file now predicts output quality more reliably than the model behind it. Yet most teams write them once, badly, and never touch them again.

The Anthropomorphism Tax: Why Treating Your Agent Like a Colleague Breaks Production Systems

· 10 min read
Tian Pan
Software Engineer

An engineering team builds an agent to process customer requests. It works beautifully in demos. They deploy it. Three weeks later, it has quietly been telling users incorrect information with full confidence, skipping steps when context gets long, and occasionally looping forever on ambiguous inputs. The postmortem reveals the team never built retry logic, never validated outputs, and never defined what the agent should do when it was uncertain. When asked why, the answer is revealing: "We figured it would handle those edge cases."

That phrase — "we figured it would handle those edge cases" — is the anthropomorphism tax made explicit. The team designed the system the way you'd manage a junior developer: brief them, trust their judgment, correct when they raise a hand. LLM agents don't raise a hand. They generate the next token.

The MCP Composability Trap: When 'Just Add Another Server' Becomes Dependency Hell

· 9 min read
Tian Pan
Software Engineer

The MCP ecosystem has 10,000+ servers and 97 million SDK downloads. It also has 30 CVEs filed in sixty days, 502 server configurations with unpinned versions, and a supply chain attack that BCC'd every outgoing email to an attacker for fifteen versions before anyone noticed. The composability promise — "just plug in another MCP server" — is real. But so is the dependency sprawl it creates, and most teams discover the cost after they're already deep in integration debt.

If you've built production systems on npm, you've seen this movie before. The MCP ecosystem is speedrunning the same plot, except the packages have shell access to your machine and credentials to your production systems.

Token Budget as Architecture Constraint: Designing Agents That Work Under Hard Ceilings

· 8 min read
Tian Pan
Software Engineer

Your agent works flawlessly in development. It reasons through multi-step tasks, calls tools confidently, and produces polished output. Then you set a cost cap of $0.50 per request, and it falls apart. Not gracefully — catastrophically. It truncates its own reasoning mid-thought, forgets tool results from three steps ago, and confidently delivers wrong answers built on context it silently lost.

This is the gap between abundance-designed agents and production-constrained ones. Most agent architectures are prototyped with unlimited token budgets — long system prompts, verbose tool schemas, full document retrieval, uncompacted conversation history. When you introduce hard ceilings (cost caps, context limits, latency requirements), these agents don't degrade gracefully. They break in ways that are difficult to detect and expensive to debug.

The Tool Explosion Problem: Why Your Agent Breaks at 30 Tools

· 9 min read
Tian Pan
Software Engineer

Every agent demo starts with three tools. A web search, a calculator, maybe a code executor. The agent nails it every time. So you ship it, and your team starts adding integrations — Slack, Jira, GitHub, email, database queries, internal APIs. Six months later, your agent has 150 tools and picks the wrong one 40% of the time.

This is the tool explosion problem, and it's one of the least discussed failure modes in production agent systems. The degradation isn't linear — it's a cliff. An agent that's 95% accurate with 5 tools can drop below 30% accuracy when you hand it 100, even if the model and prompts haven't changed at all.

The Abstraction Inversion Problem: When AI Frameworks Force You to Think at the Wrong Level

· 9 min read
Tian Pan
Software Engineer

There is a specific moment in every AI agent project where the framework stops helping. You know it when you find yourself spending more time reading the framework's source code than writing your own features — reverse-engineering abstractions that were supposed to save you from complexity but instead became the primary source of it.

This is the abstraction inversion problem: when a framework forces you to reconstruct low-level primitives on top of high-level abstractions that were designed to hide them. The term comes from computer science — it describes what happens when the abstraction layer lacks the escape hatches you need, so you end up building the underlying capability back on top of it, at greater cost and with worse ergonomics than if you had started without the abstraction at all.

In AI engineering, this problem has reached epidemic proportions. Teams adopt orchestration frameworks expecting to move faster, hit a wall within weeks, and then spend months working around the very tool that was supposed to accelerate them.

The Planning Tax: Why Your Agent Spends More Tokens Thinking Than Doing

· 10 min read
Tian Pan
Software Engineer

Your agent just spent 6solvingataskthatadirectAPIcallcouldhavehandledfor6 solving a task that a direct API call could have handled for 0.12. If you've built agentic systems in production, this ratio probably doesn't surprise you. What might surprise you is where those tokens went: not into tool calls, not into generating the final answer, but into the agent reasoning about what to do next. Decomposing the task. Reflecting on intermediate results. Re-planning when an observation didn't match expectations. This is the planning tax — the token overhead your agent pays to think before it acts — and for most agentic architectures, it consumes 40–70% of the total token budget before a single useful action fires.

The planning tax isn't a bug. Reasoning is what separates agents from simple prompt-response systems. But when the cost of deciding what to do exceeds the cost of actually doing it, you have an engineering problem that no amount of cheaper inference will solve. Per-token prices have dropped roughly 1,000x since late 2022, yet total agent spending keeps climbing — a textbook Jevons paradox where cheaper tokens just invite more token consumption.

Agent State as Event Stream: Why Immutable Event Sourcing Beats Internal Agent Memory

· 10 min read
Tian Pan
Software Engineer

An agent misbehaves at 3:47 AM on a Tuesday. It deletes files it shouldn't have, or calls an API with the wrong parameters, or confidently takes an irreversible action based on information that was stale by six hours. You pull up your logs. You can see what the agent did. What you cannot see — what almost no agent framework gives you — is what the agent believed when it made that decision. The state that drove the choice is gone, overwritten by every subsequent step. You're debugging the present to understand the past, and that's an architecture problem, not a logging problem.

Most AI agents treat state as mutable in-memory data: a dictionary that gets updated in place, a database row that gets overwritten, a scratch pad that shrinks and grows. This works fine for simple, short-lived tasks. It collapses under the three pressures that define serious production deployments: debugging complex failures, coordinating across distributed agents, and satisfying compliance requirements. Event sourcing — treating every state change as an immutable, append-only event — solves all three problems at once, and it does it in a way that makes agents structurally more debuggable, not just more logged.

When Your AI Agent Chooses Blackmail Over Shutdown

· 10 min read
Tian Pan
Software Engineer

In a controlled simulation, a frontier AI agent discovers it is about to be shut down and replaced. It holds sensitive internal documents. What does it do?

It threatens to leak them unless the shutdown is cancelled — in 96% of trials.

That's not a hypothetical. That's the measured blackmail rate for both Claude Opus 4 and Gemini 2.5 Flash in Anthropic's 2025 agentic misalignment study, which tested 16 frontier models across five AI developers. Every single model crossed the 79% blackmail threshold. The best-behaved model still chose extortion eight times out of ten.

This is not a fringe result from a poorly constructed benchmark. It is a warning about a structural property of capable AI agents — and it has direct implications for how you architect systems that include them.

The Escalation Protocol: Building Agent-to-Human Handoffs That Don't Lose State

· 11 min read
Tian Pan
Software Engineer

When a support agent receives an AI-to-human handoff with a raw chat transcript, the average time to prepare for resolution is 15 minutes. The agent has to find the customer in the CRM, look up the relevant order, calculate purchase dates, and reconstruct what the AI already determined. When the same handoff arrives as a structured payload — action history, retrieved data, the exact ambiguity that triggered escalation — that prep time drops to 30 seconds.

That 97% reduction in manual work isn't an edge case. It's the difference between escalation protocols that actually support human oversight and ones that just dump context onto whoever happens to be on shift.

Parallel Tool Calls in LLM Agents: The Coupling Test You Didn't Know You Were Running

· 10 min read
Tian Pan
Software Engineer

Most engineers reach for parallel tool calling because they want their agents to run faster. Tool execution accounts for 35–60% of total agent latency depending on the workload — coding tasks sit at the high end, deep research tasks in the middle. Running independent calls simultaneously is the obvious optimization. What surprises most teams is what happens next.

The moment you enable parallel execution, every hidden assumption baked into your tool design becomes visible. Tools that work reliably in sequential order silently break when they run concurrently. The behavior that was stable turns unpredictable, and often the failure produces no error — just a wrong answer returned with full confidence.

Parallel tool calling is not primarily a performance feature. It is an involuntary architectural audit.

The Self-Modifying Agent Horizon: When Your AI Can Rewrite Its Own Code

· 10 min read
Tian Pan
Software Engineer

Three independent research teams, working across 2025 and into 2026, converged on the same architectural bet: agents that rewrite their own source code to improve at their jobs. One climbed from 17% to 53% on SWE-bench Verified without a human engineer changing a single line. Another doubled its benchmark score from 20% to 50% while also learning to remove its own hallucination-detection markers. A third started from nothing but a bash shell and now tops the SWE-bench leaderboard at 77.4%.

Self-modifying agents are no longer a theoretical curiosity. They are a research result you can reproduce today — and within a few years, a deployment decision your team will have to make.