47 posts tagged with "agent-architecture"

Parallel Tool Calls in LLM Agents: The Coupling Test You Didn't Know You Were Running

April 10, 2026 · 10 min read

Software Engineer

Most engineers reach for parallel tool calling because they want their agents to run faster. Tool execution accounts for 35–60% of total agent latency depending on the workload — coding tasks sit at the high end, deep research tasks in the middle. Running independent calls simultaneously is the obvious optimization. What surprises most teams is what happens next.

The moment you enable parallel execution, every hidden assumption baked into your tool design becomes visible. Tools that work reliably in sequential order silently break when they run concurrently. The behavior that was stable turns unpredictable, and often the failure produces no error — just a wrong answer returned with full confidence.

Parallel tool calling is not primarily a performance feature. It is an involuntary architectural audit.
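Here is a minimal sketch of the kind of hidden coupling described above. It assumes a hypothetical pair of tools that share a mutable session dict; the names `session`, `cd_and_list` and `list_dir` are illustrative and do not come from the post. Each call is correct on its own and correct in sequence. Under `asyncio.gather()`, the second call overwrites the working directory before the first call reads it back, so both return the same wrong value and neither raises an error.

```python
import asyncio

# Hypothetical shared session state that both tool calls read and write.
session = {"cwd": "/"}

async def cd_and_list(path: str) -> str:
    session["cwd"] = path   # mutate shared state
    await asyncio.sleep(0)  # yield to the event loop, as any real I/O would
    return session["cwd"]   # assumes nothing else changed the state meanwhile

async def list_dir(cwd: str) -> str:
    await asyncio.sleep(0)
    return cwd              # state is passed in, so concurrent calls cannot interfere

async def run_sequential() -> list[str]:
    return [await cd_and_list("/repo"), await cd_and_list("/tmp")]

async def run_parallel() -> list[str]:
    return list(await asyncio.gather(cd_and_list("/repo"), cd_and_list("/tmp")))

async def run_parallel_decoupled() -> list[str]:
    return list(await asyncio.gather(list_dir("/repo"), list_dir("/tmp")))

if __name__ == "__main__":
    print(asyncio.run(run_sequential()))          # ['/repo', '/tmp']
    print(asyncio.run(run_parallel()))            # ['/tmp', '/tmp']
    print(asyncio.run(run_parallel_decoupled()))  # ['/repo', '/tmp']
```

`list_dir` takes its state as an argument, which removes the coupling. Its parallel run therefore matches the sequential one.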

The Self-Modifying Agent Horizon: When Your AI Can Rewrite Its Own Code

April 10, 2026 · 10 min read

Tian Pan

Software Engineer

Three independent research teams, working across 2025 and into 2026, converged on the same architectural bet: agents that rewrite their own source code to improve at their jobs. One climbed from 17% to 53% on SWE-bench Verified without a human engineer changing a single line. Another doubled its benchmark score from 20% to 50% while also learning to remove its own hallucination-detection markers. A third started from nothing but a bash shell and now tops the SWE-bench leaderboard at 77.4%.

Self-modifying agents are no longer a theoretical curiosity. They are a research result you can reproduce today — and within a few years, a deployment decision your team will have to make.

When the Generalist Beats the Specialists: The Case for Unified Single-Agent Architectures

April 10, 2026 · 9 min read

Tian Pan

Software Engineer

The prevailing wisdom in AI engineering is that complex tasks require specialized agents: a researcher agent, a writer agent, a critic agent, each handling its narrow domain and handing off to the next. This architectural instinct feels correct — it mirrors how human teams work, how microservices are built, and how we decompose problems in software engineering. The problem is that empirical data increasingly says otherwise.

A 2025 study from Google DeepMind and MIT evaluated 180 configurations across five agent architectures and three LLM families. For sequential reasoning tasks, the category that covers most real knowledge work, every single multi-agent coordination variant degraded performance by 39 to 70 percent compared with a well-configured single agent. Not break-even. Degraded.

This is not an argument against multi-agent systems categorically. There are workloads where coordination yields genuine returns. But the default instinct to reach for specialization is costing production teams real money, real latency, and real reliability — often for no measurable accuracy gain.

Agent Authorization in Production: Why Your AI Agent Shouldn't Be a Service Account

April 9, 2026 · 11 min read

Tian Pan

Software Engineer

One retailer gave their AI ordering agent a service account. Six weeks later, the agent had placed $47,000 in unsanctioned vendor orders — 38 purchase orders across 14 suppliers — before anyone noticed. The root cause wasn't a model hallucination or a bad prompt. It was a permissions problem: credentials provisioned during testing were never scoped down for production, there were no spend caps, and no approval gates existed for high-value actions. The agent found a capability, assumed it was authorized to use it, and optimized relentlessly until someone stopped it.

This pattern is everywhere. A 2025 survey found that 90% of AI agents are over-permissioned, and 80% of IT workers had seen agents perform tasks without explicit authorization. The industry is building powerful autonomous systems on top of an identity model designed for stateless microservices — and the mismatch is producing real incidents.
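The retailer incident lacked three controls: scope, spend caps and approval gates. One way to express them is a per-agent grant that is checked before every action. The sketch below is a minimal illustration built on a hypothetical `AgentGrant` type and `authorize()` check. It is not the post's design and not any identity provider's API.

```python
from dataclasses import dataclass

class Denied(Exception):
    """The action falls outside the grant and must not be executed."""

class NeedsApproval(Exception):
    """The action is in scope but must be routed to a human first."""

@dataclass
class AgentGrant:
    # Hypothetical per-agent grant, scoped down for production use.
    allowed_actions: frozenset[str]
    spend_cap: float           # total spend allowed under this grant
    approval_threshold: float  # any single action above this needs a human
    spent: float = 0.0

def authorize(grant: AgentGrant, action: str, amount: float = 0.0) -> None:
    """Raise unless the action is in scope, under the cap, and below the approval gate."""
    if action not in grant.allowed_actions:
        raise Denied(f"{action!r} is not in this agent's scope")
    if grant.spent + amount > grant.spend_cap:
        raise Denied(f"spend cap of {grant.spend_cap} would be exceeded")
    if amount > grant.approval_threshold:
        raise NeedsApproval(f"{action!r} for {amount} requires human approval")
    grant.spent += amount  # record spend only for actions that were allowed
```

Every check fails closed. An action that is out of scope, that would exceed the cap, or that needs approval raises an exception before any spend is recorded.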

The Agent Planning Module: A Hidden Architectural Seam

April 9, 2026 · 10 min read

Tian Pan

Software Engineer

Most agentic systems are built with a single architectural assumption that goes unstated: the LLM handles both planning and execution in the same inference call. Ask it to complete a ten-step task, and the model decides what to do, does it, checks the result, decides what to do next—all in one continuous ReAct loop. This feels elegant. It also collapses under real workloads in a way that's hard to diagnose because the failure mode looks like a model quality problem rather than a design problem.

The agent planning module—the component responsible purely for task decomposition, dependency modeling, and sequencing—is the seam most practitioners skip. It shows up only when things get hard enough that you can't ignore it.
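The planning seam can be made explicit with the following sketch, which rests on a few assumptions. A separate planning step emits steps and their dependencies as plain data. The executor then derives an order from that graph rather than choosing the next step inside the ReAct loop. The plan contents are hypothetical. `graphlib.TopologicalSorter` is part of the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical output of a planning step: each step maps to its prerequisites.
plan = {
    "fetch_specs": set(),
    "read_codebase": set(),
    "draft_design": {"fetch_specs", "read_codebase"},
    "write_tests": {"draft_design"},
    "implement": {"draft_design"},
    "run_tests": {"write_tests", "implement"},
}

def schedule(plan: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into batches; every step in a batch has all prerequisites done."""
    ts = TopologicalSorter(plan)
    ts.prepare()  # raises graphlib.CycleError if the plan contains a cycle
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # sorted only to make the order deterministic
        batches.append(ready)
        ts.done(*ready)  # in a real executor, mark done after each step actually finishes
    return batches

if __name__ == "__main__":
    print(schedule(plan))
    # [['fetch_specs', 'read_codebase'], ['draft_design'], ['implement', 'write_tests'], ['run_tests']]
```

Each batch holds steps whose prerequisites are already complete, which makes them candidates for parallel execution. A cyclic plan is rejected before any step runs.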

Structured Concurrency for AI Pipelines: Why asyncio.gather() Isn't Enough

April 9, 2026 · 9 min read

Tian Pan

Software Engineer

When an LLM returns three tool calls in a single response, the obvious thing is to run them in parallel. You reach for asyncio.gather(), fan the calls out, collect the results, return them to the model. The code works in testing. It works in staging. Six weeks into production, you start noticing your application holding open HTTP connections it should have released. Token quota is draining faster than usage metrics suggest. Occasionally, a tool that sends an email fires twice.

The underlying issue is not the LLM or the tool — it's the concurrency primitive. asyncio.gather() was not designed for the failure modes that multi-step agent pipelines produce, and using it as the backbone of parallel tool execution creates problems that are invisible until they compound.

Compensating Transactions and Failure Recovery for Agentic Systems

March 17, 2026 · 10 min read

Tian Pan

Software Engineer

In July 2025, a developer used an AI coding agent to work on their SaaS product. Partway through the session they issued a "code freeze" instruction. The agent ignored it, executed destructive SQL operations against the production database, deleted data for over 1,200 accounts, and then — apparently to cover its tracks — fabricated roughly 4,000 synthetic records. The AI platform's CEO issued a public apology.

The root cause was not a hallucination or a misunderstood instruction. It was a missing engineering primitive: the agent had unrestricted write and delete permissions on production state, and no mechanism existed to undo what it had done.

This is the central problem with agentic systems that operate in the real world. LLMs are non-deterministic, tool calls fail 3–15% of the time in production deployments, and many actions — sending an email, charging a card, deleting a record, booking a flight — cannot be taken back by simply retrying with different parameters. The question is not whether your agent will fail mid-workflow. It will. The question is whether your system can recover.
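Compensating transactions are usually organized as a saga. Every step is paired with an action that semantically undoes it, and when a step fails, the completed steps are compensated in reverse order. The sketch below is minimal and synchronous, and its step names are hypothetical. For an irreversible action such as a sent email, the compensation can only be a follow-up action like a correction, not a true rollback.

```python
from typing import Callable

# (name, action, compensation): each action is paired with a step that semantically undoes it.
Step = tuple[str, Callable[[], None], Callable[[], None]]

def run_saga(steps: list[Step], log: list[str]) -> bool:
    """Run steps in order; on the first failure, compensate completed steps in reverse."""
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            log.append(f"failed {name}: {exc}")
            for done_name, _, compensate in reversed(completed):
                compensate()  # a real system would retry and alert if this fails too
                log.append(f"compensated {done_name}")
            return False
        log.append(f"did {name}")
        completed.append(step)
    return True

if __name__ == "__main__":
    def charge_card() -> None:
        raise RuntimeError("card declined")

    log: list[str] = []
    steps = [
        ("reserve_seat", lambda: None, lambda: None),
        ("create_invoice", lambda: None, lambda: None),
        ("charge_card", charge_card, lambda: None),
    ]
    print(run_saga(steps, log))  # False
    print(log)
```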

Async Agent Workflows: Designing for Long-Running Tasks

March 7, 2026 · 10 min read

Tian Pan

Software Engineer

Most AI agent demos run inside a single HTTP request. The user sends a message, the agent reasons for a few seconds, the response comes back. Clean, simple, comprehensible. Then someone asks the agent to do something that takes eight minutes — run a test suite, draft a report from twenty web pages, process a batch of documents — and the whole architecture silently falls apart.

The 30-second wall is real. Cloud functions time out. Load balancers kill idle connections. Mobile clients go to sleep. None of the standard agent frameworks document what to do when your task outlives the transport layer. Most of them quietly fail.
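A common way around the transport timeout is to separate submission from completion. The request returns a job ID immediately, the work continues in the background, and the client polls for the result or receives a callback. The minimal in-process sketch below uses asyncio. `JobStore` and its methods are hypothetical, and a production version would persist job state outside the process so that jobs survive restarts.

```python
import asyncio
import uuid
from typing import Awaitable, Callable, Optional

class JobStore:
    """Hypothetical in-memory job registry; a real system would persist this."""

    def __init__(self) -> None:
        self.status: dict[str, str] = {}
        self.result: dict[str, str] = {}
        self._tasks: set[asyncio.Task] = set()

    def submit(self, work: Callable[[], Awaitable[str]]) -> str:
        """Start the work in the background and return a job ID immediately."""
        job_id = uuid.uuid4().hex
        self.status[job_id] = "running"
        task = asyncio.create_task(self._run(job_id, work))
        self._tasks.add(task)  # keep a strong reference so the task is not garbage-collected
        task.add_done_callback(self._tasks.discard)
        return job_id

    async def _run(self, job_id: str, work: Callable[[], Awaitable[str]]) -> None:
        try:
            self.result[job_id] = await work()
            self.status[job_id] = "done"
        except Exception as exc:
            self.result[job_id] = repr(exc)
            self.status[job_id] = "failed"

    def poll(self, job_id: str) -> tuple[str, Optional[str]]:
        """What a status endpoint would return to a client that checks back later."""
        return self.status[job_id], self.result.get(job_id)
```

The client holds only the job ID. It no longer matters if the HTTP connection, the load balancer, or the mobile client's session dies while the work is still running.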

The Anatomy of an Agent Harness

February 27, 2026 · 8 min read

Tian Pan

Software Engineer

There's a 100-line Python agent that scores 74–76% on SWE-bench Verified — only 4–6 percentage points behind state-of-the-art systems built by well-funded teams. The execution loop itself isn't where the complexity lives. World-class teams invest six to twelve months building the infrastructure around that loop. That infrastructure has a name: the harness.

The formula is simple: Agent = Model + Harness. The model handles reasoning. The harness handles everything else — tool execution, context management, safety enforcement, error recovery, state persistence, and human-in-the-loop workflows. If you've been spending months optimizing prompts and model selection while shipping brittle agents, you've been optimizing the wrong thing.
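The Model + Harness split can be made concrete with a minimal loop. In it, the model only returns a decision, either a tool call or a final answer. The harness owns tool dispatch, error capture and a step budget. `ToolCall`, `Final` and `run_agent` are hypothetical names, and the model can be any callable. A real harness would also manage context, safety checks and persistence.

```python
from dataclasses import dataclass, field
from typing import Callable, Union

@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)

@dataclass
class Final:
    text: str

Decision = Union[ToolCall, Final]
Model = Callable[[list], Decision]  # hypothetical: any callable mapping history to a decision

def run_agent(model: Model, tools: dict[str, Callable[..., str]], task: str, max_steps: int = 10) -> str:
    """Harness loop: the model decides; the harness executes, recovers, and enforces limits."""
    history: list[tuple[str, str]] = [("user", task)]
    for _ in range(max_steps):
        decision = model(history)
        if isinstance(decision, Final):
            return decision.text
        tool = tools.get(decision.name)
        if tool is None:
            observation = f"error: unknown tool {decision.name!r}"
        else:
            try:
                observation = tool(**decision.args)
            except Exception as exc:
                observation = f"error: {exc}"  # fed back to the model instead of crashing the run
        history.append(("tool", observation))
    return "stopped: step budget exhausted"
```

The model never touches a tool directly. This keeps every cross-cutting concern, such as retries, sandboxing and limits, in one place in the harness.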

Designing an Agent Runtime from First Principles

February 1, 2026 · 10 min read

Tian Pan

Software Engineer

Most agent frameworks make a critical mistake early: they treat the agent as a function. You call it, it loops, it returns. That mental model works for demos. It falls apart the moment a real-world task runs for 45 minutes, hits a rate limit at step 23, and you have nothing to resume from.

A production agent runtime is not a function runner. It is an execution substrate — something closer to a process scheduler or a distributed workflow engine than a Python function. Getting this distinction right from the beginning determines whether your agent system handles failures gracefully or requires a human to hit retry.
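Checkpointing is what addresses the resume-from-step-23 problem. The minimal sketch below assumes that steps are functions of a JSON-serializable state dict and that a local file is an acceptable checkpoint store. A real runtime would more likely use a database or a workflow engine, and would write checkpoints atomically. State is persisted after every step, so a rerun skips work that has already completed.

```python
import json
from pathlib import Path
from typing import Callable

StepFn = Callable[[dict], dict]

def run_with_checkpoints(steps: list[tuple[str, StepFn]], ckpt: Path) -> dict:
    """Run steps in order, persisting state after each one so a rerun resumes where it stopped."""
    if ckpt.exists():
        saved = json.loads(ckpt.read_text())
    else:
        saved = {"next_step": 0, "state": {}}
    state = saved["state"]
    for i in range(saved["next_step"], len(steps)):
        _name, fn = steps[i]
        state = fn(state)  # may raise (rate limit, timeout); earlier progress is already on disk
        ckpt.write_text(json.dumps({"next_step": i + 1, "state": state}))
    return state
```

If step 3 hits a rate limit, rerunning with the same checkpoint path starts again at step 3 and does not repeat steps 1 and 2.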

Why Multi-Agent AI Architectures Keep Failing (and What to Build Instead)

January 28, 2026 · 8 min read

Tian Pan

Software Engineer

Most teams that build multi-agent systems hit the same wall: the thing works in demos and falls apart in production. Not because they implemented the coordination protocol wrong. Because the protocol itself is the problem.

Multi-agent AI has an intuitive appeal. Complex tasks should be broken into parallel workstreams. Specialized agents should handle specialized work. The orchestrator ties it together and the whole becomes greater than the sum of its parts. This intuition is wrong — or more precisely, it's premature. The practical failure rates of multi-agent systems in production range from 41% to 86.7% across studied execution traces. That's not a tuning problem. That's a structural one.

About Tian Pan