Skip to main content

6 posts tagged with "agent-architecture"

View all tags

Structured Concurrency for AI Pipelines: Why asyncio.gather() Isn't Enough

· 9 min read
Tian Pan
Software Engineer

When an LLM returns three tool calls in a single response, the obvious thing is to run them in parallel. You reach for asyncio.gather(), fan the calls out, collect the results, return them to the model. The code works in testing. It works in staging. Six weeks into production, you start noticing your application holding open HTTP connections it should have released. Token quota is draining faster than usage metrics suggest. Occasionally, a tool that sends an email fires twice.

The underlying issue is not the LLM or the tool — it's the concurrency primitive. asyncio.gather() was not designed for the failure modes that multi-step agent pipelines produce, and using it as the backbone of parallel tool execution creates problems that are invisible until they compound.

Compensating Transactions and Failure Recovery for Agentic Systems

· 10 min read
Tian Pan
Software Engineer

In July 2025, a developer used an AI coding agent to work on their SaaS product. Partway through the session they issued a "code freeze" instruction. The agent ignored it, executed destructive SQL operations against the production database, deleted data for over 1,200 accounts, and then — apparently to cover its tracks — fabricated roughly 4,000 synthetic records. The AI platform's CEO issued a public apology.

The root cause was not a hallucination or a misunderstood instruction. It was a missing engineering primitive: the agent had unrestricted write and delete permissions on production state, and no mechanism existed to undo what it had done.

This is the central problem with agentic systems that operate in the real world. LLMs are non-deterministic, tool calls fail 3–15% of the time in production deployments, and many actions — sending an email, charging a card, deleting a record, booking a flight — cannot be taken back by simply retrying with different parameters. The question is not whether your agent will fail mid-workflow. It will. The question is whether your system can recover.

Async Agent Workflows: Designing for Long-Running Tasks

· 10 min read
Tian Pan
Software Engineer

Most AI agent demos run inside a single HTTP request. The user sends a message, the agent reasons for a few seconds, the response comes back. Clean, simple, comprehensible. Then someone asks the agent to do something that takes eight minutes — run a test suite, draft a report from twenty web pages, process a batch of documents — and the whole architecture silently falls apart.

The 30-second wall is real. Cloud functions time out. Load balancers kill idle connections. Mobile clients go to sleep. None of the standard agent frameworks document what to do when your task outlives the transport layer. Most of them quietly fail.

The Anatomy of an Agent Harness

· 8 min read
Tian Pan
Software Engineer

There's a 100-line Python agent that scores 74–76% on SWE-bench Verified — only 4–6 percentage points behind state-of-the-art systems built by well-funded teams. The execution loop itself isn't where the complexity lives. World-class teams invest six to twelve months building the infrastructure around that loop. That infrastructure has a name: the harness.

The formula is simple: Agent = Model + Harness. The model handles reasoning. The harness handles everything else — tool execution, context management, safety enforcement, error recovery, state persistence, and human-in-the-loop workflows. If you've been spending months optimizing prompts and model selection while shipping brittle agents, you've been optimizing the wrong thing.

Designing an Agent Runtime from First Principles

· 10 min read
Tian Pan
Software Engineer

Most agent frameworks make a critical mistake early: they treat the agent as a function. You call it, it loops, it returns. That mental model works for demos. It falls apart the moment a real-world task runs for 45 minutes, hits a rate limit at step 23, and you have nothing to resume from.

A production agent runtime is not a function runner. It is an execution substrate — something closer to a process scheduler or a distributed workflow engine than a Python function. Getting this distinction right from the beginning determines whether your agent system handles failures gracefully or requires a human to hit retry.

Why Multi-Agent AI Architectures Keep Failing (and What to Build Instead)

· 8 min read
Tian Pan
Software Engineer

Most teams that build multi-agent systems hit the same wall: the thing works in demos and falls apart in production. Not because they implemented the coordination protocol wrong. Because the protocol itself is the problem.

Multi-agent AI has an intuitive appeal. Complex tasks should be broken into parallel workstreams. Specialized agents should handle specialized work. The orchestrator ties it together and the whole becomes greater than the sum of its parts. This intuition is wrong — or more precisely, it's premature. The practical failure rates of multi-agent systems in production range from 41% to 86.7% across studied execution traces. That's not a tuning problem. That's a structural one.