6 posts tagged with "orchestration"

Agent Idempotency Is an Orchestration Contract, Not a Tool Property

· 10 min read
Tian Pan
Software Engineer

The support ticket arrives at 9:41 a.m.: "I was charged three times." The trace looks clean. One user message, one planner turn, three calls to charge_card — each with a distinct tool-use ID, each returning 200 OK, each writing a different Stripe charge. The tool has an idempotency key. The backend has a dedup table. The payment processor honors Idempotency-Key. Every layer is idempotent. The customer still paid three times.

This is the shape of the bug that will land on your desk if you build agents long enough. It is not a bug in any tool. It is a bug in the contract between the agent loop and the tools, and that contract almost always lives only in a senior engineer's head.
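One way to make that contract explicit is to derive the idempotency key from the logical step in the plan rather than from the per-call tool-use ID. This is a minimal sketch, not the post's implementation; the names (`step_idempotency_key`, `conversation_id`, `step_index`) are hypothetical:

```python
import hashlib
import json

def step_idempotency_key(conversation_id: str, step_index: int,
                         tool_name: str, args: dict) -> str:
    """Key the charge to the logical step, not the model's tool-use ID.

    Model-generated tool-use IDs are unique per call, so three retries of
    the same step produce three distinct keys and the downstream dedup
    never fires -- the bug described above. Keying on (conversation, step,
    tool, canonical args) makes retries of the same step collide on purpose.
    """
    canonical_args = json.dumps(args, sort_keys=True)  # order-independent
    payload = f"{conversation_id}:{step_index}:{tool_name}:{canonical_args}"
    return hashlib.sha256(payload.encode()).hexdigest()

# Two retries of the same logical step share one key, even if the planner
# serializes the arguments in a different order the second time.
key_a = step_idempotency_key("conv-42", 3, "charge_card",
                             {"amount": 1999, "currency": "usd"})
key_b = step_idempotency_key("conv-42", 3, "charge_card",
                             {"currency": "usd", "amount": 1999})
assert key_a == key_b
```

The point is where the key is minted: in the orchestrator, before the tool is called, so every layer below it dedupes against the same identity.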

"Done!" Is Not a Return Code: Why Agent Completion Needs a Structured Signal

· 10 min read
Tian Pan
Software Engineer

An agent ends its turn with "All done — let me know if you want any changes!" and your orchestrator has to decide whether to mark the ticket resolved, kick off the next handoff, or retry. That sentence is not a return code. It is a polite closing line trained to sound reassuring at the end of a chat, and every line of automation downstream of it inherits the ambiguity. The teams that treat this as a parsing problem write regexes that catch \b(done|complete|finished)\b and call it a day. The teams whose agents run in production eventually learn that completion is an event, not a mood.

The failure mode is bimodal and boring. Either the agent announces done when it isn't — premature termination — and the orchestrator happily advances the workflow on a half-finished artifact. Or the agent is actually done, but phrases it in a way that doesn't match the detector ("I went ahead and landed the change, though the test for the edge case is still flaky"), and the orchestrator spins up a retry that re-does the work, duplicates the side effect, and sometimes contradicts the successful first pass. Both modes degrade silently. Neither shows up in a dashboard until someone reads a trace and notices that the agent said "I think that covers it" and the billing system treated that as a commit.

The fix is not smarter parsing. It is giving the agent a structured way to terminate — a done-tool with an enumerated status, a reason code, and a handle your pipeline can route on — and changing the orchestrator to wait for that event instead of listening to the chat stream.
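A done-tool of that shape can be sketched in a few lines. The status names and routing decisions below are illustrative assumptions, not the post's API:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class DoneStatus(Enum):
    SUCCESS = "success"          # artifact complete, safe to advance
    PARTIAL = "partial"          # made progress, safe to resume
    FAILED = "failed"            # no durable side effects, safe to retry
    NEEDS_HUMAN = "needs_human"  # stop and escalate

@dataclass(frozen=True)
class DoneSignal:
    status: DoneStatus
    reason_code: str                  # machine-routable, e.g. "tests_green"
    artifact_ref: Optional[str] = None  # handle the pipeline can pick up

def route(signal: DoneSignal) -> str:
    """Orchestrator routes on the enum -- never on the chat stream."""
    if signal.status is DoneStatus.SUCCESS:
        return "advance"
    if signal.status is DoneStatus.PARTIAL:
        return "resume"
    if signal.status is DoneStatus.NEEDS_HUMAN:
        return "escalate"
    return "retry"

assert route(DoneSignal(DoneStatus.SUCCESS, "tests_green", "artifact-001")) == "advance"
```

Because `FAILED` is the only status that licenses a retry, the "re-does the work and duplicates the side effect" failure mode can no longer be triggered by an unlucky phrasing.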

When Your Agent Framework Becomes the Bug

· 8 min read
Tian Pan
Software Engineer

High-level agent frameworks promise to turn a three-day integration into a three-hour prototype. That promise is real. The problem is what happens next: six months into production, engineers at a company that builds AI-powered browser testing agents discovered they were spending as much time debugging LangChain as building features. Their fix was radical — they eliminated the framework entirely and went back to modular building blocks. "Once we removed it," they wrote, "we no longer had to translate our requirements into LangChain-appropriate solutions. We could just code."

They are not alone. Roughly 45% of developers who experiment with high-level LLM orchestration frameworks never deploy them to production. Another 23% eventually remove them after shipping. These numbers don't mean frameworks are bad tools — they mean frameworks are tools with a specific useful range, and that range is narrower than the demos suggest.

Building a Multi-Agent Research System: Patterns from Production

· 8 min read
Tian Pan
Software Engineer

When a single-agent system fails at a research task, the instinct is to add more memory, better tools, or a smarter model. But there's a point where the problem isn't capability — it's concurrency. Deep research tasks require pursuing multiple threads simultaneously: validating claims from different angles, scanning sources across domains, cross-referencing findings in real time. A single agent doing this sequentially is like a researcher reading every book one at a time before taking notes. The multi-agent alternative feels obvious in retrospect, but getting it right in production is considerably harder than the architecture diagram suggests.

This post is about how multi-agent research systems actually get built — the architectural choices that work, the failure modes that aren't obvious until you're in production, and the engineering discipline required to keep them useful at scale.

Why Multi-Agent Systems Break at the Seams: Designing Reliable Handoffs

· 8 min read
Tian Pan
Software Engineer

There's a pattern that plays out repeatedly when teams graduate from single-agent to multi-agent AI systems: individual agents work beautifully in isolation, but the system as a whole behaves unpredictably. The agents aren't the problem. The boundaries between them are.

Studies across production multi-agent deployments report failure rates ranging from 41% to 86.7% without formal orchestration. The most common post-mortem finding isn't "the LLM gave a bad answer" — it's "the wrong context reached the wrong agent at the wrong time." The seams between agents are where systems quietly fall apart.
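One way to harden a seam is to make the handoff an explicit, typed envelope that names its sender, recipient, and a scoped context slice, and to fail loudly when routing goes wrong. A minimal sketch under those assumptions (the field names are hypothetical):

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass(frozen=True)
class Handoff:
    sender: str
    recipient: str
    task: str
    context: Dict[str, Any]  # only the facts the recipient needs, not the full history
    trace_id: str            # lets a post-mortem follow the seam

def route(handoff: Handoff, registry: Dict[str, Callable[[Handoff], Any]]) -> Any:
    """Deliver a handoff to a registered agent, or fail loudly at the boundary."""
    agent = registry.get(handoff.recipient)
    if agent is None:
        # The silent version of this bug is "the wrong context reached
        # the wrong agent"; an explicit error surfaces it immediately.
        raise LookupError(f"no agent registered as {handoff.recipient!r}")
    return agent(handoff)

registry = {"researcher": lambda h: f"{h.task} ({h.trace_id})"}
result = route(Handoff("planner", "researcher", "verify claim",
                       {"claim": "X"}, "t-1"), registry)
```

The envelope does nothing clever; its value is that the boundary is now a place where assertions, logging, and schema checks can live.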

AI Agent Architecture: What Actually Works in Production

· 11 min read
Tian Pan
Software Engineer

One company shipped 7,949 AI agents. Fifteen percent of them worked. The rest failed silently, looped endlessly, or contradicted themselves mid-task. This is not a fringe result — enterprise analyses consistently find that 88% of AI agent projects never reach production, and 95% of generative AI pilots fail or severely underperform. The gap between a compelling demo and a reliable system is not a model problem. It is an architecture problem.

The engineers who are shipping agents that actually work have converged on a set of structural decisions that look nothing like the toy examples in framework tutorials. This post is about those decisions: where the layers are, where failures concentrate, and why the hardest problems are not about prompts.