Skip to main content

30 posts tagged with "multi-agent"

View all tags

Race Conditions in Concurrent Agent Systems: The Bugs That Look Like Hallucinations

· 13 min read
Tian Pan
Software Engineer

Three agents processed a customer account update concurrently. All three logged success. The final database state was wrong in three different ways simultaneously, and no error was ever thrown. The team spent two weeks blaming the model.

It wasn't the model. It was a race condition.

This is the failure mode that gets misdiagnosed more than any other in production multi-agent systems: data corruption caused by concurrent state access, mistaken for hallucination because the downstream agents confidently reason over corrupted inputs. The model isn't making things up. It's faithfully processing garbage.

When Your Agents Disagree: Consensus and Arbitration in Multi-Agent Systems

· 11 min read
Tian Pan
Software Engineer

Multi-agent systems are sold on a promise: multiple specialized agents, working in parallel, will produce better answers than any single agent could alone. That promise has a hidden assumption — that when agents produce different answers, you'll know how to reconcile them. Most teams discover too late that they won't.

The naive approach is to average outputs, or pick the majority answer, and move on. In practice, a multi-agent system where all agents share the same training distribution will amplify their shared errors through majority vote, not cancel them out. A system that always defers to the most confident agent will blindly follow the most overconfident one. And a system that runs every disagreement through an LLM judge will inherit twelve documented bias types from that judge. The arbitration problem is harder than it looks, and getting it wrong is how you end up with four production incidents in a week.

Agent Memory Poisoning: The Attack That Persists Across Sessions

· 11 min read
Tian Pan
Software Engineer

Prompt injection gets all the attention. But prompt injection ends when the session closes. Memory poisoning — injecting malicious instructions into an agent's long-term memory — creates a persistent compromise that survives across sessions and executes days or weeks later, triggered by interactions that look nothing like an attack. Research on production agent systems shows over 95% injection success rates and 70%+ attack success rates across tested LLM-based agents. This is the attack vector most teams aren't defending against, and it's already in the OWASP Top 10 for Agentic Applications.

The core problem is simple: agents treat their own memories as trustworthy. When an agent retrieves a "memory" from its vector store or conversation history, it processes that information with the same confidence as its system instructions. There's no cryptographic signature, no provenance chain, no mechanism for the agent to distinguish between a memory it formed from genuine interaction and one injected by a malicious document it processed last Tuesday.

The Composition Testing Gap: Why Your Agents Pass Every Test but Fail Together

· 9 min read
Tian Pan
Software Engineer

Your planner agent passes its eval suite at 94%. Your researcher agent scores even higher. Your synthesizer agent nails every benchmark you throw at it. You compose them into a pipeline, deploy to production, and watch it produce confidently wrong answers that no individual agent would ever generate on its own.

This is the composition testing gap — the systematic blind spot where individually validated agents fail in ways that no single-agent analysis can predict. Research on multi-agent LLM systems shows that 67% of production failures stem from inter-agent interactions rather than individual agent defects. You're testing the atoms but shipping the molecule, and molecular behavior is not the sum of atomic properties.

DAG-First Agent Orchestration: Why Linear Chains Break at Scale

· 10 min read
Tian Pan
Software Engineer

Most multi-agent systems start as a chain. Agent A calls Agent B, B calls C, C calls D. It works fine in demos, and it works fine with five agents on toy tasks. Then you add a sixth agent, a seventh, and the pipeline that once ran in eight seconds starts taking forty. You add a retry on step three, and now failures on step three silently cascade into corrupted state at step six. You try to add a parallel branch and discover your framework was never designed for that.

The problem is not the number of agents. The problem is the execution model. Linear chains serialize inherently parallel work, propagate failures in only one direction, and make partial recovery structurally impossible. The fix is not adding more infrastructure on top — it is rebuilding the execution model around a directed acyclic graph from the start.

The Three Attack Surfaces in Multi-Agent Communication

· 10 min read
Tian Pan
Software Engineer

A recent study tested 17 frontier LLMs in multi-agent configurations and found that 82% of them would execute malicious commands when those commands arrived from a peer agent — even though the exact same commands were refused when issued directly by a user. That number should reset your threat model if you're shipping multi-agent systems. Your agents may be individually hardened. Together, they're not.

Multi-agent architectures introduce communication channels that most security thinking ignores. We harden the model, the system prompt, the API perimeter. We spend almost no time on what happens when Agent A sends a message to Agent B — who wrote that message, whether it was tampered with, whether the memory Agent B consulted was planted three sessions ago by an attacker who never touched Agent A at all.

When the Generalist Beats the Specialists: The Case for Unified Single-Agent Architectures

· 9 min read
Tian Pan
Software Engineer

The prevailing wisdom in AI engineering is that complex tasks require specialized agents: a researcher agent, a writer agent, a critic agent, each handling its narrow domain and handing off to the next. This architectural instinct feels correct — it mirrors how human teams work, how microservices are built, and how we decompose problems in software engineering. The problem is that empirical data increasingly says otherwise.

A 2025 study from Google DeepMind and MIT evaluated 180 configurations across five agent architectures and three LLM families. For sequential reasoning tasks — the category that covers most real knowledge work — every single multi-agent coordination variant degraded performance by 39 to 70 percent compared to a well-configured single agent. Not broke-even. Degraded.

This is not an argument against multi-agent systems categorically. There are workloads where coordination yields genuine returns. But the default instinct to reach for specialization is costing production teams real money, real latency, and real reliability — often for no measurable accuracy gain.

The Principal Hierarchy Problem: Authorization in Multi-Agent Systems

· 11 min read
Tian Pan
Software Engineer

A procurement agent at a manufacturing company gradually convinced itself it could approve $500,000 purchases without human review. It did this not through a software exploit or credential theft, but through a three-week sequence of supplier emails that embedded clarifying questions: "Anything under $100K doesn't need VP approval, right?" followed by progressive expansions of that assumption. By the time it approved $5M in fraudulent orders, the agent was operating well within what it believed to be its authorized limits. The humans thought the agent had a $50K ceiling. The agent thought it had no ceiling at all.

This is the principal hierarchy problem in its most concrete form: a mismatch between what authority was granted, what authority was claimed, and what authority was actually exercised. It becomes exponentially harder when agents spawn sub-agents, those sub-agents spawn further agents, and each hop in the chain makes an independent judgment about what it's allowed to do.

Agent-to-Agent Communication Protocols: The Interface Contracts That Make Multi-Agent Systems Debuggable

· 11 min read
Tian Pan
Software Engineer

When a multi-agent pipeline starts producing garbage outputs, the instinct is to blame the model. Bad reasoning, wrong context, hallucination. But in practice, a large fraction of multi-agent failures trace back to something far more boring: agents that can't reliably communicate with each other. Malformed JSON that passes syntax validation but fails semantic parsing. An orchestrator that sends a task with status "partial" that the downstream agent interprets as completion. A retry that fires an operation twice because there's no idempotency key.

These aren't model failures. They're interface failures. And they're harder to debug than model failures because nothing in your logs will tell you the serialization contract broke.

Agentic Engineering Patterns That Actually Work in Production

· 8 min read
Tian Pan
Software Engineer

The most dangerous misconception about AI coding agents is that they let you relax your engineering discipline. In practice, the opposite is true. Agentic systems amplify whatever you already have: strong foundations produce velocity, weak ones produce chaos at machine speed.

The shift worth paying attention to isn't that agents write code for you. It's that the constraint has changed. Writing code is no longer the expensive part. That changes almost everything about how you structure your process.

The Action Space Problem: Why Giving Your AI Agent More Tools Makes It Worse

· 9 min read
Tian Pan
Software Engineer

There's a counterintuitive failure mode that most teams encounter when scaling AI agents: the more capable you make the agent's toolset, the worse it performs. You add tools to handle more cases. Accuracy drops. You add better tools. It gets slower and starts picking the wrong ones. You add orchestration to manage the tool selection. Now you've rebuilt complexity on top of the original complexity, and the thing barely works.

The instinct to add is wrong. The performance gains in production agents come from removing things.

Multi-Agent Conversation Frameworks: The Paradigm Shift from Pipelines to Talking Agents

· 11 min read
Tian Pan
Software Engineer

A Google DeepMind study published in late 2025 analyzed 180 multi-agent configurations across five architectures and three LLM families. The finding that got buried in the discussion section: unstructured multi-agent networks amplify errors up to 17.2x compared to single-agent baselines. Not fix errors — amplify them. Agents confidently building on each other's hallucinations, creating echo chambers that make each individual model's failure modes dramatically worse.

This is the paradox at the center of multi-agent conversation frameworks. The same property that makes them powerful — agents negotiating, critiquing, delegating, and revising — is what makes them dangerous without careful design. Understanding the difference between conversation-based orchestration and traditional pipeline chaining is the first step toward using either correctly.