
How AI Agents Actually Work: Architecture, Planning, and Failure Modes

· 10 min read
Tian Pan
Software Engineer

Most agent failures are architecture failures. The model gets blamed when a task goes sideways, but nine times out of ten, the real problem is that nobody thought hard enough about how planning, tool use, and reflection should fit together. You can swap in a better model and still get the same crashes — because the scaffolding around the model was never designed to handle what the model was being asked to do.

This post is a practical guide to how agents actually work under the hood: what the core components are, where plans go wrong, how reflection loops help (and when they hurt), and what multi-agent systems look like when you're building them for production rather than demos.

What an Agent Actually Is

Strip away the hype and the definition is almost boringly simple. An agent is a system that perceives its environment and takes actions through tools. It's defined by two things: the environment it operates in (a file system, a browser, a database, the internet) and the set of actions it can perform (read a file, run a query, send an API request, write code).

The key distinction from a plain LLM call is the loop. An agent doesn't answer once and stop — it cycles through perceive, plan, act, observe, and repeat until the task is done (or until it fails in some interesting way). That loop is what makes agents capable of multi-step tasks, and it's also what makes them hard to build correctly.
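The perceive-plan-act-observe cycle can be sketched in a few lines. This is a minimal illustration, not a framework: `llm_decide`, the tool names, and the decision format are all hypothetical placeholders for whatever model call and dispatch mechanism you actually use.

```python
# Minimal agent loop sketch. `llm_decide`, the decision format, and the
# tool registry are illustrative assumptions, not a real API.

def run_agent(task, tools, llm_decide, max_steps=10):
    """Cycle through plan -> act -> observe until done or out of steps."""
    history = []  # observations accumulate so the model can plan from them
    for _ in range(max_steps):
        decision = llm_decide(task, history)    # plan: pick the next action
        if decision["action"] == "finish":
            return decision["answer"]
        tool = tools[decision["action"]]        # act: dispatch to a tool
        observation = tool(**decision["args"])  # observe: capture the result
        history.append((decision, observation))
    raise RuntimeError("step budget exhausted without finishing")
```

The `max_steps` cap matters: it is what turns "fails in some interesting way" into a bounded, diagnosable failure instead of an infinite loop.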

What most people call "an agent" in 2025-2026 is more precisely a foundation model agent: a system where an LLM acts as the planning and decision-making core, surrounded by infrastructure that gives it tools, memory, and a way to track progress. This is different from reinforcement learning agents that learn through reward signals — though the two approaches are converging.

The Three Core Capabilities

Every useful agent needs to get three things right: tool use, planning, and reflection. Get any one of them wrong and the whole system degrades.

Tool Use

Tools are how agents interact with the world beyond the model's own knowledge. They fall into two broad categories:

Knowledge augmentation tools pull in information the model doesn't have: web search, SQL queries, document retrieval, internal APIs. They address the fundamental problem that any model's training data has a cutoff date and doesn't include your proprietary data.

Capability extension tools compensate for things models do poorly: calculators for arithmetic, code interpreters for complex data analysis, timezone converters for scheduling math. Research on tool-augmented models consistently shows meaningful gains: combining retrieval, calculation, and code execution creates a system far more capable than any single tool or the base model alone.

Write-action tools are where it gets serious: sending emails, modifying databases, executing financial transactions. These enable genuine end-to-end automation, but they also mean mistakes are no longer hypothetical. A poorly formed SQL query that reads data is harmless. One that writes is not.

The practical challenge isn't adding tools — it's knowing which ones to include. More tools means more capability and more complexity. A larger tool inventory increases the chance the model picks the wrong one, generates bad parameters, or tries to call a tool that doesn't exist. The right approach is to start minimal and add tools based on where the agent actually fails, not based on what seems useful in theory.

Planning

Planning is where most of the interesting (and painful) engineering happens. An agent has to translate a high-level goal into a sequence of concrete tool calls, while accounting for dependencies, conditionals, and the possibility that earlier steps might fail.

The basic patterns for control flow are:

  • Sequential: Action B follows A, always
  • Parallel: A and B execute simultaneously, results merge
  • Conditional (routing): Choose B or C based on the outcome of A
  • Iterative: Repeat until a condition is met

Most real tasks combine all four. A research workflow might run several web searches in parallel, route to different summarization strategies depending on what it finds, iterate until it has enough information, and then sequentially produce output sections.
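A workflow like that research example can be sketched to show all four patterns composing. The helper callables (`search`, `summarize_short`, `summarize_long`, `enough_info`) are hypothetical stand-ins for real tool calls; the 200-character routing threshold is an arbitrary example.

```python
# The four control-flow patterns composed in one workflow -- a sketch.
# All four callables are hypothetical stand-ins for real tools.
from concurrent.futures import ThreadPoolExecutor

def research(queries, search, summarize_short, summarize_long, enough_info):
    findings = []
    while not enough_info(findings):              # iterative: repeat until satisfied
        with ThreadPoolExecutor() as pool:        # parallel: fan out the searches
            results = list(pool.map(search, queries))
        for r in results:
            # conditional (routing): pick a summarization strategy per result
            summary = summarize_long(r) if len(r) > 200 else summarize_short(r)
            findings.append(summary)
    return "\n".join(findings)                    # sequential: assemble output last
```

Writing the control flow as ordinary code like this, with the model only filling in the leaf calls, is itself a design choice: it trades flexibility for predictability.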

There's an ongoing debate about whether autoregressive LLMs can truly plan in the computational sense, or whether they're doing something that resembles planning without the underlying properties we'd want. The practical answer is: it doesn't matter much. What matters is that they fail in predictable ways, and you can engineer around those failures.

One of the most useful structural decisions is decoupling plan generation from plan execution. Generate a plan, validate it against basic heuristics (does it use real tools? does it have too many steps? does it violate constraints?), and only then execute. This prevents you from burning API costs on plans that were doomed from the start.
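Those heuristic checks are cheap to implement. Here is a sketch, assuming plans arrive as a list of `{"tool": ..., "args": ...}` steps, which is an illustrative format rather than any standard:

```python
# Heuristic plan validation before execution -- a sketch. The plan format
# (a list of {"tool": ..., "args": ...} steps) is an assumption.

def validate_plan(plan, known_tools, max_steps=8):
    """Return a list of problems; an empty list means the plan may execute."""
    problems = []
    if len(plan) > max_steps:
        problems.append(f"plan has {len(plan)} steps, limit is {max_steps}")
    for i, step in enumerate(plan):
        if step.get("tool") not in known_tools:  # catches invented tool names
            problems.append(f"step {i}: unknown tool {step.get('tool')!r}")
    return problems
```

On a validation failure you can re-prompt the planner with the problem list appended, which often fixes the plan in one retry without any execution cost.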

Reflection

Reflection is the mechanism by which an agent learns from what just happened. After taking an action and observing the result, the agent (or a separate evaluator component) decides whether to continue, backtrack, or try a different approach.

The ReAct pattern — interleaving Thought, Action, and Observation in a cycle — is the most common implementation. The agent explicitly reasons about what it did before deciding what to do next. This simple loop catches a surprising number of failure modes that would otherwise silently compound.
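The ReAct cycle differs from a bare agent loop mainly in that the reasoning is explicit and recorded. A sketch, where `model` is a hypothetical callable that returns a structured thought/action pair:

```python
# ReAct-style loop sketch: the model emits a Thought and an Action, the
# runtime executes the Action and feeds the Observation back next turn.
# `model` is a hypothetical wrapper, not a real library call.

def react_loop(task, model, tools, max_steps=8):
    transcript = []
    for _ in range(max_steps):
        step = model(task, transcript)              # Thought + Action in one call
        transcript.append(("Thought", step["thought"]))
        if step["action"] == "finish":
            return step["input"]
        obs = tools[step["action"]](step["input"])  # execute the Action
        transcript.append(("Action", step["action"]))
        transcript.append(("Observation", obs))     # the model sees this next turn
    raise RuntimeError("no answer within step budget")
```

The transcript doubles as a debugging artifact: when a run goes wrong, the interleaved thoughts usually show exactly where the reasoning diverged.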

A more sophisticated approach separates the reflection function from the actor. A dedicated evaluator scores the outcome quality; a self-reflection module analyzes why failures occurred; the actor uses those insights to generate a better plan. This is the Reflexion pattern, and it's particularly useful for tasks with clear success criteria where failures are instructive rather than terminal.
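The separation of roles can be sketched as three callables wired into a retry loop. All three are hypothetical model wrappers; a real Reflexion setup would also persist the reflections as long-term memory across episodes.

```python
# Reflexion-style role separation, sketched: an actor attempts the task,
# an evaluator scores the attempt, a reflector turns failures into advice.
# All three callables are hypothetical model wrappers.

def reflexion(task, actor, evaluator, reflector, max_attempts=3):
    advice = []                                  # accumulated self-reflections
    for _ in range(max_attempts):
        attempt = actor(task, advice)            # actor conditions on past advice
        if evaluator(task, attempt):             # needs a clear success criterion
            return attempt
        advice.append(reflector(task, attempt))  # analyze why the attempt failed
    return None                                  # all attempts exhausted
```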

The cost of reflection is real. More reasoning tokens, more latency, more API calls. For simple tasks, it's wasteful. For complex tasks where getting it right matters, it pays for itself. Calibrating when to reflect — and how deeply — is one of the more underappreciated engineering decisions in agent design.

How Plans Actually Fail

When an agent doesn't complete a task correctly, the failure usually falls into one of five categories:

  1. Invalid tool selection — The agent calls a tool that doesn't exist. This is hallucination at the tool level: the model invents a plausible-sounding function name that isn't in its inventory.

  2. Valid tool, invalid parameters — Right tool, wrong number of arguments or wrong types. A function that expects two strings gets three integers.

  3. Valid tool, incorrect values — Structurally correct call, semantically wrong. The right query structure with the wrong date range. The right API endpoint with a missing authentication header.

  4. Goal failure — The plan technically executes without errors, but achieves the wrong thing. The agent interpreted the task differently than intended, satisfied the literal request while missing the actual objective.

  5. Reflection errors — The agent (incorrectly) believes it's done when it isn't, or vice versa. The termination condition is wrong.

Categories 1-3 are caught by instrumentation and schema validation. Categories 4-5 require better evaluation, either by humans reviewing outputs or by an LLM-as-judge setup that checks whether outcomes match the stated goals.
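Categories 1 and 2 can be caught with a tool registry and a lightweight schema check before dispatch; category 3 needs semantic checks beyond this. The registry format below is an illustrative assumption:

```python
# Sketch of catching failure categories 1 and 2 with a schema check.
# The registry format is an assumption for illustration only.

TOOL_SCHEMAS = {"get_weather": {"city": str, "days": int}}

def check_call(name, args):
    """Return an error string, or None if the call is structurally valid."""
    if name not in TOOL_SCHEMAS:                 # category 1: invented tool
        return f"unknown tool: {name}"
    schema = TOOL_SCHEMAS[name]
    if set(args) != set(schema):                 # category 2: wrong argument names
        return f"expected args {sorted(schema)}, got {sorted(args)}"
    for key, typ in schema.items():
        if not isinstance(args[key], typ):       # category 2: wrong types
            return f"arg {key!r} should be {typ.__name__}"
    return None
```

When a check fails, feeding the error string back to the model as an observation usually gets a corrected call on the next turn.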

The practical implication: always print every tool call and its output during development. Pattern analysis across many runs tells you which tools are causing problems, what parameter mistakes are most common, and whether the issue is in prompting, in tool design, or in the underlying model.
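A decorator is one cheap way to get that instrumentation everywhere at once. This sketch prints JSON lines for pattern analysis; in production you would swap `print` for structured logging or tracing:

```python
# One way to log every tool call during development -- a plain decorator
# that emits a JSON line per call. Swap print for real logging in production.
import functools
import json

def logged(tool):
    @functools.wraps(tool)
    def wrapper(**kwargs):
        result = tool(**kwargs)
        # record the call and its output so many runs can be pattern-analyzed
        print(json.dumps({"tool": tool.__name__, "args": kwargs,
                          "result": repr(result)}))
        return result
    return wrapper
```

Because every tool goes through the same wrapper, a grep over the logs answers "which tool failed most" without touching the agent code.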

Multi-Agent Systems

Here's something that surprises people the first time they encounter it: almost any non-trivial agent is already a multi-agent system, even if it doesn't look like one. The plan generator is effectively a different agent than the plan executor. Add an evaluator and a reflection module and you have four loosely coupled components that each have their own behavior and failure modes.

The field is going through something analogous to the microservices transition in software: single monolithic agents giving way to orchestrated teams of specialized sub-agents. A coordinator agent breaks down a high-level task; specialist agents handle web research, code generation, document synthesis, and output formatting; a validation agent checks the final result.

This architecture has real advantages:

  • Each agent can be prompted and tuned for its specific function
  • Expensive frontier models can be reserved for planning; cheaper models execute
  • Failures are localized and easier to diagnose
  • Human oversight can be injected at specific handoff points

The Plan-and-Execute pattern is one concrete implementation. A capable model generates a full strategy; a cheaper model executes each step. The cost savings can be substantial — 90% reduction in API costs in some workloads — because most of the reasoning load falls on the planning phase.
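Structurally the pattern is simple: one call to the expensive planner, then a loop of cheap executor calls. `call_planner` and `call_executor` below are hypothetical wrappers around two different models; the cost figures above are the author's, not something this sketch demonstrates.

```python
# Plan-and-Execute sketch: one expensive planning call, many cheap
# execution calls. Both callables are hypothetical model wrappers.

def plan_and_execute(task, call_planner, call_executor):
    steps = call_planner(task)      # frontier model: reason once, up front
    results = []
    for step in steps:
        # cheap model executes each step, seeing prior results for context
        results.append(call_executor(step, results))
    return results[-1]
```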

Human-in-the-loop integration fits naturally into multi-agent architectures. Humans can participate at any stage: validating a generated plan before execution, approving risky write operations, or taking over tasks the agent correctly identifies as beyond its authority. The key is making those handoff points explicit rather than letting the agent silently make decisions it shouldn't.
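Making a handoff point explicit can be as simple as a gate in the dispatch path. In this sketch, `ask_human` is an assumed callback (a CLI prompt, a Slack message, a review queue) and the set of risky tool names is illustrative:

```python
# Sketch of an explicit approval gate for risky write actions.
# `ask_human` is an assumed callback; the RISKY set is illustrative.

RISKY = {"send_email", "delete_rows", "transfer_funds"}

def dispatch(name, args, tools, ask_human):
    if name in RISKY and not ask_human(name, args):  # explicit handoff point
        return {"status": "rejected", "by": "human"}
    return {"status": "ok", "result": tools[name](**args)}
```

The point is that the gate lives in code the agent cannot route around, not in a prompt instruction the model might ignore.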

Engineering for Reliability

The gap between a demo that works and a production agent that works reliably is almost entirely a function of engineering discipline, not model quality.

A few principles that hold up in practice:

Separate concerns explicitly. Don't let your planning logic, tool dispatch, and evaluation logic blur together in one monolithic prompt. Separate components with clear interfaces are easier to debug, easier to upgrade, and easier to test in isolation.

Validate plans before executing them. Heuristic checks — step count limits, tool validity, constraint verification — are fast and catch obvious failures before they waste computation. AI-based plan evaluation is more thorough but more expensive; reserve it for complex tasks.

Version your tool interfaces. Tool descriptions in the system prompt are effectively an API. Changes to how you describe a tool change how the model uses it. Treat tool interface design with the same care you'd give to a public API — clear semantics, good documentation, stable contracts.

Build for observability from the start. Every tool call should be logged. Every reflection decision should be logged. When your agent fails in production, you need enough history to reconstruct what it was thinking. This isn't optional instrumentation — it's what makes the system improvable.

Match autonomy to risk. Write actions deserve more human oversight than read actions. Irreversible operations deserve more confirmation than reversible ones. The right level of autonomy isn't fixed — it's a function of what the agent is doing and what the consequences of mistakes are.

The IMPACT framework from the 2025 AI Engineer Summit captures the essential components cleanly: Intent, Memory, Planning, Authority, Control Flow, Tools. Work through each dimension deliberately when designing an agent system. Skipping any of them is how you end up with a system that works in demos and fails at 2am on production traffic.

The Real Work

What makes agent development genuinely hard isn't the model capabilities — those have improved faster than anyone expected. It's the engineering discipline of building systems that fail gracefully, recover from errors, know their own limits, and can be meaningfully improved after they're deployed.

The good news is that these are solvable problems. The patterns are emerging, the tooling is maturing, and teams that built production agents in 2025 have hard-won intuitions to share. The bad news is that there are no shortcuts: agents are distributed systems with LLMs in the loop, and distributed systems require the same rigor they always have.

Start with the smallest agent that could plausibly work. Add tools based on observed failures. Instrument everything. Introduce reflection where failures are recoverable and costly. Bring in multi-agent coordination when single-agent complexity becomes unmanageable. That's not a novel insight — it's just good engineering applied to a new problem class.
