How AI Agents Actually Learn Over Time
Most teams building AI agents treat the model as a fixed artifact. You pick a foundation model, write your prompts, wire up some tools, and ship. If the agent starts making mistakes, you tweak the system prompt or switch to a newer model. Learning, in this framing, happens upstream—at the AI lab, during pretraining and RLHF—not in your stack.
This is the wrong mental model. Agents that improve over time do so at three distinct architectural layers, and only one of them involves touching model weights. Teams that understand this distinction build systems that compound in quality; teams that don't understand it keep manually patching the same failure modes.
The Three Layers of Agent Learning
Think of a deployed agent as a stack:
- Model layer: the neural network and its weights
- Harness layer: the code, instructions, tools, and orchestration logic that run the agent
- Context layer: the external configuration—memory, per-user instructions, dynamic tool sets—injected at runtime
Learning can happen at each layer independently. Most engineering effort focuses on the model layer by default, but in practice, the harness and context layers are often faster, safer, and more targeted.
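The stack above can be sketched at call time. This is a minimal illustration, not a real API: `MemoryStore`, `build_prompt`, and the field names are hypothetical, but they show where each layer lives when a request is served.

```python
from dataclasses import dataclass, field

# Context layer: external state retrieved and injected at runtime.
@dataclass
class MemoryStore:
    facts: dict = field(default_factory=dict)

    def retrieve(self, user_id: str) -> list[str]:
        return self.facts.get(user_id, [])

# Harness layer: instructions and assembly logic, versioned in the repo.
SYSTEM_PROMPT = "You are a scheduling assistant."

def build_prompt(user_id: str, task: str, memory: MemoryStore) -> str:
    """Harness code assembles the final prompt; the model layer only sees the result."""
    context = "\n".join(memory.retrieve(user_id))
    return f"{SYSTEM_PROMPT}\n\nKnown about this user:\n{context}\n\nTask: {task}"

memory = MemoryStore()
memory.facts["u42"] = ["Prefers meetings after 10:00 in US/Pacific."]
prompt = build_prompt("u42", "Schedule a sync with the design team.", memory)
```

Updating `memory.facts` is context-layer learning; editing `SYSTEM_PROMPT` or `build_prompt` is harness-layer learning; only retraining the model behind the prompt touches layer 1.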
Layer 1: Model Weights
This is traditional machine learning: supervised fine-tuning (SFT) on curated trajectories, reinforcement learning from human or AI feedback, or distillation from a stronger model. It's powerful when done right, but comes with real costs.
The central problem is catastrophic forgetting: updating weights for new tasks degrades performance on old ones. A model fine-tuned on your customer support domain might get worse at reasoning tasks it previously handled well. Techniques like Elastic Weight Consolidation (EWC) penalize large changes to weights that were important for earlier tasks, and Progressive Neural Networks sidestep forgetting entirely by adding new capacity for each task while freezing old weights. Recent work from Meta FAIR uses sparse memory fine-tuning—activating only a tiny subset of memory slots per forward pass—to isolate new knowledge from existing representations.
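The EWC penalty mentioned above can be written in a few lines. This is an illustrative sketch with made-up numbers, not a training loop: each weight gets a Fisher-information importance score from the old task, and the penalty makes moving important weights expensive.

```python
import numpy as np

def ewc_penalty(theta, theta_old, fisher, lam=1.0):
    """EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_old) ** 2)

theta_old = np.array([1.0, -0.5, 2.0])   # weights after the old task
fisher    = np.array([10.0, 0.1, 5.0])   # estimated importance of each weight

# Moving the unimportant weight (F = 0.1) by a full unit is cheap...
cheap = ewc_penalty(np.array([1.0, 0.5, 2.0]), theta_old, fisher)

# ...while the same-size move on an important weight (F = 10) is penalized heavily.
expensive = ewc_penalty(np.array([2.0, -0.5, 2.0]), theta_old, fisher)
```

The new-task loss is then `task_loss + ewc_penalty(...)`, which is what steers the optimizer away from overwriting weights the old task depends on.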
The practical upshot: model-layer learning is high-leverage but high-risk. It requires careful evaluation, controlled rollout, and significant data infrastructure. Most teams should reach for it after exhausting the other two layers.
Layer 2: The Harness
The harness is everything between the model and the user: system prompts, tool definitions, retry logic, output parsers, routing code. This layer is entirely under your control and doesn't require retraining anything.
A pattern called the meta-harness makes this explicit: a separate agent—often a stronger model—analyzes production execution traces and proposes improvements to the harness itself. It might notice that a tool is being called with the wrong parameters 30% of the time and suggest a clearer description. Or it might identify a prompt pattern that reliably leads to hallucinations and rewrite it. Humans approve the changes; the loop runs again.
This is slower than context updates but faster than model retraining. And unlike weight changes, it's fully auditable—a diff in your repo. When Claude Code analyzes a failed task and suggests an update to CLAUDE.md, that's harness-layer learning made tangible.
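A minimal version of that meta-harness pass can be sketched as trace analysis plus a proposal queue. The trace shape, the 30% threshold, and the suggestion text are all assumptions for illustration; the key property is that nothing ships without the human-review step.

```python
from collections import Counter

def propose_harness_fixes(traces, threshold=0.3):
    """Scan traces for tools with high error rates and draft harness changes for review."""
    calls, errors = Counter(), Counter()
    for trace in traces:
        for call in trace["tool_calls"]:
            calls[call["tool"]] += 1
            if call["error"]:
                errors[call["tool"]] += 1
    proposals = []
    for tool, n in calls.items():
        rate = errors[tool] / n
        if rate >= threshold:
            proposals.append({
                "tool": tool,
                "error_rate": round(rate, 2),
                "suggestion": f"Clarify the parameter description for '{tool}'",
                "status": "pending_human_review",  # never auto-applied
            })
    return proposals

traces = [
    {"tool_calls": [{"tool": "calendar.create", "error": True},
                    {"tool": "email.send", "error": False}]},
    {"tool_calls": [{"tool": "calendar.create", "error": True},
                    {"tool": "calendar.create", "error": False}]},
]
fixes = propose_harness_fixes(traces)
```

In a real deployment the analysis step would be the stronger model reading full traces, but the output is the same: a reviewable diff, not a silent change.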
Layer 3: Context
Context learning is the fastest and most targeted: you update what gets injected into the prompt at runtime without changing the model or the code. This includes:
- Per-user memory: facts, preferences, and history retrieved from a memory store before each call
- Per-tenant configuration: different tool sets, instructions, or personas for different organizations
- Dynamically updated skills: a library of few-shot examples that gets refined over time based on what worked
One pattern worth naming is offline dreaming: periodically processing a batch of recent execution traces to extract generalizable insights, then writing those insights back into the context store. An agent that handles scheduling might notice, after 500 runs, that users in a particular timezone consistently override its default meeting times—and update its working memory accordingly. No model update required.
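The offline dreaming job described above reduces to: aggregate recent traces, extract an insight that clears a consistency bar, write it to the context store. The trace fields, the minimum-run count, and the 80% bar here are illustrative assumptions.

```python
from collections import defaultdict

def dream(traces, context_store, min_runs=5, min_rate=0.8):
    """Batch-process traces and write back generalizable insights; no model update."""
    overrides, totals = defaultdict(int), defaultdict(int)
    for t in traces:
        tz = t["user_tz"]
        totals[tz] += 1
        if t["overrode_default_time"]:
            overrides[tz] += 1
    for tz, n in totals.items():
        if n >= min_runs and overrides[tz] / n >= min_rate:
            context_store[f"tz:{tz}"] = (
                f"Users in {tz} usually override the default meeting time; ask first."
            )
    return context_store

# 9 of 10 recent runs in one timezone overrode the default meeting time.
traces = [{"user_tz": "Asia/Tokyo", "overrode_default_time": True} for _ in range(9)]
traces.append({"user_tz": "Asia/Tokyo", "overrode_default_time": False})
store = dream(traces, {})
```

The insight now gets retrieved and injected on future runs for matching users, which is exactly the context-layer loop closing.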
The tradeoff is that context learning is bounded by context window size and retrieval quality. You can't encode everything into a prompt. This is why the three layers are complementary rather than alternatives.
The Foundational Primitive: Traces
None of this works without traces. A trace is a complete record of an agent's execution: the input, every tool call and its result, intermediate reasoning steps, and the final output. Traces are the training data for SFT, the input to the meta-harness, and the raw material for offline dreaming.
Teams that invest early in trace infrastructure tend to pull ahead over time. The compound effect is real: each production run becomes potential signal for improvement. Teams that treat their agents as black boxes—logging only inputs and outputs—are throwing away most of this signal.
A useful frame: think of your trace store as a flywheel. The more you log, the more you can improve. The more you improve, the better your traces get (because better agents encounter more interesting edge cases). The loop accelerates if you build the infrastructure to close it.
What goes into a good trace store:
- Full tool call history with latencies and error codes
- A human or AI judgment of whether the run succeeded
- Structured metadata (user, tenant, task type, model version)
- Intermediate reasoning steps, not just final outputs
Choosing When to Apply Each Layer
The right layer depends on what you're optimizing and how fast you need results.
Use context updates when:
- The failure mode is specific to a user or tenant
- You have a clear, targeted fix (a new memory entry, a corrected instruction)
- You need to iterate quickly in production
Use harness updates when:
- A failure pattern shows up across many users
- The root cause is a poorly specified tool description, ambiguous instruction, or missing error handler
- You want the fix to be code-reviewable and version-controlled
Use model-layer updates when:
- You have a domain that requires persistent new knowledge or style (e.g., a new product line, a specialized professional vocabulary)
- Context and harness fixes have hit their ceiling
- You have enough labeled trajectory data and evaluation infrastructure to validate the update safely
In practice, most teams should sequence these: fix the context first, then the harness, then consider fine-tuning. The cost and risk profile increases at each layer, so validate that lower layers can't solve the problem before reaching for the next.
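The sequencing above can be encoded as a triage helper. The input categories and return values are assumptions for illustration; real triage will have more signals, but the ordering (context, then harness, then model) is the substance.

```python
def pick_layer(scope: str, root_cause: str, has_eval_infra: bool) -> str:
    """Route a failure to the cheapest layer that can plausibly fix it."""
    if scope in ("user", "tenant"):
        return "context"      # targeted, fast, low risk
    if root_cause in ("tool_description", "instruction", "error_handling"):
        return "harness"      # code-reviewable, version-controlled fix
    if has_eval_infra:
        return "model"        # only with data and evals in place
    return "harness"          # without eval infra, stay at the safer layer
```

A global failure rooted in missing domain knowledge only reaches the model layer when evaluation infrastructure exists to validate the update.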
Production Patterns
A few patterns that show up repeatedly in teams doing this well:
Granularity matters. Not all learning should be global. A fix that works for one user might be wrong for another. Build your learning infrastructure to operate at the right scope: per-user for preference learning, per-tenant for domain-specific behavior, global for systematic bugs.
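Scoped learning implies scoped lookup. A simple sketch, assuming a dict-of-dicts store keyed by scope: per-user entries shadow per-tenant ones, which shadow global ones.

```python
def resolve(key, user_id, tenant_id, store):
    """Return the most specific value for key: user > tenant > global."""
    for scope in (f"user:{user_id}", f"tenant:{tenant_id}", "global"):
        if key in store.get(scope, {}):
            return store[scope][key]
    return None

store = {
    "global": {"default_meeting_len": 30},
    "tenant:acme": {"default_meeting_len": 45},  # domain-specific override
    "user:u42": {},
}
length = resolve("default_meeting_len", "u42", "acme", store)
```

With this layout, a preference learned for one user lands in that user's scope and cannot leak into another tenant's behavior.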
Offline before online. Real-time learning—updating context or memory during a live task—is tempting but fragile. A model that updates its own context mid-task can compound errors. Start with offline batch processing of traces; only move to online learning once you have confidence in your update logic.
Evaluation is non-negotiable. Every learning update—whether it's a new memory entry, a revised system prompt, or a fine-tuned model—needs an eval before it goes to production. Even small changes can have surprising downstream effects. The teams that move fast here are the ones with good evals, not the ones skipping them.
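A minimal eval gate looks like this. The eval cases and scoring rule are stand-ins (exact-match on toy inputs); the structural point is that any candidate update, at any layer, must clear a fixed bar before promotion.

```python
def passes_gate(candidate, eval_cases, baseline_score, margin=0.0):
    """Score a candidate on a fixed eval set; block promotion if it regresses."""
    hits = sum(candidate(case["input"]) == case["expected"] for case in eval_cases)
    score = hits / len(eval_cases)
    return score >= baseline_score - margin, score

eval_cases = [
    {"input": 1, "expected": 2},
    {"input": 2, "expected": 4},
    {"input": 3, "expected": 6},
]
ok, score = passes_gate(lambda x: 2 * x, eval_cases, baseline_score=0.9)
```

In practice `candidate` would be the agent running with the proposed memory entry, prompt, or fine-tuned model, and the eval set would be curated from judged traces.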
Human oversight at the harness layer. Automated meta-harness suggestions are useful, but a human should approve prompt and tool changes before they ship. Context updates (like adding a memory entry) can often be more automated; harness changes should go through code review.
Where This Is Heading
The research frontier is pushing toward agents that can propose and evaluate their own improvements across all three layers—not just suggest a memory entry, but reason about whether a model update is warranted and what data would be needed to validate it. Systems like this are starting to appear in research settings.
For practitioners today, the opportunity is more modest but still substantial: instrument your traces, build a feedback loop at the context layer, and give your harness a regular review informed by production data. Most deployed agents are running on static prompts and fixed tool definitions written during the initial build. The gap between those agents and ones that compound in quality over time is mostly an infrastructure and process gap, not a model capability gap.
The teams that close that gap now will have an increasingly durable advantage as the underlying models improve and the cost of all three learning modalities continues to fall.
