
Agent Memory Schema Evolution Is Protobuf on Hard Mode

11 min read
Tian Pan
Software Engineer

The first painful agent-memory migration always teaches the same lesson: there were two schemas, and you only migrated one of them. The storage layer is fine — every row was rewritten, every key is in its new shape, the backfill job logged success. The agent is broken anyway. It keeps writing to user.preferences.theme, retrieves nothing, then helpfully synthesizes a default from context as if the key never existed. The migration runbook reports green. Users report stale memory.

The asymmetry is structural. A traditional service that depends on a renamed column gets a hard error and you fix it. An agent that depends on a renamed memory key gets a soft miss and confabulates around it. The schema lives in two places — your store and the model's context — and you can only migrate one of them with a SQL script.

Protobuf solved a version of this problem twenty years ago by codifying an additive-only discipline: fields are forever, numbers are forever, wire types never change, and removal is replaced with deprecation. That discipline is the right starting point for agent memory, with one extra constraint that makes it harder. Protobuf receivers ignore unknown fields by design. Agents don't.

Why "the memory store has a schema" is the insight you acquire too late

Most teams stand up agent memory the same way: a key-value store, a vector index for semantic recall, a thin SDK that lets the model write and read with whatever keys it likes. There's no DDL, no migration table, no versioned schema file checked into the repo. The model and the application code converge on a working set of keys — user.profile.name, tasks.open[*].deadline, meeting_notes.2026Q1 — and those keys harden into an implicit contract over weeks of production traffic.

This is a schema. It just isn't written down anywhere a code review can catch.

The problem surfaces the first time someone tries to clean it up. A developer notices that user.profile.name and user.full_name are storing the same data, picks one, runs a backfill. The store now has consistent keys. The agent does not. It still writes to both — sometimes the old name, sometimes the new — because months of in-context history have shown it both keys, and few-shot examples in the system prompt show it the old one. Worse, retrieval against the old key now misses, because the data lives under the new name. From the user's perspective, the agent suddenly forgot half of what it knew.

The lesson teams arrive at, painfully, is that the keys an agent uses are not implementation details of the storage layer. They are part of the model's prompt, reinforced by every retrieved memory, every few-shot example, every conversation summary that gets piped back into context. Migrating the store without migrating the prompts is the same kind of mistake as migrating a database column without migrating the application that queries it — except the application is a probability distribution and you can't grep it.

Protobuf rules, ported to memory

Protobuf's rules for safe schema evolution boil down to a small set of invariants: don't change field numbers, don't change wire types, don't reuse retired numbers, never make a field required, and prefer deprecation to deletion. The underlying principle is that the binary encoding contract is immutable; you only get to add things that older readers can ignore.

Port that to agent memory and the rules look like this:

  • Keys are forever. Once an agent has written to user.preferences.theme, that key path is reserved. You don't get to reuse it for a different field, even after you stop writing to it.
  • Types are forever at a key. If user.preferences.theme was a string, it stays a string. Changing it to a structured object breaks every retrieval that returns the old shape into context, because the model will pattern-match on the wrong shape.
  • Add, don't mutate. A new preference shape lives at a new key (user.preferences.theme_v2 or user.preferences.appearance). The old key continues to exist, possibly populated by a translation layer.
  • Deprecation is a state, not an event. A deprecated key still resolves on read for as long as there are any in-context examples or summaries that reference it. That window is months, not days.
  • Removal requires evidence. Before you actually delete a key, you need traces showing the model has stopped referencing it across all live sessions and all summary regenerations.
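
What enforcing the first few of these rules at the write path can look like, as a minimal sketch: a checked-in registry of every key ever written, consulted before any write lands. The registry contents, statuses, and error handling below are invented for illustration, not a library API.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class KeySpec:
    json_type: str          # "string", "number", "object", "array", ...
    status: str = "active"  # "active" | "deprecated" | "retired"

# Hypothetical registry: every key ever written gets an entry and keeps it forever.
SCHEMA: dict[str, KeySpec] = {
    "user.preferences.theme": KeySpec("string", status="deprecated"),
    "user.preferences.appearance": KeySpec("object"),
}

def _json_type(value: Any) -> str:
    if isinstance(value, str): return "string"
    if isinstance(value, bool): return "boolean"
    if isinstance(value, (int, float)): return "number"
    if isinstance(value, dict): return "object"
    if isinstance(value, list): return "array"
    return "null"

def check_write(key: str, value: Any) -> None:
    spec = SCHEMA.get(key)
    if spec is None:
        # New keys are additions, never improvisations: they go through schema review.
        raise ValueError(f"{key} is not in the schema; add it via review, don't invent it")
    if spec.status == "retired":
        # A retired key path stays reserved, like a retired protobuf field number.
        raise ValueError(f"{key} is retired and its path is reserved")
    if _json_type(value) != spec.json_type:
        # Types are forever at a key; a new shape means a new key.
        raise TypeError(f"{key} expects {spec.json_type}, got {_json_type(value)}")

# Deprecated keys still pass: deprecation is a state, not a write error.
check_write("user.preferences.appearance", {"theme": "dark", "density": "compact"})
```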

The discipline rhymes with protobuf, but the constraint is harder because the receiver — the model — can't be patched in place. With protobuf, you ship a new generated client and the old field handler is gone. With an agent, every conversation that loaded the old key into context is still out there, and tomorrow's session may load a summary that references it. You're migrating against an audience that has memorized the old API.

The reinforcement vector nobody draws on the architecture diagram

When teams diagram their agent memory system, they draw a box for the store, an arrow for the write path, an arrow for the retrieval path, and a box for the model. What's missing from the diagram is every other place the schema lives.

The schema is also in:

  • The system prompt's few-shot examples, which often hard-code key names to demonstrate the read/write API.
  • In-context conversation history, which reproduces past tool calls and their arguments verbatim.
  • Summary memory, where prior interactions get compressed into prose that mentions keys by name.
  • Reflection or self-improvement loops, which generate plans referencing the keys the agent expects to find.
  • Tool descriptions, where memory operations are documented with example payloads.

Each of these is an independent copy of the schema. Each of them reinforces the model's belief about what keys exist and what shapes they hold. Migrate the store and leave any of these untouched, and the model continues to operate against the old contract — sometimes successfully (because the old key still resolves), sometimes silently failing (because it doesn't), almost never loudly.

This is the structural reason agent-memory migrations are protobuf on hard mode. With protobuf, a new schema means a new build. With an agent, a new schema means you need to find every surface where the old schema is reinforced in the model's context window and update or expire it. There is no compiler that will tell you which surfaces you missed.
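
There is no compiler, but you can build a blunt substitute: scan every surface that gets injected into context for references to keys you believe are dead. A rough sketch, assuming your prompts, tool descriptions, and cached summaries live in files you can enumerate; the paths and key list here are made up.

```python
import re
from pathlib import Path

# Hypothetical: keys the migration is supposed to have retired.
DEPRECATED_KEYS = ["user.preferences.theme", "user.full_name"]

# Hypothetical: the surfaces where the old schema can still be reinforced.
SURFACES = [
    Path("prompts/system_prompt.md"),
    Path("prompts/few_shot_examples.md"),
    Path("tools/memory_tool_description.md"),
    *Path("cache/summaries").glob("*.txt"),
]

def audit(surfaces, deprecated_keys):
    hits = []
    for path in surfaces:
        if not path.exists():
            continue
        text = path.read_text()
        for key in deprecated_keys:
            for match in re.finditer(re.escape(key), text):
                line = text.count("\n", 0, match.start()) + 1
                hits.append((str(path), line, key))
    return hits

for path, line, key in audit(SURFACES, DEPRECATED_KEYS):
    print(f"{path}:{line} still references {key}")
```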

The shadow-write, behavioral-dual-run migration playbook

The shadow-write pattern from database migrations transfers directly, with one critical addition. The standard playbook for a zero-downtime DB migration is roughly: add the new column alongside the old, dual-write to both, backfill the new from the old, switch reads to the new, dual-read for a verification window, then drop the old. Vector-database migrations follow the same shape with an embedding_v2 column populated in the background while live reads continue to hit the original.

For agent memory, that playbook works for the storage layer, but the migration isn't done when reads cut over. You need a behavioral dual-run on top of it.

A behavioral dual-run looks like this. After the storage layer is dual-writing both shapes, you replay a representative sample of recent agent traces against two configurations: the old configuration, with prompts and few-shot examples that reference old keys, and the new configuration, with prompts updated to reference new keys. You compare not just the final answer but the intermediate tool calls, the keys the agent attempted to read, and the keys it attempted to write. Divergences in those intermediate steps are your migration's missing pieces — the reinforcement surfaces that still teach the model the old schema.

You don't get to skip this step. Trace replay is the only way to surface what the model actually does, because the model's behavior is conditioned on the full context window, and you can't statically prove that some downstream summary won't reintroduce the old key into a future prompt. Replays make the implicit contract executable.
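
What "compare the intermediate steps" can mean in practice, as a sketch: replay each sampled trace under both configurations and diff the memory keys that show up in the tool calls. The run_agent hook, the trace shape, and the tool names are placeholders for whatever your harness actually provides.

```python
from collections import Counter

def keys_touched(tool_calls):
    # Every memory key a run read or wrote, pulled from its tool-call log.
    return {c["args"]["key"] for c in tool_calls
            if c["tool"] in ("memory_read", "memory_write")}

def behavioral_dual_run(traces, run_agent, retiring_keys):
    """run_agent(trace, config=...) replays one trace under a prompt configuration
    and returns its tool calls; retiring_keys is the set of key paths being retired."""
    diverged = 0               # traces where the two configs touch different keys
    still_old = Counter()      # retiring keys the *new* configuration still touches
    for trace in traces:
        old_run = keys_touched(run_agent(trace, config="old"))
        new_run = keys_touched(run_agent(trace, config="new"))
        if old_run != new_run:
            diverged += 1
        for key in new_run & retiring_keys:
            still_old[key] += 1
    return diverged, still_old
```

The interesting output isn't the divergence count by itself; it's still_old, the list of retired keys the new configuration keeps reaching for, because each of those points at a reinforcement surface you haven't updated yet.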

The cutover discipline that matches:

  • Phase 1 (additive): dual-write to both old and new keys, leave reads pointed at old, leave prompts unchanged. No behavior change. Verify backfill correctness.
  • Phase 2 (shadow read): route a small percentage of reads through the new key, log retrieval mismatches against the old key. Tune until mismatches are zero or explained.
  • Phase 3 (prompt cutover): update few-shot examples and tool descriptions to reference new keys. Replay representative traces. Compare tool-call sequences against baseline. Iterate until divergence is bounded.
  • Phase 4 (summary regeneration): regenerate any cached summaries or reflections that reference old keys. This is the step most teams skip and then spend a quarter wondering why the agent occasionally "remembers" the old schema.
  • Phase 5 (deprecation watch): keep the old key resolving for a verification window long enough to cover the longest live session and the longest summary-regeneration cadence. Only after that window is the key safe to retire — and even then, the key number, in the protobuf sense, stays reserved forever.
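
Phases 1 and 2 can live in a thin shim in front of the store rather than in the agent itself. A sketch under that assumption; the store interface, key names, translation hook, and sampling knob are all illustrative.

```python
import logging
import random

log = logging.getLogger("memory.migration")

class MigratingMemory:
    """Wraps a key-value memory store during phases 1-2 of a key migration."""

    def __init__(self, store, old_key, new_key, translate, shadow_read_pct=0.0):
        self.store = store                  # anything with get(key) / put(key, value)
        self.old_key, self.new_key = old_key, new_key
        self.translate = translate          # maps the old value shape to the new one
        self.shadow_read_pct = shadow_read_pct

    def put(self, key, value):
        # Phase 1: every write to the old key is mirrored to the new one.
        if key == self.old_key:
            self.store.put(self.old_key, value)
            self.store.put(self.new_key, self.translate(value))
        else:
            self.store.put(key, value)

    def get(self, key):
        # Reads stay on the old key; a sampled fraction also reads the new key
        # and logs mismatches (phase 2) without changing what the agent sees.
        value = self.store.get(key)
        if key == self.old_key and value is not None and random.random() < self.shadow_read_pct:
            shadow = self.store.get(self.new_key)
            if shadow != self.translate(value):
                log.warning("shadow-read mismatch: %s vs %s", self.old_key, self.new_key)
        return value
```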

What this changes in how you build memory in the first place

The teams that have done one of these migrations build the second-generation memory layer differently. The patterns that emerge are conservative in the same way protobuf is conservative.

Treat memory keys as a versioned, reviewed schema artifact. Check in a memory_schema.yml next to your prompts. Pull requests that introduce a new key get a code review the way a new database column does, not the way a new variable name does. The schema file is the single source of truth that few-shot examples, tool descriptions, and validation layers all read from.
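
One hedged sketch of what that artifact can look like, loaded here with PyYAML so the same structure can drive few-shot generation and write validation. The field names (type, status, since, replaced_by) are an assumption, not a standard.

```python
import yaml  # PyYAML; any reviewed, structured format works equally well

MEMORY_SCHEMA = yaml.safe_load("""
keys:
  user.profile.name:
    type: string
    status: active
    since: "2025-06"
  user.full_name:
    type: string
    status: deprecated
    replaced_by: user.profile.name
  user.preferences.appearance:
    type: object
    status: active
    since: "2026-01"
""")

def active_keys(schema):
    # The same structure feeds few-shot examples, tool descriptions, and write checks.
    return [k for k, spec in schema["keys"].items() if spec["status"] == "active"]

print(active_keys(MEMORY_SCHEMA))
```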

Namespace keys deliberately. LangGraph's pattern of (namespace, key) tuples — for example, ("memories", user_id) — pays for itself the first time you need to migrate, because you can scope the migration to a namespace without touching the rest. Flat key spaces force every migration to be global.
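
This is not LangGraph's API, just the (namespace, key) shape it encourages and why it makes migrations local. The store layout and migration helper below are illustrative.

```python
# A namespaced key-value layout: entries are addressed by (namespace, key).
store: dict[tuple, dict] = {
    (("memories", "user-123"), "user.full_name"): {"value": "Ada Lovelace"},
    (("memories", "user-456"), "user.full_name"): {"value": "Alan Turing"},
    (("tasks", "user-123"), "open"): {"value": ["file taxes"]},
}

def migrate_namespace(store, namespace_prefix, old_key, new_key, translate):
    # Only entries under the target namespace are touched; the old key keeps existing.
    for (namespace, key), record in list(store.items()):
        if namespace[0] == namespace_prefix and key == old_key:
            store[(namespace, new_key)] = {"value": translate(record["value"])}

migrate_namespace(store, "memories", "user.full_name", "user.profile.name", lambda v: v)
```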

Add fields, never mutate them. The day you find yourself wanting to change the type of a key, accept that you're actually adding a new key and translating between the two for a long deprecation window. The translation layer is cheap. The rewrite is not.
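
The translation layer really is small. A sketch, with invented key names, of resolving the new structured key while falling back to the old string key during the deprecation window:

```python
def read_appearance(store) -> dict:
    """Prefer the new structured key; translate the deprecated string key if needed."""
    appearance = store.get("user.preferences.appearance")
    if appearance is not None:
        return appearance
    legacy_theme = store.get("user.preferences.theme")  # old key held a plain string
    if legacy_theme is not None:
        # Translate the old shape into the new one instead of mutating the old key.
        return {"theme": legacy_theme, "density": "default"}
    return {}
```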

Build trace replay into the deployment pipeline before you need it. The first time you try to do a behavioral dual-run during an incident, you'll discover that you have no way to replay traces deterministically because tool outputs weren't captured, or because the prompts weren't versioned alongside the traces, or because session IDs weren't propagated. Fix that during a calm week, not during the migration.

Instrument retrieval misses as a first-class signal. A miss isn't an absence of data — it's evidence that some surface in the system still references a key that no longer resolves. Pipe miss-counts per key into the same dashboard that tracks model latency and tool errors. If a deprecated key starts seeing reads after you thought you'd killed it, something — a long-lived summary, a stale few-shot — is reintroducing it.
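
A minimal sketch of that instrumentation, assuming the schema registry from earlier and whatever metrics sink you already have; the counters here just stand in for it.

```python
from collections import Counter

read_misses = Counter()        # key path -> miss count, exported to the dashboard
deprecated_reads = Counter()   # reads that still land on keys marked deprecated

def instrumented_get(store, key, schema):
    value = store.get(key)
    if value is None:
        # A miss is evidence something still references a key that no longer resolves.
        read_misses[key] += 1
    if schema.get(key, {}).get("status") == "deprecated":
        # A deprecated key still being read means some surface keeps reintroducing it.
        deprecated_reads[key] += 1
    return value
```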

The takeaway

Agent memory looks like a key-value store with a model on top. It behaves like a distributed system where one of the participants is a probability distribution that has memorized your old API. Schema evolution under those conditions is a discipline, not a feature of your storage layer.

The protobuf playbook — additive only, numbers forever, deprecation over deletion — is the right starting point because it was designed for exactly the constraint agent memory imposes: you can't atomically update every reader. The extra constraint is that one of your readers is reinforced by every prior conversation, every few-shot example, every summary you ever cached. That reader can't be patched. It can only be retrained, in context, by changing every surface that teaches it the schema.

The cheapest version of this lesson is to write the schema down on day one and treat it as a contract. The expensive version is the migration that surfaces it for you.
