
The Backfill Problem: Why Agent Memory Needs Migrations Like a Database

· 11 min read
Tian Pan
Software Engineer

You shipped a better memory format on a Tuesday. The new schema splits a freeform summary string into structured fields — entities, preferences, last_verified_at — because the old blob was hard to retrieve against and impossible to update cleanly. The change is obviously correct. It passes review. It ships.

What you did not notice is that every memory written before Tuesday is now subtly wrong. Some records still have the old summary field and no entities, so the retrieval code that now keys on entities skips them. A few have a summary that the new parser interprets as an empty preference set. The agent didn't crash. It just quietly forgot a year of accumulated context, and nobody filed a bug because nothing looked broken — the agent still answered, just worse.

This is the backfill problem, and it is the predictable consequence of treating agent memory as a feature instead of as a database. Memory stores accumulate records continuously, the records have structure, and that structure drifts every time someone improves it. A relational database has decades of tooling and hard-won discipline around exactly this — migrations, versioned schemas, backfill jobs, dual-write windows. Agent memory has a JSON document and an append call. The gap between those two is where your agent's history goes to die.

Memory Is a Database Nobody Put Under Schema Control

Walk through how a memory store actually gets built. Version one is a list of strings appended to a per-user document. Version two adds a timestamp because recency matters at retrieval. Version three splits the string into a typed record because the agent needs to update individual facts, not rewrite the whole blob. Version four adds a confidence score, then a source field, then a superseded_by pointer once you realize memories contradict each other. Each step is a small, sensible improvement shipped by whoever was closest to the code.

At no point did anyone write a migration. The store is append-only, so old records simply stay in their old shape. After a year you have a single collection holding four or five distinct record formats, distinguished only by which fields happen to be present. The schema is not defined anywhere — it is the union of every shape the writer code has ever emitted, and the only system that "knows" the schema is the reader, which is now a thicket of if "entities" in record checks that nobody fully understands.

A real database forces the question. You cannot add a non-nullable column to a populated table without deciding what happens to existing rows; the migration tool makes you say it out loud. Agent memory has no such forcing function. A document store accepts whatever you write, retrieval tolerates missing fields by returning None, and the LLM downstream is agreeable enough to produce a plausible answer from partial input. Every layer in the stack is individually forgiving, and the sum of that forgiveness is a store whose contents you can no longer reason about.

The framing that fixes this is boring and correct: your memory store is a production database. It has a schema. The schema has versions. Changing it is a migration, and a migration that ignores existing data is a bug, not a simplification.

Three Ways a Schema Change Quietly Corrupts the Past

When old records meet new code, the damage takes three forms, and they get progressively harder to detect.

Unreadable is the lucky case. The new reader expects a field that old records don't have, hits a KeyError or a null, and either throws or skips the record. This is the failure you want, because it is loud. A skipped record shows up as a coverage gap; an exception shows up in your error tracker. You lose the memory, but you find out.

Silently misinterpreted is worse. The old and new schemas overlap enough that the new reader accepts an old record without complaint but reads it wrong. The classic version is a units or enum change: the old writer stored duration in seconds, the new reader expects milliseconds, and a pre-migration memory now claims an operation finished a thousand times faster than it did. Or you renamed an enum value and the old string falls through to a default. The agent ingests a confident, well-formed, completely wrong fact and reasons from it. Nothing logs an error because, structurally, nothing is wrong.
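A minimal sketch of the units case, assuming a hypothetical duration field that the old writer stored in seconds and the current reader treats as milliseconds. The record shapes and the describe helper are illustrative only. What it shows is that both reads succeed and only one of them is right:

```python
# Hypothetical records: same field name, different unit conventions.
old_record = {"task": "nightly export", "duration": 90}      # written as seconds
new_record = {"task": "nightly export", "duration": 90_000}  # written as milliseconds


def describe(record: dict) -> str:
    """Current reader: assumes every duration is in milliseconds."""
    seconds = record["duration"] / 1000
    return f"{record['task']} took {seconds:.2f} s"


# Neither call raises. The old record is simply read as 1000x shorter.
print(describe(new_record))
print(describe(old_record))
```

No exception, no null, no log line. The only way to catch this class of bug is to know which convention produced the record.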

Quietly dropped is the one that bit the Tuesday deploy. The record is technically readable, but a filter (a retrieval query, a relevance threshold, an "only memories with embeddings from the current model" guard) excludes it. The memory still exists in the store. It just never surfaces. Your agent's effective memory shrinks while your storage bill says everything is fine, and the only symptom is a slow degradation in answer quality that is almost impossible to attribute to a schema change three weeks earlier.
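One way to make the quiet drop visible is to measure how much of the store a filter can reach at all. The sketch below assumes an in-memory list of dict records and an entity-keyed retrieval function. Both are stand-ins for whatever your store and retriever actually look like:

```python
def retrieve(records: list, entity: str) -> list:
    """Retrieval keyed on the new field; old-shape records never match."""
    return [r for r in records if entity in r.get("entities", [])]


def retrieval_coverage(records: list) -> float:
    """Fraction of stored records the entity filter can see at all."""
    if not records:
        return 1.0
    visible = sum(1 for r in records if "entities" in r)
    return visible / len(records)


store = [
    {"summary": "Prefers Python; works at Acme"},         # old shape
    {"summary": "Asked about Acme billing last spring"},  # old shape
    {"entities": ["Acme"], "preferences": ["Python"]},    # new shape
]

hits = retrieve(store, "Acme")        # only the new-shape record matches
coverage = retrieval_coverage(store)  # one of three records is reachable
```

A coverage number tracked across deploys turns a slow quality decline into a step change you can line up against a release.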

The throughline: the LLM at the end of the pipeline will paper over all three. It produces fluent output from whatever context it receives, so a corrupted or truncated memory set does not announce itself the way a corrupted database row crashes a report. You need to catch these before the model does, because the model will not.

Version the Record, Not Just the Code

The first concrete fix costs almost nothing and you should have done it on day one: put a schema_version integer on every memory record. Not a timestamp you infer the schema from, not a guess based on which fields are present — an explicit, written-down integer that says "this record was produced by writer version N."

This single field changes the problem from archaeology to dispatch. Your reader stops sniffing for the presence of entities and starts branching on a known version number. You can write a real migrate(record) function — a chain of small upgrade steps, v1→v2→v3, each one a pure transformation you can unit-test against fixtures of real old records. When a record comes in at version 2, you run it through steps 2→3 and 3→4 and hand the reader a guaranteed-current shape. The reader only ever sees the latest version, which means the if field in record thicket collapses into one clean code path.
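A sketch of what that upgrade chain might look like. The two steps and the field names (created_at, text) are invented for illustration, and the chain stops at version 3 to stay short. Only the dispatch-on-version structure is the point:

```python
CURRENT_VERSION = 3


def v1_to_v2(record: dict) -> dict:
    # Hypothetical step: v2 introduced a timestamp; old records get None.
    upgraded = dict(record)
    upgraded.setdefault("created_at", None)
    upgraded["schema_version"] = 2
    return upgraded


def v2_to_v3(record: dict) -> dict:
    # Hypothetical step: v3 renamed "summary" to "text".
    upgraded = dict(record)
    upgraded["text"] = upgraded.pop("summary", "")
    upgraded["schema_version"] = 3
    return upgraded


# Each entry upgrades a record from version N to version N + 1.
UPGRADES = {1: v1_to_v2, 2: v2_to_v3}


def migrate(record: dict) -> dict:
    """Run every upgrade step between the record's version and the current one."""
    version = record.get("schema_version")
    if version not in UPGRADES and version != CURRENT_VERSION:
        raise ValueError(f"cannot migrate schema_version={version!r}")
    while version != CURRENT_VERSION:
        record = UPGRADES[version](record)
        version = record["schema_version"]
    return record
```

Each step copies its input instead of mutating it, which keeps the steps pure and easy to test against fixtures. A record with a missing or unknown version raises instead of being guessed at, which is a deliberate choice in this sketch: the loud failure is the one you want.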

It also makes the unmigratable cases visible. Some v1→v2 steps are lossless and trivial — rename a field, default a new one. Others are not: if v2 split a freeform string into structured entities, there is no pure function that recovers the entities, because the information was never captured. Versioning forces you to confront that at migration-design time rather than discovering it in production. You either accept the loss explicitly, or you spend an LLM call to re-extract structure from the old blob and mark the result lower-confidence. Both are fine. Doing neither, by accident, is not.
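The two acceptable outcomes can live in one step. In this sketch, extract is a hypothetical hook (for example, a wrapper around an LLM call), and the 0.5 confidence cap and the entities_status field are arbitrary assumptions, not a recommendation:

```python
from typing import Callable, List, Optional


def split_summary(record: dict,
                  extract: Optional[Callable[[str], List[str]]] = None) -> dict:
    """Upgrade a freeform-summary record to the structured shape.

    Without an extractor, the loss is recorded explicitly instead of hidden.
    """
    upgraded = {k: v for k, v in record.items() if k != "summary"}
    upgraded["schema_version"] = 2
    upgraded["source_text"] = record["summary"]  # keep the original blob
    if extract is None:
        # Accept the loss, but say so on the record itself.
        upgraded["entities"] = []
        upgraded["entities_status"] = "not_recovered"
    else:
        upgraded["entities"] = extract(record["summary"])
        upgraded["entities_status"] = "reextracted"
        # Re-derived structure is trusted less than structure captured at write time.
        upgraded["confidence"] = min(record.get("confidence", 1.0), 0.5)
    return upgraded
```

Either branch leaves a marker a later reader can filter on, so "we chose to lose this" and "we guessed this" stay distinguishable from "this was captured properly".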

Borrow the rest of the discipline from database practice. The expand-contract pattern — also called parallel change — is the safe shape for any breaking change to a live store. Expand: add the new fields alongside the old ones and have the writer populate both. Migrate: backfill historical records into the new shape. Contract: once everything reads and writes the new fields, drop the old ones. At each phase old and new code coexist, and you can roll back. The version field is what makes every phase observable: you can query exactly how many records remain at each version and watch the migration converge.
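A rough sketch of the expand phase and the convergence check, assuming the store is a plain list of dicts and that version 2 is the one carrying both fields. Real stores would run the histogram as a query, but the shape is the same:

```python
from collections import Counter


def write_memory(store: list, summary: str, entities: list) -> None:
    """Expand phase: populate the old field and the new field together."""
    store.append({
        "schema_version": 2,
        "summary": summary,    # old field, still read by old code
        "entities": entities,  # new field, read by new code
    })


def version_histogram(store: list) -> Counter:
    """Count records per schema_version; 0 means the field is missing."""
    return Counter(r.get("schema_version", 0) for r in store)


def safe_to_contract(store: list, current_version: int) -> bool:
    """The old field can be dropped only when nothing is below current."""
    return all(v >= current_version for v in version_histogram(store))
```

The histogram is the number to put on a dashboard during a migration. The contract phase waits until every bucket below the current version is empty.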

Lazy Migrate on Read Versus Bulk Backfill

Once you have versioned records and a migration function, you face the one genuine architectural decision in this whole problem: when does the migration actually run?

Lazy migrate on read runs migrate() when a record is fetched, and optionally writes the upgraded record back. It ships instantly, with no batch job and no maintenance window, and it never spends compute on records nobody reads, which matters when most memory is cold and never retrieved again. The costs are real, though. Every read of a stale record carries migration overhead until the store converges. Old records that are never read are never upgraded, so you must keep every historical migration step alive indefinitely: you can never delete the v1→v2 code, because a v1 record might surface five years later. And analytics that scan the raw store still see the old shapes, because nothing migrated them.
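In code, lazy migration is a thin wrapper around the read path. This sketch assumes a dict-backed store and a stand-in one-step migrate(), and it treats a missing schema_version as version 1, which is itself an assumption you would need to verify against your own data:

```python
CURRENT_VERSION = 2


def migrate(record: dict) -> dict:
    """Stand-in upgrade chain: a hypothetical v1 -> v2 rename of "summary" to "text"."""
    if record.get("schema_version", 1) >= CURRENT_VERSION:
        return record
    upgraded = {k: v for k, v in record.items() if k != "summary"}
    upgraded["text"] = record.get("summary", "")
    upgraded["schema_version"] = CURRENT_VERSION
    return upgraded


def read_memory(store: dict, key: str, write_back: bool = True) -> dict:
    """Return the record in the current shape, upgrading it on the way out."""
    record = store[key]
    if record.get("schema_version", 1) < CURRENT_VERSION:
        record = migrate(record)
        if write_back:
            store[key] = record  # later reads of this key skip the migration
    return record
```

With write_back off, the caller still sees the current shape but the store never converges. With it on, hot records converge on their own and only the cold tail is left for a batch job.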

Bulk backfill runs a batch job that walks the entire store and rewrites every record to the current version. Afterward the store is uniform: reads are clean, old migration code can be retired, analytics are consistent. The price is operational. It is a migration job over a production dataset, with all the usual care — batch in chunks of a few thousand records, sleep between batches so you don't saturate I/O, make it resumable so a failure halfway through doesn't restart from zero, and handle records being written concurrently while the job runs.
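A sketch of those properties in one loop, assuming a dict-backed store with sortable string keys and a checkpoint dict that you would persist somewhere durable in practice. The function and parameter names are invented for illustration:

```python
import time


def backfill(store: dict, migrate, current_version: int, checkpoint: dict,
             batch_size: int = 1000, pause_seconds: float = 0.0) -> int:
    """Rewrite every stale record in key order; return how many were migrated."""
    last_key = checkpoint.get("last_key")
    # Resume after the last completed batch instead of starting from zero.
    pending = [k for k in sorted(store) if last_key is None or k > last_key]
    migrated = 0
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for key in batch:
            record = store.get(key)
            # Skip records deleted meanwhile, or already upgraded by a lazy read.
            if record is None or record.get("schema_version") == current_version:
                continue
            store[key] = migrate(record)
            migrated += 1
        checkpoint["last_key"] = batch[-1]  # progress survives a crash
        time.sleep(pause_seconds)           # leave I/O headroom between batches
    return migrated
```

The version check inside the loop is what makes the job idempotent and lets it coexist with lazy migration: whichever path reaches a record first upgrades it, and the other skips it. This sketch assumes concurrent writers already emit the current version. A real store would also want a conditional write so the job cannot overwrite a record that changed between the read and the write.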

The pattern that actually works in production is both, in sequence. Ship lazy-migrate-on-read first so new code is correct from the moment it deploys and no record is ever read in a stale shape. Then run the bulk backfill in the background to converge the long tail of cold records. When the backfill reports zero records below the current version, you can delete the oldest migration steps and simplify. Lazy migration buys you correctness immediately; bulk backfill buys you the right to clean up later. Skipping the backfill is the common mistake — the system works, so nobody funds the job, and you carry a decade of accumulating migration code as permanent interest on a debt you chose not to pay down.

The Migrations You Cannot Write

Two cases deserve a warning, because they look like schema problems but cannot be fully solved as schema problems.

The first is semantic drift. You change not the shape of a field but its meaning — priority used to be assigned by a heuristic and now comes from a model; category used to mean one thing to the extraction prompt and means something narrower after a prompt rewrite. The records are structurally identical before and after. No migration function can tell them apart, because the difference is not in the data, it is in the process that produced it. The only defense is to treat a meaning change as a version bump too — stamp records with the prompt or extractor version that created them, so a future reader can at least know which semantics it is looking at, even if it cannot convert between them.
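Stamping provenance is cheap to do at write time. In the sketch below, extractor_version and its value are hypothetical labels. What matters is that the label is bumped whenever the prompt or heuristic changes, even if the record shape does not:

```python
from collections import defaultdict

# Hypothetical label, bumped on every prompt or extractor change.
EXTRACTOR_VERSION = "extractor-v2"


def make_record(source_text: str, category: str) -> dict:
    """Write path: stamp both the shape version and the semantics version."""
    return {
        "schema_version": 4,
        "extractor_version": EXTRACTOR_VERSION,
        "category": category,
        "source_text": source_text,
    }


def group_by_semantics(records: list) -> dict:
    """Bucket records by the process that produced them."""
    groups = defaultdict(list)
    for record in records:
        groups[record.get("extractor_version", "unknown")].append(record)
    return dict(groups)
```

This does not convert old categories into new ones. It only keeps a reader from comparing values across buckets as if they meant the same thing.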

The second is the embedding layer. If your memory store uses vector retrieval and you upgrade the embedding model, every pre-upgrade vector lives in a different coordinate space and is not comparable to the new ones. There is no transformation that migrates an old vector into the new space — the only real fix is to re-embed the source text. That is a backfill job of a different kind, gated on still having the original text stored alongside the vector. If you discarded the source text to save space, those memories are not migratable at all; they are simply lost the day you change models. Keep the source text. It is the one thing you genuinely cannot regenerate.
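A re-embedding pass can be sketched as follows. The embed callable is a placeholder for whatever embedding client you use, and the embedding_model and source_text field names are assumptions. The function updates records in place and reports the ones it could not recover:

```python
from typing import Callable, List


def reembed(records: List[dict],
            embed: Callable[[str], List[float]],
            model_name: str) -> List[str]:
    """Re-embed stale records from source text; return ids that cannot be rebuilt."""
    unrecoverable = []
    for record in records:
        if record.get("embedding_model") == model_name:
            continue  # already in the current vector space
        text = record.get("source_text")
        if not text:
            # No source text means there is nothing to embed again.
            unrecoverable.append(record["id"])
            continue
        record["embedding"] = embed(text)
        record["embedding_model"] = model_name
    return unrecoverable
```

Tagging each vector with the model that produced it plays the same role as schema_version: retrieval can refuse to compare vectors across models, and the returned list tells you exactly how much memory a model change would cost.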

Treat Memory Like the Database It Already Is

The uncomfortable summary is that "agent memory" is a persistent, schema-bearing, continuously-growing data store, and the only reason it does not get database-grade discipline is that it arrived through the AI feature door instead of the data-infrastructure door. Nothing about the engineering is novel. Versioned records, an upgrade chain, expand-contract, lazy reads plus a bulk backfill, keeping source data so you can recompute derived data: all of this is standard practice that the data world settled on years ago.

Three things to do before your next memory schema change. Add a schema_version field now, even if every record is currently version 1, because you cannot retroactively version records you have already written blind. Write the schema change as an explicit, tested migration function rather than a defensive if in the reader. And keep the raw source text for anything derived — embeddings, extracted entities, summaries — because the derived form will go stale the day you improve the deriving process, and source text is the only thing that lets you rebuild it. The day you ship a better memory format should be the day your agent remembers more, not the day it quietly forgets everything that came before.
