Per-Vector Version Tags: The Missing Column Behind Every Embedding Migration
A new embedding model lands. The benchmark numbers are 4% better. A staff engineer files the ticket: "Upgrade embeddings to v3." Two weeks later the index has been re-embedded, the alias has been swapped, and the team has shipped the change behind a feature flag. Six weeks later, support tickets pile up. Search results "feel off." A retro is scheduled. Nobody can explain what regressed because nothing crashed and every dashboard is green.
The problem is not the model swap. The problem is that the vector store has no idea which vectors came from which model. There is no column for it. There is no migration table tracking which records have been backfilled. There is no alembic_version row, no schema_migrations table, no pg_dump of the previous state. The team treated an embedding upgrade like a config flip, and the vector store had no schema-level concept that would have stopped them.
Embedding migrations need the same artifact that database migrations have relied on for two decades: a per-record version tag, written into every vector, queried on every read, and used as the gating criterion for cutover and rollback. It is the single column most teams forget to add, and adding it later costs more than adding it up front.
What Postgres Taught Us That Vector Stores Forgot
Relational databases got migration tooling early. ALTER TABLE is online for most changes. gh-ost and pt-online-schema-change handle the changes that aren't; Liquibase, Flyway, and Alembic give engineers versioned, repeatable, rollback-aware schema changes; pg_dump gives them a snapshot to fall back on. Every migration carries a version number. Every row knows which schema it conforms to. The toolchain assumes you will mess up at least once and need to back out gracefully.
Vector stores ship without any of this. There is no ALTER VECTOR TABLE that re-embeds rows in place. There is no migration framework that tracks which rows have been processed. There is no rollback semantic for "swap the embedding model" beyond "rebuild the whole index from scratch." Pinecone, Weaviate, Qdrant, Milvus, and pgvector all expose primitives for storing vectors with metadata, and all of them leave the migration discipline as a homework assignment for the application team.
Most teams don't notice the gap until they hit it. They treat the vector store like a write-once KV store: embed the corpus, write the vectors, query them. Then a better model comes along, and the team discovers that the only operation the vector store actually supports is "delete and rewrite." There is no online schema change. There is no rolling cutover. There is no backfill machinery. The team builds it from scratch, badly, under time pressure, and ships.
The discipline that has to land is not exotic. It is the same discipline a senior backend engineer would apply to changing a column type in a 50M-row table. The reason it doesn't get applied is that nobody has labeled the work as a schema migration, because the schema column the migration would need is missing.
The Per-Vector Version Tag Is the Column You Forgot
The starting point is dull and concrete: every vector you write needs a version tag in its metadata. Not on the index. Not on the namespace. On the row. The tag identifies the embedding model that produced it — model name, model version, and any preprocessing parameters that changed (chunk size, normalization, prompt template). A reasonable shape is a single short string like model-name@model-version/chunk-512/norm-l2. Pinecone, Weaviate, Qdrant, and pgvector all support arbitrary metadata on records; the cost of adding the field is approximately zero, and the cost of not having it later is approximately one engineer-month.
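To make the write path concrete, here is a minimal sketch of tagging at upsert time. The class, helper, and tag values are illustrative, not any vendor's API; the record shape matches the metadata-on-record model the stores above expose.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EmbeddingVersion:
    """Everything that defines the embedding 'language' a vector speaks."""
    model: str       # provider's model name, e.g. "text-embedding-3-large"
    revision: str    # provider's model revision (illustrative value below)
    chunk_size: int  # preprocessing parameters that change the space
    norm: str        # e.g. "l2"

    def tag(self) -> str:
        return f"{self.model}@{self.revision}/chunk-{self.chunk_size}/norm-{self.norm}"

CURRENT = EmbeddingVersion("text-embedding-3-large", "1", 512, "l2")

def to_record(doc_id: str, values: list[float]) -> dict:
    """Shape every upsert so the tag rides on the row, not the index."""
    return {
        "id": doc_id,
        "values": values,
        "metadata": {"embedding_version": CURRENT.tag()},
    }
```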
Once the tag is on every vector, several things become possible that were not possible before:
- Read-time filtering. Queries can scope to a specific embedding version (see the query sketch after this list). During cutover, production traffic queries the new tag while a small percentage of validation traffic queries both and compares.
- Mixed-state safety. If the backfill is half done, the retrieval layer knows which vectors are in the new "language" and which are still in the old one. It does not silently mix them and produce incoherent neighborhoods.
- Per-record backfill tracking. The migration job can SELECT ... WHERE version = 'v1' to find what's left. There is no separate ledger to keep in sync. The vector store IS the ledger.
- Rollback without re-embedding. If the new model regresses, flipping reads back to the old tag is a config change. The old vectors still exist. Nothing has been deleted. The rollback is bounded by feature-flag latency, not by re-embedding cost.
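To make read-time filtering concrete, a sketch assuming a pgvector-backed documents table with an embedding_version column. The table and column names are illustrative, and the same pattern maps onto metadata filters in Pinecone, Weaviate, or Qdrant.

```python
import psycopg  # any Postgres driver works; psycopg 3 shown as an assumption

def search(conn, query_vec: list[float], version_tag: str, k: int = 10):
    # Scope the candidate pool to one embedding "language". Mixing versions
    # in a single nearest-neighbor query should only ever be intentional.
    return conn.execute(
        """
        SELECT id, content
        FROM documents
        WHERE embedding_version = %(tag)s
        ORDER BY embedding <=> %(vec)s::vector
        LIMIT %(k)s
        """,
        {"tag": version_tag, "vec": str(query_vec), "k": k},
    ).fetchall()
```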
Teams that retrofit this column under fire usually discover that the existing vectors don't have it and there is no way to backfill the tag retroactively without knowing which model produced them. The fix is to assume the worst case and tag everything as the oldest possible version, which is fine going forward but means the team has effectively lost the audit trail. Adding the column on day one costs nothing. Adding it on day 800 costs the audit trail that explains why retrieval looked weird in 2025.
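If you are in exactly that retrofit position, the worst-case tagging described above is one statement. A sketch, reusing the assumed pgvector layout (the sentinel tag value is made up; pick one your team will recognize later):

```python
# Stamp every untagged row with an explicit "oldest known" sentinel so the
# invariant "every row carries a tag" holds from this point forward.
# As noted above, the audit trail before this moment is already lost.
conn.execute(
    """
    UPDATE documents
    SET embedding_version = %(tag)s
    WHERE embedding_version IS NULL
    """,
    {"tag": "unknown@pre-migration"},
)
conn.commit()
```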
Backfill Completeness Is the Cutover Criterion, Not Calendar Pressure
The second mistake teams make is cutting over before the backfill is done. The pattern is recognizable: the migration job is launched, the job is slow because re-embedding a corpus at scale costs real money and real time, the leadership all-hands is in three days, and someone makes the call to cut over the read path "to start showing the win" while the backfill keeps running. Now production reads from a partially-populated new index. Half the corpus is invisible to retrieval. Half the queries return results from a smaller candidate set than they should. The eval suite, which was scored against a fully-backfilled corpus in staging, doesn't catch it because the eval set is small enough to have been backfilled completely.
Production retrieval quality is a function of the candidate pool size. Cut over with 60% of the corpus backfilled and you are silently telling the system "40% of the documents don't exist this week." Some queries will land on documents that did get backfilled and look fine. Others will land in coverage holes and return generic, low-relevance results. The pattern across the user base looks like "search got worse for some queries" rather than "search broke," which is the failure mode that takes the longest to diagnose.
The discipline that has to land is a backfill-completeness gate as the formal cutover criterion. Define it numerically: "cut over when the backfill ratio for the new version tag is greater than 99.5% of the old version's record count, with the remaining 0.5% accounted for in a freshness budget." Make the gate visible on the same dashboard that shows the launch date. Make the gate the thing the launch is conditional on, not a thing the launch overrides. The migration is done when the data says it's done, not when the calendar says it's done.
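A sketch of the gate as code rather than a meeting decision. count_by_version is an assumed helper: a count grouped by the tag column, or a metadata-filtered stats call on your store.

```python
BACKFILL_GATE = 0.995  # from the criterion above; the remainder is freshness budget

def cutover_allowed(count_by_version, old_tag: str, new_tag: str) -> bool:
    """The launch is conditional on this returning True, not on the calendar."""
    old_count = count_by_version(old_tag)
    new_count = count_by_version(new_tag)
    if old_count == 0:
        return False  # nothing to compare against; fail closed
    return new_count / old_count >= BACKFILL_GATE
```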
There is a corollary: the backfill ratio belongs on the same dashboard as the eval scores. The eval suite is testing semantic quality on the assumption that the corpus is complete. If the corpus is 60% complete, the eval is testing a different system than the one production traffic hits. Surfacing the ratio next to the eval scores prevents the team from looking at green eval numbers and concluding the migration is safe when the unfinished backfill is the real risk.
Rollback That Doesn't Require Re-Embedding
The third mistake is treating cutover as a one-way door. The pattern: the team finishes the backfill, swaps the alias, validates the metrics for a week, and then deletes the old vectors to free up storage. Six weeks later a customer-facing regression surfaces — the new model handles certain query patterns worse than the old one in ways that didn't show up in the eval set. Now the rollback plan is "re-embed the entire corpus from scratch with the old model," which costs the same as the original migration and takes the same multi-week timeline. The team eats the regression instead of rolling back, because the rollback is more expensive than the bug.
Storage is the cheapest thing in the system. Re-embedding is the most expensive. The math is asymmetric: a duplicated index of vectors at, say, 1536 dimensions in float32 occupies roughly 6 GB per million records (1536 dimensions × 4 bytes ≈ 6 KB per vector) — table-stakes for any vector store. Re-embedding the same million records through a paid embedding API costs an amount that scales with corpus size and is bounded only by your budget. The right policy is to keep the old vectors in place until the new model has accumulated enough evidence in production to be trusted — measured in weeks of stable retrieval quality, not days. Storage cost is the price of an option to roll back, and it is a cheap option.
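The back-of-envelope arithmetic, written out (corpus size is the only assumption):

```python
# Duplicate-index footprint for 1M chunks at 1536 dimensions, float32.
dims, bytes_per_float, n_chunks = 1536, 4, 1_000_000
index_gb = dims * bytes_per_float * n_chunks / 1e9  # ≈ 6.1 GB for the old index
# Re-embedding the same million chunks means a full pass through the embedding
# API plus pipeline runtime, both linear in corpus size and repeated on every
# rollback you did not keep the old vectors for.
```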
The per-vector version tag makes this trivial. The retrieval layer reads from version = 'new'. The old vectors still exist with version = 'old'. Rolling back is a config change at the application layer that flips the version filter. There is no data migration. There is no re-embedding. The rollback completes in seconds. After the new version has a few weeks of production evidence — usage at full traffic, eval stability across multiple cycles, no customer-side regressions — the old vectors can be deleted. Until then, the duplication is not waste. It is the rollback plan.
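A sketch of the flip, assuming a generic feature-flag client; flags.get and the tag values are illustrative.

```python
OLD_TAG = "model-a@1/chunk-512/norm-l2"  # illustrative tags
NEW_TAG = "model-b@1/chunk-512/norm-l2"

def active_version_tag(flags) -> str:
    """The retrieval layer reads this at query time. Setting the flag back to
    OLD_TAG is the entire rollback: no data moves, nothing is re-embedded."""
    return flags.get("retrieval.embedding_version", default=NEW_TAG)

# e.g. results = search(conn, query_vec, version_tag=active_version_tag(flags))
```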
A useful question to ask in the migration design review: "what is our rollback timeline today, and what would it take to bring it down to under five minutes?" If the answer involves re-embedding the corpus, the migration is not complete. If the answer is "flip the version filter," the migration is properly designed.
The Migration Discipline Embedding Pipelines Need
The pieces add up to a pattern that anyone who has shipped a database migration would recognize. Add the version column. Tag every record. Backfill in the background. Track completeness as a metric, not a vibe. Gate the cutover on the metric. Keep the old data until the new data has earned its trust. Roll forward and roll back without rewriting the corpus.
What's specific to embeddings is that the eval suite alone won't catch the failure modes. Pointwise relevance scores can stay green while the geometry of the index rotates underneath them. The version tag is what lets the team reason about which index they are evaluating, mix versions only when they intend to, and produce per-version eval scores that actually correspond to a coherent embedding space. Without the tag, every eval is suspect, because the team can't be sure which model produced the vectors they are scoring against.
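What "per-version eval scores" means in practice, as a sketch; judge, queries, and search stand in for whatever relevance scorer, query set, and version-filtered retrieval the team already has.

```python
def eval_by_version(queries, version_tags, search, judge) -> dict[str, float]:
    """Score each version against its own candidate pool, so every number
    describes one coherent embedding space instead of a silent mixture."""
    scores = {}
    for tag in version_tags:
        per_query = [judge(q, search(q, version_tag=tag)) for q in queries]
        scores[tag] = sum(per_query) / len(per_query)
    return scores
```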
The takeaway for teams in the planning phase of an embedding upgrade is simple: before you run the migration, add the column. If your records already have it, audit that every write path is populating it. If they don't, fix that first, and treat the lack of it as a P0 schema gap. The migration is going to happen sooner or later — model providers ship new versions on their own clock. Teams that have the version tag in place will run the migration as a routine operation. Teams that don't will run it as an incident.
The vector store doesn't have ALTER TABLE. The version tag is what you put on the row when ALTER TABLE doesn't exist.