
AI-Assisted Codebase Migration at Scale: Automating the Upgrades Nobody Wants to Touch

· 11 min read
Tian Pan
Software Engineer

When Airbnb needed to migrate 3,500 React test files from Enzyme to React Testing Library, they estimated the project at 1.5 years of manual effort. They shipped it in 6 weeks using an LLM-powered pipeline. When Google studied 39 distinct code migrations executed over 12 months by a team of 3 developers—595 code changes, 93,574 edits—they found that 74% of the edits were AI-generated, 87% of those were committed without human modification, and the overall migration timeline was cut by 50%.

These numbers are real. But so is this: during those same migrations, engineers spent approximately 50% of their time validating AI output—fixing context window failures, cleaning up hallucinated imports, and untangling business logic errors the tests didn't catch. The efficiency gains are genuine, and so are the pain points. The question isn't whether AI belongs in code migrations; it's where it helps and where it creates more cleanup than it saves.

The Two Fundamentally Different Tools You Need

The first mistake most teams make is treating "AI-assisted migration" as a single category. There are two distinct tool families, and they're best at different things.

AST-based codemods (ast-grep, jscodeshift, GritQL, OpenRewrite, Comby) work at the syntax tree level. They are deterministic—the same input always produces the same output—and they scale perfectly. ast-grep can search and transform millions of lines across a polyglot codebase in seconds. OpenRewrite has 5,000+ recipes for Java, Python, YAML, Terraform, and Kubernetes migrations. These tools don't hallucinate. They also don't understand what your code is doing—they transform structure, not semantics.
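
To make the shape of these tools concrete, here is a minimal jscodeshift transform that renames call sites of a deprecated function. The names fetchUserSync and fetchUser are hypothetical; the structure is what matters.

```ts
// Minimal jscodeshift transform: rename every call to a deprecated function.
// `fetchUserSync` and `fetchUser` are hypothetical names for illustration.
import type { FileInfo, API } from 'jscodeshift';

export default function transform(file: FileInfo, api: API): string {
  const j = api.jscodeshift;
  return j(file.source)
    // Match only call expressions whose callee is the deprecated identifier,
    // so string literals and unrelated symbols are never touched.
    .find(j.CallExpression, { callee: { name: 'fetchUserSync' } })
    .forEach((path) => {
      path.node.callee = j.identifier('fetchUser');
    })
    .toSource();
}
```

Run it with something like `npx jscodeshift -t rename.ts src/`, and the same input tree produces the same output tree every time, which is exactly the property the LLM side of the toolbox lacks.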

LLM-based agents (Claude Code, Copilot, custom pipelines built on frontier models) understand semantics. They can migrate a callback-style API to a promise-based one while preserving the intent of surrounding business logic. They can read inline comments and coding style and produce code that fits the codebase's conventions. They also sometimes invent imports that don't exist, produce nondeterministic output for the same prompt, and fail silently when the codebase exceeds their context window.

The practical conclusion: use AST tools for detection and structural transformation; use LLMs for semantic transformation, where the change requires understanding meaning. The hybrid approach—a deterministic engine for matching patterns, an LLM for rewriting each matched section—combines the reliability of the former with the contextual intelligence of the latter.
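
A sketch of that hybrid shape, assuming a hypothetical findMatches structural matcher and rewriteWithLLM model call (neither is a real API; the `$$$ARGS` pattern syntax is borrowed from ast-grep):

```ts
// Hybrid pipeline sketch: deterministic matching decides WHAT gets touched,
// the LLM decides HOW the matched region is rewritten.
// `findMatches` and `rewriteWithLLM` are hypothetical stand-ins.

interface Match {
  start: number;   // offset where the matched region begins
  end: number;     // offset where it ends
  snippet: string; // the matched source text
}

declare function findMatches(source: string, pattern: string): Match[];
declare function rewriteWithLLM(snippet: string, instructions: string): Promise<string>;

export async function migrateFile(source: string): Promise<string> {
  const matches = findMatches(source, 'oldApi($$$ARGS)');

  // Splice rewrites back-to-front so earlier offsets stay valid.
  let result = source;
  for (const m of [...matches].sort((a, b) => b.start - a.start)) {
    const rewritten = await rewriteWithLLM(
      m.snippet,
      'Convert this oldApi call to the promise-based newApi, preserving behavior.'
    );
    result = result.slice(0, m.start) + rewritten + result.slice(m.end);
  }
  return result;
}
```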

Where AI Migration Is Genuinely 10x Faster

Not all migrations are equal. Some task classes have a profile that makes AI assistance a straightforward win:

Mechanical API replacements with well-defined transformation rules. React's deprecated componentWillMount → componentDidMount transition. React Testing Library's analogues for Enzyme's .find() and .simulate(). Next.js codemods for the Pages Router → App Router migration. These have documented transformation patterns, the new API surface is well understood, and tests immediately tell you whether the transformation was correct. AI succeeds on these because the mapping is clear and the test signal is tight.
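
The lifecycle case shows why the test signal stays tight: the rule is "move side-effectful setup out of the deprecated hook," and the rewrite is mechanical (component and prop names below are hypothetical):

```tsx
import React from 'react';

interface Props {
  loadUsers(): void;
}

// Before: componentWillMount has been deprecated since React 16.3.
class UserListBefore extends React.Component<Props> {
  componentWillMount() {
    this.props.loadUsers(); // side-effectful setup in the deprecated hook
  }
  render() { return null; }
}

// After: the same setup, moved to componentDidMount.
class UserListAfter extends React.Component<Props> {
  componentDidMount() {
    this.props.loadUsers();
  }
  render() { return null; }
}
```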

Test framework migrations. Airbnb's 3,500-file Enzyme migration is the canonical example. Test files rarely contain business logic that the LLM needs to reason about carefully—they contain test setup, assertions, and mocks that follow predictable patterns. The output quality is high, the failure mode is isolated (a broken test doesn't break production), and the test suite itself is the validation mechanism.
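
A representative before/after pair shows why this works: both tests assert the same behavior, so a passing suite is strong evidence the migration preserved it (the Button component is hypothetical):

```tsx
import React from 'react';
import { shallow } from 'enzyme';
import { render, screen, fireEvent } from '@testing-library/react';
import Button from './Button'; // hypothetical component under test

// Before: Enzyme reaches into the rendered structure.
test('calls onClick (Enzyme)', () => {
  const onClick = jest.fn();
  const wrapper = shallow(<Button onClick={onClick} />);
  wrapper.find('button').simulate('click');
  expect(onClick).toHaveBeenCalled();
});

// After: React Testing Library asserts against user-visible behavior.
test('calls onClick (RTL)', () => {
  const onClick = jest.fn();
  render(<Button onClick={onClick} />);
  fireEvent.click(screen.getByRole('button'));
  expect(onClick).toHaveBeenCalled();
});
```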

Language version evolution on clean business logic. Python 2 → 3, Java 8 → 17, TypeScript strict mode adoption. When the code is algorithmic—mostly data transformations, utility functions, domain logic without complex infrastructure—correctness on 80–90% of files with minimal manual intervention is realistic. The remaining 10–20% are edge cases where a human needs to review, not structural failures that propagate.
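
TypeScript strict mode is the cleanest of the three, because the compiler pinpoints every violation and most fixes are local. A representative noImplicitAny fix (names hypothetical):

```ts
// Before: rejected under "strict": true because `items` is implicitly any.
// function total(items) {
//   return items.reduce((sum, i) => sum + i.price, 0);
// }

// After: the kind of mechanical annotation an LLM handles reliably.
interface LineItem {
  price: number;
}

function total(items: LineItem[]): number {
  return items.reduce((sum, item) => sum + item.price, 0);
}
```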

Bulk mechanical refactors. Renaming a deprecated symbol across 4,000 files. Moving imports. Normalizing whitespace and formatting in a legacy codebase before enabling ESLint. These are deterministic enough that even pure codemods handle them well—the LLM's role is smaller and its hallucination risk correspondingly lower.

Where AI Migration Creates More Work Than It Saves

The failure modes cluster around four categories:

Complex architectural refactorings. Moving from a monolithic service to a modular architecture. Splitting a God class into proper domain objects. Reorganizing multi-module dependencies. LLMs underperform sharply on cross-class reasoning. They lack the global codebase context needed to understand which modules depend on each other implicitly, and they tend to produce code that's locally coherent but globally inconsistent. The cleanup work here can easily exceed what a disciplined manual approach would have cost.

Migrations with domain-specific constraints. Financial calculations where rounding behavior has regulatory implications. Insurance underwriting logic where business rules are embedded in the code rather than documented. Medical device software where behavior semantics are specification-driven. AI can translate the surface syntax correctly while subtly changing semantics in ways that unit tests won't catch—and that might not surface until a customer complaint or an audit.

Large-scale architectural rewrites without redesign. Translating a 200,000-line Delphi application to C# line-by-line doesn't produce a C# application—it produces a Delphi application that compiles on the wrong runtime. AI agents can translate code syntax and patterns accurately, but they don't fix structural problems. If you're migrating without rethinking the architecture, you're automating the perpetuation of the original design mistakes, and the cleanup cost comes later.

Codebases with missing type information. Untyped JavaScript, Python without annotations, legacy C without documentation. LLMs hallucinate types and relationships when the code doesn't make them explicit. One hallucinated type cascades: a fabricated interface definition causes the consumer to pass the wrong shape, which causes a serialization error in production, which causes an outage in a system that seemed completely unrelated to the migration.
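
The cascade is easy to picture with a deliberately small (hypothetical) example: the service sends user_id, the LLM fabricates an interface that says userId, and everything type-checks until the wire format disagrees:

```ts
// What the service actually sends: { "user_id": "u-42", "plan": "pro" }

// What the LLM fabricated while migrating untyped code: a plausible-looking
// interface that silently renames the field. Hypothetical example.
interface Account {
  userId: string; // hallucinated: the real wire format uses user_id
  plan: string;
}

async function loadAccount(): Promise<Account> {
  const res = await fetch('/api/account');
  // Compiles cleanly; fails at runtime because `userId` is always undefined.
  return (await res.json()) as Account;
}
```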

The Verification Strategy That Makes This Safe

Regardless of which tools you use, one principle is non-negotiable: every migrated file must pass its existing test suite before a human ever reviews the diff.

Google's production pipeline enforces this. Changes are validated through CI/CD before developer review—developers only see changes that have already passed builds and tests. This turns the review workload from "is this correct?" to "is there anything the tests didn't catch?" That's a much smaller cognitive task, and it's the right human-in-the-loop use.
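
The gate itself is cheap to wire up. Here is a sketch using Jest's --findRelatedTests flag; the queue semantics are assumptions, not Google's actual tooling:

```ts
import { execFileSync } from 'node:child_process';

// Per-file test gate: a migrated file is only queued for human review if its
// related tests already pass. The routing labels are illustrative.
export function gateMigratedFile(file: string): 'review' | 'reject' {
  try {
    // Run only the test files that cover the migrated source file.
    execFileSync('npx', ['jest', '--findRelatedTests', file], { stdio: 'inherit' });
    return 'review'; // green: a human sees the diff
  } catch {
    return 'reject'; // red: back to the agent or a manual queue, never to review
  }
}
```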

The test gate has a specific implication for how you sequence migrations: you cannot safely run large-scale AI migration on a codebase with low test coverage. Before attempting a 3,000-file migration, you need to know which files have meaningful test coverage and which don't. Files with coverage go through the automated pipeline. Files without coverage need either careful manual migration or a test-first approach: write the tests, then migrate, then verify. Mixing the two groups without distinguishing them creates a false signal: the pipeline reports 3,000 files migrated, the tests pass, and you've silently regressed behavior in the 400 files that had no tests.
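
A sketch of that partition step, assuming an Istanbul-style coverage-summary.json; the 60% line-coverage threshold is an arbitrary example, not a recommendation:

```ts
import { readFileSync } from 'node:fs';

// Split migration candidates by coverage before running the pipeline.
interface FileSummary {
  lines: { pct: number };
}

export function partition(candidates: string[]): { automated: string[]; manual: string[] } {
  // Istanbul's summary maps file paths to per-metric percentages.
  const summary: Record<string, FileSummary> = JSON.parse(
    readFileSync('coverage/coverage-summary.json', 'utf8')
  );
  const automated: string[] = [];
  const manual: string[] = [];
  for (const file of candidates) {
    const pct = summary[file]?.lines.pct ?? 0;
    (pct >= 60 ? automated : manual).push(file);
  }
  return { automated, manual };
}
```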

A validation layer beyond unit tests matters for production systems. Contract testing validates that refactored code adheres to interface expectations when other services depend on the migrated code's API behavior. Cross-family LLM judges—using a different model family to review the AI-generated diff—catch hallucinations that the original model won't flag as wrong.
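
A cross-family judge can be as simple as a second review pass by a different vendor's model. In the sketch below, askModel is a hypothetical stand-in for whichever client you use, and the model names are illustrative:

```ts
// Cross-family judging sketch. `askModel` is a hypothetical stand-in for a
// real client library; only the shape of the check matters here.
declare function askModel(model: string, prompt: string): Promise<string>;

export async function judgeDiff(diff: string, generatorFamily: string): Promise<boolean> {
  // Pick a judge from a different model family than the generator, so shared
  // blind spots (e.g. the same hallucinated import) are less likely to align.
  const judge = generatorFamily === 'anthropic' ? 'gpt-judge' : 'claude-judge';
  const verdict = await askModel(
    judge,
    'Review this migration diff for hallucinated imports, changed semantics, ' +
      'or dropped edge cases. Answer PASS or FAIL with a one-line reason.\n\n' + diff
  );
  return verdict.trim().toUpperCase().startsWith('PASS');
}
```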
