The Model Deprecation Treadmill: Discipline That Has to Exist Before the Sunset Email
The team that treats "we use the latest model" as a virtue is one sunset email away from a quarter of unplanned work. By the time the deprecation notice lands, the architectural decisions that determine whether you can absorb it have already been made — months ago, by people who weren't thinking about migrations at all. The eval suite was implicitly trained against a specific checkpoint. The prompts were tuned against a specific refusal style. The cost projections assumed a specific token-per-task baseline. The router has a hardcoded fallback to a model that is itself about to disappear. None of these decisions look like risks until the email arrives, and then all of them look like the same risk.
Model deprecation is now the most predictable surprise in the AI stack. Anthropic gives a minimum of 60 days' notice on publicly released models. OpenAI's notice windows range from three months for specialized snapshots to 18 months for foundational models, but in practice a recent batch of ChatGPT model retirements landed with as little as two weeks' warning for some teams. GitHub deprecated a slate of Anthropic and OpenAI models in February 2026 in a single coordinated changelog entry. The pattern is no longer "if a model retires" — it's "every quarter, at least one model your stack depends on enters a retirement window, and the calendar isn't synchronized to your roadmap."
This post is not a migration playbook for the day the email lands. It's about the discipline that has to exist before the email lands, so that the migration is two engineer-weeks of mechanical work rather than a quarter of evals theater, prompt archaeology, and customer apology.
A Model ID Is a Dependency With a Published End-of-Life Date
The single mental shift that separates teams who absorb deprecations from teams who get crushed by them: stop treating model identifiers like brand names and start treating them like library versions with a published EOL.
When you depend on openssl 1.1.1, you don't write code that says "use whatever OpenSSL is installed." You pin a specific version, you track when that version goes EOL, and you plan the upgrade to land before the security patches stop. The discipline is so universal in systems engineering that it's invisible — nobody writes a blog post about why you should pin OpenSSL.
In LLM stacks, the equivalent discipline is rare. The default code people write is model="claude-sonnet-4-6" or model="gpt-5" — short aliases that resolve to whatever the provider's current default happens to be on the day of the request. Anthropic's documentation explicitly distinguishes between snapshot identifiers like claude-opus-4-5-20251101 (reproducible, version-pinned, with a published retirement date) and rolling aliases that always point to the latest release in a family. The pinning matters because the retirement dates are real: Anthropic's deprecation table lists claude-opus-4-20250514 as deprecated on April 14, 2026 with retirement on June 15, 2026. Sixty-two days. If your call site is using the alias claude-opus-4, you have no idea which side of that line you're on until something starts failing.
The cost of this lazy aliasing is that you cannot answer the most basic question a deprecation email forces on you: "where in our stack are we calling this model?" Without a registry of pinned versions per call site, the answer is "we'll grep for it, and probably miss the dynamically constructed ones."
The Versioned Model Registry
The first piece of pre-sunset infrastructure is a versioned model registry. The shape is simple: a single source of truth that maps every production call site to the specific model snapshot it has been tested against, the date that test was last run, and the team that owns the prompt running on it.
The registry is the artifact, not the spreadsheet. It needs to be machine-readable, checked into a repository, and enforced — meaning a request that calls a model not declared in the registry should fail in CI or at deploy time, not silently route to whatever default the SDK picks.
A workable schema includes the following fields (a Python sketch follows the list):
- Call site identifier: a stable string like support-classifier or ingest-extractor that names the surface, not the file path.
- Pinned model id: the full snapshot identifier with date suffix, never an alias. claude-opus-4-5-20251101, not claude-opus.
- Provider and routing tier: direct API, Bedrock, Vertex, or self-hosted, because deprecation calendars differ across these.
- Last eval baseline: the date the eval suite was last run against this exact model version, with a link to the run.
- Owner: a named team — not a platform mailing list, not "AI infra," a team with a Slack channel and an on-call rotation.
- Retirement date: copied from the provider's deprecation table, with the source URL.
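A minimal sketch of what an enforceable entry can look like, written here as a checked-in Python module rather than any particular config format. Every name in it — RegistryEntry, resolve_model, the call-site strings, the internal URL — is illustrative, not a real library:

```python
# A versioned model registry as a checked-in module. All names and values
# here are illustrative; the fail-closed enforcement is the point.
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RegistryEntry:
    call_site: str          # stable surface name, not a file path
    model: str              # full snapshot id with date suffix, never an alias
    routing: str            # "direct" | "bedrock" | "vertex" | "self-hosted"
    last_eval: date         # last eval-suite run against this exact snapshot
    eval_run_url: str       # link to that run
    owner: str              # named team with an on-call rotation
    retirement: date        # copied from the provider's deprecation table
    retirement_source: str  # URL of that table


REGISTRY: dict[str, RegistryEntry] = {
    "support-classifier": RegistryEntry(
        call_site="support-classifier",
        model="claude-opus-4-5-20251101",
        routing="direct",
        last_eval=date(2026, 3, 2),
        eval_run_url="https://ci.example.internal/evals/4812",  # hypothetical
        owner="team-support-ml",
        retirement=date(2026, 6, 15),
        retirement_source="https://platform.claude.com/docs/en/about-claude/model-deprecations",
    ),
}


def resolve_model(call_site: str) -> str:
    """Fail closed: an undeclared call site never falls through to an SDK default."""
    entry = REGISTRY.get(call_site)
    if entry is None:
        raise RuntimeError(f"{call_site!r} has no registry entry; refusing to guess a model")
    return entry.model


def impacted_by(model_id: str) -> list[RegistryEntry]:
    """The question a sunset email asks: which call sites run this snapshot?"""
    return [e for e in REGISTRY.values() if e.model == model_id]
```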
The registry pays for itself the first time a sunset email arrives. Instead of a week of grep-and-pray, you query the registry, get a list of impacted call sites, page their owners, and start the migration with a known scope.
The Deprecation Calendar Lives Next to Cert Expiry
A registry without a calendar is a list. The second piece of infrastructure is a deprecation calendar — a single timeline that tracks every model retirement date alongside the other operational deadlines the team already manages: TLS certificate expiry, cloud credit renewals, vendor contract anniversaries, compliance audit windows.
The reason to put model retirements on the same calendar as cert expiry is not symbolic — it's that the org already has muscle for cert renewal. Someone owns it. There's a runbook. There's a paging escalation if the deadline slips. Model deprecations need that same machinery, and the cheapest way to build it is to attach the work to a calendar that already has it.
Each entry on the calendar should carry:
- Retirement date (provider-published).
- Migration target date, set conservatively at retirement-minus-30-days to leave room for surprise.
- Named owner with a paging escalation, not a team alias.
- Rollback model, also version-pinned, also on the same calendar with its own retirement date.
- Eval baseline owner, who runs the n+1 evals.
The calendar lives somewhere visible. Not in a PM's Notion doc. The same place that lists "AWS reserved instance renewal — Sept 2026" and "SSL cert rotation — Dec 2026."
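The deadlines themselves don't need to be maintained by hand. A sketch of deriving them from the registry, assuming the RegistryEntry shape from the earlier sketch:

```python
# Deriving calendar deadlines from the registry; the 30-day buffer matches
# the migration target in the list above.
from datetime import date, timedelta

MIGRATION_BUFFER = timedelta(days=30)


def migration_target(entry) -> date:
    """Migration must land at retirement minus 30 days, leaving room for surprise."""
    return entry.retirement - MIGRATION_BUFFER


def overdue(registry: dict, today: date) -> list[str]:
    """Entries past their migration target — these should page the named owner."""
    return [
        f"{e.call_site}: migrate off {e.model} by {migration_target(e)} (owner: {e.owner})"
        for e in registry.values()
        if today >= migration_target(e)
    ]
```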
The N+1 Eval Suite Runs Continuously
The deepest source of migration pain is the team that runs an eval suite against the current model and discovers, on day 28 of a 60-day sunset window, that the replacement model fails 12% of the eval cases. The mistake isn't the failure rate — it's that the failure rate was unknown until the deadline.
The discipline that fixes this: every production call site has its eval suite running not just against its current pinned model, but against the n+1 candidate continuously. When claude-opus-4-5 is in production, the suite is also running nightly against claude-opus-4-6 and claude-opus-4-7. The migration delta is a known quantity weeks before the cutoff, not a surprise on day 28.
This is cheaper than it sounds. Eval suites for production LLM features are typically 50 to 500 cases, a nightly run takes minutes of compute, and the cost is dwarfed by the cost of a panicked migration. The output is a dashboard: for each call site, here is the current model's pass rate, here is the n+1 model's pass rate, here is the delta, here are the case IDs that regress.
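A sketch of what the nightly sweep can look like, assuming a run_case(case, model_id) harness hook that returns pass/fail and eval cases with a stable .id — both are stand-ins for whatever harness the team already has:

```python
# Nightly n+1 eval sweep. run_case(case, model_id) -> bool is an assumed
# harness hook, not a real API; cases are objects with a stable .id field.
def eval_sweep(cases, current_model: str, candidate_models: list[str]) -> dict:
    """Run the same suite against current and n+1 snapshots; report the delta."""
    passed = {
        model: {c.id for c in cases if run_case(c, model)}
        for model in [current_model, *candidate_models]
    }
    report = {}
    for model in candidate_models:
        regressions = passed[current_model] - passed[model]  # pass now, fail on n+1
        report[model] = {
            "pass_rate": len(passed[model]) / len(cases),
            "delta": (len(passed[model]) - len(passed[current_model])) / len(cases),
            "regressed_case_ids": sorted(regressions),
        }
    return report


# e.g. eval_sweep(cases, "claude-opus-4-5-20251101",
#                 ["claude-opus-4-6-20260115"])  # candidate id illustrative
```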
The hidden benefit of the n+1 eval is that it surfaces prompt portability problems while you have time to fix them. A prompt that works well on gpt-5.3 and badly on gpt-5.4 was tuned against quirks of the older model. That's a maintenance bug, and you'd rather find it on a Tuesday in March than on the Friday before retirement.
The teams who skip this step end up running it anyway, just compressed into the last week of the migration window with a panicked engineer and a bad Slack thread. The work is the same; the schedule is the only variable.
The Fallback Router Is Itself a Trap
Most production LLM systems have a fallback router: if the primary model returns an error or times out, retry against a secondary. The pattern is good. The implementation is almost always wrong.
The wrong implementation: secondary is hardcoded to a model alias, not a snapshot. primary: claude-opus-4-7, fallback: claude-sonnet-4. The fallback isn't pinned, isn't on the deprecation calendar, isn't in the eval suite, and is one provider sunset away from being a model that doesn't exist. When the fallback path actually fires — which it does, because primary outages happen — the system silently routes to whichever model the alias resolves to that day, with no eval baseline.
The right implementation: the fallback model is itself a registry entry. Pinned snapshot. Owned. Eval-tested. Listed on the deprecation calendar with its own retirement date. Treated as a primary in every dimension except request volume.
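A sketch of that shape, reusing the resolve_model() helper from the registry sketch; invoke() and ProviderError are stand-ins for the stack's real client wrapper and error taxonomy:

```python
# Fallback routing where both tiers resolve through the registry. invoke()
# and ProviderError stand in for the stack's real client and error types.
class ProviderError(Exception):
    pass


FALLBACKS = {
    # primary call site -> fallback call site; both are full registry entries,
    # so the fallback is pinned, owned, eval-tested, and on the calendar.
    "support-classifier": "support-classifier-fallback",
}


def call_with_fallback(call_site: str, request):
    try:
        return invoke(resolve_model(call_site), request)
    except (TimeoutError, ProviderError):
        return invoke(resolve_model(FALLBACKS[call_site]), request)
```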
This sounds obvious in writing and is almost universally violated in practice. The asymmetry is that the primary path gets attention because it's hot. The fallback path gets attention only when it fires, and by then you're already in an incident. Building the fallback registry entry up front, with the same rigor as the primary, is the only way to avoid the "our fallback model retired three weeks ago and we just noticed during an outage" failure mode.
The Cost Frame Nobody Surfaces in Planning
Here is the cost number that should sit on the platform team's quarterly planning slide: a forced model migration is approximately two engineer-weeks per surface, plus an eval re-baseline, plus a customer-comms drafting cycle.
The two-week-per-surface estimate is the median across reported migrations: prompt re-tuning when the old prompt stops working, output parser fixes when JSON formatting drifts, latency re-tuning when the new model takes 20 to 40% longer at the same effort level, and the testing labor to verify all of it. Some migrations are half that — when the prompt is robust and the eval suite is in good shape. Some are five times that — when the new model is a reasoning model and the old one wasn't, requiring a near-complete prompt rewrite and a new latency budget.
Multiply by the number of production surfaces. A team running five LLM-powered features pays roughly ten engineer-weeks per sunset that touches the whole portfolio. With two providers and at least one retirement window opening every quarter, that is roughly 40 engineer-weeks per year of forced migration work, before any new feature development. That is one full-time engineer absorbing model deprecations, indefinitely, as a tax on the choice to build on hosted models.
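Spelled out, so the planning slide can show its work under this post's assumptions:

```python
# The planning arithmetic above, with every input stated explicitly.
surfaces = 5               # production LLM-powered features
weeks_per_surface = 2      # median forced-migration cost per surface
sunsets_per_year = 4       # at least one portfolio-wide retirement window per quarter

migration_tax = surfaces * weeks_per_surface * sunsets_per_year
print(migration_tax)       # 40 engineer-weeks/year — roughly one full-time engineer
```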
The number is not a reason to stop building on hosted models. It is a reason to surface this cost in planning so it is funded, owned, and predictable — rather than discovered as a quarterly surprise that crowds out the roadmap.
The Contract Pattern That Gives Back Leverage
The default enterprise AI contract gives the provider unilateral control over the deprecation timeline. The 60-day notice is a floor, not a negotiation outcome. For teams whose product is materially dependent on a specific model's behavior — fine-tuned wrappers, regulated workflows, customer-visible features with performance SLAs — this default is too weak.
The contract pattern that buys back leverage is a negotiated minimum support window for named model versions, written into the enterprise agreement before signing. The shape:
- Named version commitment: the contract names specific model snapshots (not aliases) the provider commits to support.
- Minimum support window: typically 12 to 18 months from launch date for foundational models, negotiated as a floor.
- Migration assistance clause: if the provider sunsets a named version inside the window, they cover defined migration costs or extend access.
- Notification SLA: a longer notice period than the public default, often 120 days or more for the named versions.
This pattern only works at enterprise tier and only if it's negotiated before signing. The leverage to ask for it evaporates the day the deprecation email arrives. Most teams don't ask because they don't know it's negotiable. Microsoft's standard Azure OpenAI SLA, for instance, doesn't address model retirement rework costs at all — but version pinning rights are explicitly negotiable in the enterprise agreement, and enterprise advisors recommend asking for them as a standard term.
The work of asking is small. The work of not asking is two engineer-weeks per surface, every time.
The Org Failure Mode: Nobody Owns the Migration
The structural pattern that turns model deprecations into quarterly fires: the platform team owns the model registry, the product team owns the prompts, and neither team owns the migration.
The platform team's incentive is to ship the registry, configure the calendar, and consider the work done. The product team's incentive is to ship features against whatever model the registry currently lists. When a deprecation email arrives, the platform team forwards it to the product team. The product team treats it as work that interrupts the roadmap and pushes back. The platform team has no leverage to enforce the migration, because they don't own the prompts and can't run the evals. The retirement date passes. Something breaks.
The fix isn't more process. It's clear ownership of the migration as a single piece of work, with a named driver who has the authority to prioritize it on the product team's roadmap. Some orgs solve this by making migrations a platform-team responsibility — the platform team does the prompt re-tuning and eval re-baselining work themselves, with product team review. The trade-off is platform engineers context-switching across a portfolio of features. Other orgs solve this with a forcing function: the registry gates deploys, and an unmaintained call site eventually fails CI. The trade-off is unwelcome deploy friction, including during incidents.
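A sketch of that forcing function as a CI check over the registry, with illustrative thresholds and the entry shape from the registry sketch:

```python
# The deploy-gating forcing function as a CI check. Thresholds are
# illustrative; the entry shape is the RegistryEntry sketched earlier.
from datetime import date, timedelta

STALE_EVAL = timedelta(days=45)   # an eval baseline older than this fails CI
BUFFER = timedelta(days=30)       # matches the calendar's migration target


def check_registry(registry: dict, today: date) -> list[str]:
    """Return CI failures: stale eval baselines and unmigrated retirements."""
    failures = []
    for e in registry.values():
        if today - e.last_eval > STALE_EVAL:
            failures.append(f"{e.call_site}: eval baseline stale since {e.last_eval}")
        if today >= e.retirement - BUFFER:
            failures.append(f"{e.call_site}: inside migration window for {e.model}")
    return failures
```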
Both patterns work. The pattern that does not work is "the platform team owns infrastructure and the product team owns features and migrations are everyone's job." Migrations that are everyone's job are nobody's.
Treat Models Like the Operating Systems They Are
The discipline this post describes is not new. Mature engineering orgs apply it to operating systems, language runtimes, library dependencies, and API consumers: pin the version, calendar the EOL, test the rollback, name the owner, address the rework cost in the contract.
What's new is the cadence. Linux LTS gets a decade. Python minor versions get five years. A foundation model gets 12 to 18 months between launch and retirement, with as little as 60 days of formal notice. The compression doesn't break the disciplines — it means the cost of skipping them is paid every quarter instead of every five years.
The team that treats this as routine engineering — registry, calendar, n+1 evals, version-pinned fallbacks, contract terms, named ownership — absorbs each sunset in a few engineer-weeks of mechanical work. The team that treats each email as a surprise rediscovers the lessons of OS, library, and API deprecation in a stack that moves three times faster. The email is going to arrive. The only variable is whether it was already on the calendar.
Sources

- https://platform.claude.com/docs/en/about-claude/model-deprecations
- https://developers.openai.com/api/docs/deprecations
- https://www.anthropic.com/research/deprecation-commitments
- https://safjan.com/the-real-cost-of-model-migration-what-swapping-llms-actually-requires/
- https://github.blog/changelog/2026-02-19-selected-anthropic-and-openai-models-are-now-deprecated/
- https://www.theregister.com/2026/01/30/openai_gpt_deprecations/
- https://learn.microsoft.com/en-us/azure/foundry/openai/concepts/model-retirements
- https://redresscompliance.com/azure-openai-sla-and-support-whats-covered-and-whats-not.html
- https://deprecations.info/
- https://www.rubick.com/should-platform-teams-deprecate/
