The Two Clocks Problem: When Your Model Provider's Cadence Breaks Your Roadmap
There are two clocks ticking on your AI product, and they are not synchronized. The model providers run on a roughly quarterly heartbeat — Claude Opus 4.6 in February 2026, GPT-5.4 in March, Claude Opus 4.7 in April, GPT-5.5 a week later. Your product roadmap was committed in January and does not look up again until July. Somewhere in between, a capability you spent eight engineer-weeks building gets shipped as a one-line API parameter, and nobody on the team has a process for noticing.
This is not a forecasting problem. The releases were widely telegraphed — anyone who reads the changelog could have seen each of them coming. It is a planning-artifact problem. Roadmaps were invented for a world where the platform underneath your product changed once a decade. The platform now changes once a quarter, and the artifact has not been updated to match.
The failure mode is symmetrical and embarrassing in both directions. A team commits to a nine-month build of a long-context retrieval pipeline; in month four, the next Sonnet release ships with a 2M-token window and the bespoke pipeline is strictly inferior on quality, latency, and cost — but the roadmap has no cancellation pathway, so the team finishes anyway. A different team waits for "the next model" before starting a feature that today's models could already deliver, and the competitor with a capability-floor mindset ships and captures the market in the meantime. Both teams are doing 2019 product management on 2026 infrastructure.
Capability assumptions are dependencies, so name them like dependencies
Build pipelines pin library versions because nobody trusts a transitive dependency to stay still. Roadmaps that depend on AI capabilities should do the same — name the capability you are assuming, and pin which model and which version you are assuming it from. "We are betting that 200K-context routing works at p95 latency under 4 seconds with the current Sonnet tier" is a roadmap line item. "We are adding AI to the contract review flow" is not.
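A minimal sketch of what that looks like as data rather than a sentence, assuming the roadmap lives somewhere structured enough for a script to walk it. The field names, the placeholder model tier, and the example item are illustrative, not taken from any particular planning tool:

```python
# Illustrative only: a roadmap line item that names its capability assumption
# the way a lockfile names a dependency. All identifiers here are hypothetical.
from dataclasses import dataclass

@dataclass
class CapabilityAssumption:
    capability: str    # what the item is betting on, in falsifiable terms
    model: str         # the model or tier the bet is pinned against
    measured_on: str   # the eval or metric that backs the claim
    threshold: str     # the number the bet needs to hold

@dataclass
class RoadmapItem:
    name: str
    owner: str
    assumption: CapabilityAssumption

contract_routing = RoadmapItem(
    name="Contract review routing",
    owner="platform-ai",
    assumption=CapabilityAssumption(
        capability="200K-context routing",
        model="claude-sonnet-current",   # placeholder tier name
        measured_on="internal routing eval, p95 latency",
        threshold="p95 < 4s at current quality bar",
    ),
)
```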
The discipline this unlocks is reading a model release the way you read a CVE feed. When a new model lands, you walk the roadmap and ask, for each capability assumption, whether the new release strengthens it, weakens it, or makes the bespoke implementation obsolete. Most quarters, the answer for most line items is "no change." But once or twice a year a release lands that obsoletes a workstream or makes a previously-impossible feature trivial, and a team that has named its dependencies notices that day instead of three months later.
This is the same mental model as a security backlog. Nobody pretends a one-time audit catches every CVE; you accept that vulnerabilities will land mid-quarter and you have a triage process for them. AI capability releases need the same operational treatment, not a quarterly check-the-box review.
The quarterly capability review, treated as a real planning ritual
The cheapest version of this is one meeting per quarter, two hours, attended by the engineering lead, the AI lead, and the PM who owns the AI roadmap. The agenda is fixed: walk every model release in the quarter, walk every roadmap item, and for each pair note whether the item should be re-scoped, accelerated, killed, or left alone. Anything that gets killed needs a written rationale so the lesson sticks; anything that gets accelerated needs a re-staffing decision the same week.
The version that fails is the one where the meeting exists but produces no decisions. "Yes we should look at that" is not a decision. The output of the review is a list of roadmap edits, with owners, taking effect Monday. If the review produces nothing for two consecutive quarters, you are doing it wrong — either your roadmap has no real AI dependencies (in which case stop calling it an AI roadmap) or your team is not doing the work of evaluating new releases against in-flight commitments.
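One way to keep the output honest is to make it a typed artifact rather than meeting notes. The sketch below is illustrative only: the decision categories come from the agenda above, while the item names, owners, and dates are hypothetical.

```python
# Illustrative sketch of the review's only acceptable output: a list of
# roadmap edits with owners, rationales, and effective dates.
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    RESCOPE = "re-scope"
    ACCELERATE = "accelerate"
    KILL = "kill"
    NO_CHANGE = "no change"

@dataclass
class RoadmapEdit:
    item: str
    decision: Decision
    owner: str
    rationale: str   # required in writing when the decision is KILL
    effective: str   # "taking effect Monday", per the ritual

q2_review_output = [
    RoadmapEdit("custom OCR pipeline", Decision.KILL, "docs-ai",
                "native document understanding beats us on every benchmark",
                "2026-04-06"),
    RoadmapEdit("long-context retrieval", Decision.RESCOPE, "platform-ai",
                "2M-token window removes the need for the bespoke chunker",
                "2026-04-06"),
]
```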
The reason quarterly is the right cadence and not monthly is that most months have no significant release. The reason it cannot be annual is that any twelve-month window in 2025 or 2026 contained at least three releases that materially changed what was buildable. Cadence-matching is the whole point of the ritual.
A portfolio of capability-floor and capability-ceiling bets
Not every roadmap item should react to model upgrades the same way. Some bets are built against today's capabilities and should ship as if no further capability is coming — a customer is paying you now and "wait for the next release" is not a feature. Other bets explicitly assume next-quarter capability and would not be feasible without it — a long-running research-style agent that needs 4-hour autonomous task completion is a 2026 bet, not a 2024 bet, and shipping it against today's capability is just a worse version of the inevitable product.
The mistake teams make is being all-in on one side without realizing it. A team that ships only capability-floor features looks like it's executing well but is leaving the entire upside of the next year of model releases on the table. A team that ships only capability-ceiling features ships a slick demo every six months and has nothing in production. Both are easy to drift into and almost impossible to detect from inside.
The fix is to treat the split as a budget, not a vibe. "60% of AI engineering time goes to capability-floor work, 30% to capability-ceiling, 10% to deprecation and migration" is a sentence a leader should be able to say out loud. Once you have the number, you can argue about whether it is the right number for your stage and market — and you can notice when it has drifted six months later because nobody was watching.
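If the hours are already tagged by bet type somewhere, the drift check is a few lines. This is a sketch under that assumption; the 60/30/10 target is the example split above, and the ten-point tolerance is an arbitrary placeholder, not a recommendation.

```python
# Minimal drift check, assuming engineering time is tagged by bet type
# in whatever tracker the team already uses.
TARGET = {"floor": 0.60, "ceiling": 0.30, "deprecation": 0.10}
TOLERANCE = 0.10  # flag drift beyond ten percentage points

def check_budget(hours_by_type: dict[str, float]) -> list[str]:
    total = sum(hours_by_type.values())
    warnings = []
    for bet_type, target_share in TARGET.items():
        actual = hours_by_type.get(bet_type, 0.0) / total
        if abs(actual - target_share) > TOLERANCE:
            warnings.append(
                f"{bet_type}: {actual:.0%} of time vs {target_share:.0%} budget"
            )
    return warnings

# A quarter that quietly drifted all-floor trips two warnings:
print(check_budget({"floor": 820, "ceiling": 90, "deprecation": 40}))
```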
A deprecation calendar that is not somebody's spreadsheet
Model providers retire models on schedules they publish, and those schedules are honored on the date written. The original Claude 3 Opus retired on January 5, 2026. Claude 3.5 Sonnet v2 was deprecated in August 2025 and shut down on February 19, 2026. Claude 3.7 Sonnet had a six-month deprecation window ending May 11, 2026. OpenAI's Assistants API was deprecated in August 2025 and the lights go out on August 26, 2026. None of these dates were a surprise; all of them paged at least one team that did not have a calendar entry.
A real deprecation calendar lives in the same place your other infrastructure deadlines live — your incident system, your team calendar, your roadmap planning tool. It has a 90-day pre-deadline alert, an owner, and a migration estimate. If the migration is not trivial, it gets a roadmap line of its own at least one quarter before the cutoff. If the migration is trivial, it still gets a line, because trivial migrations are how you discover that your eval suite has been silently passing on a different distribution for three weeks.
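A sketch of the 90-day alert, assuming the calendar is kept as structured entries a script can walk rather than a spreadsheet. The two shutdown dates are the ones quoted above; the owners and migration estimates are placeholders.

```python
# Illustrative 90-day pre-deadline alert over a structured deprecation calendar.
from datetime import date

DEPRECATIONS = [
    # (what goes away, shutdown date, owner, migration estimate)
    ("claude-3-opus", date(2026, 1, 5), "search-team", "trivial, re-run evals"),
    ("assistants-api", date(2026, 8, 26), "agents-team", "one quarter"),
]

def due_for_alert(today: date, window_days: int = 90) -> list[str]:
    alerts = []
    for name, shutdown, owner, estimate in DEPRECATIONS:
        days_left = (shutdown - today).days
        if 0 <= days_left <= window_days:
            alerts.append(
                f"{name}: {days_left} days left, owner {owner}, est. {estimate}"
            )
    return alerts

# Wire this into whatever already pages people; a reminder nobody sees
# is the same as no calendar.
for line in due_for_alert(date.today()):
    print(line)
```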
The teams that get burned by deprecations are not the ones that read the announcements. They are the ones that read the announcements, agreed it would be fine, and then never put the date on the calendar. The forgetting curve is the bug.
Kill criteria, attached to AI roadmap items at intake
The single most expensive bug in AI product planning is the absence of a defined cancellation pathway. A team commits to building a custom OCR pipeline; six months in, multimodal models ship with native document understanding that beats the pipeline on every benchmark. The team ships the pipeline anyway because the roadmap has no language for "this is no longer a good use of engineering time." The sunk cost is sunk; the actual loss is the next six months of opportunity cost.
The fix is procedural, not cultural. Every AI roadmap item, at intake, gets a kill criterion written next to it. "We will kill this if a frontier model ships native support that scores within 10% of our quality bar at 50% of the cost." "We will kill this if our eval suite shows no quality gap against the base model." "We will kill this if the differentiation gap measured at 90 days is narrower than at 30 days." The criterion does not need to be perfect; it needs to be writable, falsifiable, and visible.
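Written as a check instead of a sentence, the first example criterion above might look like the sketch below. It assumes the team already scores its own pipeline and the frontier model on the same eval suite; the thresholds mirror the example criterion and everything else is hypothetical.

```python
# A kill criterion as a falsifiable check rather than a sentence.
from dataclasses import dataclass

@dataclass
class EvalSnapshot:
    quality: float       # score on the team's eval suite, higher is better
    cost_per_task: float

def should_kill(ours: EvalSnapshot, frontier: EvalSnapshot) -> bool:
    """Kill if a frontier model scores within 10% of our quality bar
    at 50% (or less) of our cost."""
    quality_close_enough = frontier.quality >= ours.quality * 0.90
    cheap_enough = frontier.cost_per_task <= ours.cost_per_task * 0.50
    return quality_close_enough and cheap_enough

# Re-run the check at every model release, not once at intake.
print(should_kill(EvalSnapshot(0.82, 0.40), EvalSnapshot(0.79, 0.12)))  # True
```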
The criterion forces the conversation that nobody wants to have at the start, when it is cheap, instead of at month nine, when it is expensive. It also gives a team permission to walk away from work without losing face, which turns out to be the actual blocker most of the time. Engineers know when their pipeline is no longer better than the base model. They need a written-down reason to say so.
Model-version pinning, but with eyes open
The temptation is to pin every model version forever and treat any upgrade as a separate planning event. This is a defensible position in regulated industries; in most other contexts, it is a way to fall a year behind your competitors who are upgrading every quarter. The right default is a pinned production version with an active "next" version under evaluation, and a published cutover window each quarter where the eval-passing next becomes the production version.
The non-obvious cost is the eval suite. A pin-and-upgrade cadence is only as good as the regression detection between versions. Teams that upgrade quarterly without an eval suite are running an uncontrolled experiment on their users; teams with a strong eval suite know within a week of a release whether the new model is a quality win, a regression, or neutral. The eval suite is not a side project; it is the thing that makes the cadence safe.
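A sketch of what the quarterly cutover gate can look like once that eval suite exists, assuming per-eval scores are available for both the pinned and the candidate model; the model identifiers and the regression threshold are illustrative.

```python
# Illustrative cutover gate: hold the pin if the candidate regresses on any eval.
PINNED = "model-prod-2026q1"   # placeholder identifiers
CANDIDATE = "model-next-2026q2"
MAX_REGRESSION = 0.02          # tolerate a two-point drop on any single eval

def cutover_decision(scores: dict[str, dict[str, float]]) -> str:
    pinned, candidate = scores[PINNED], scores[CANDIDATE]
    regressions = [
        name for name in pinned
        if candidate.get(name, 0.0) < pinned[name] - MAX_REGRESSION
    ]
    if regressions:
        return f"hold the pin: regressions on {', '.join(regressions)}"
    return "cut over in the published window"

print(cutover_decision({
    PINNED:    {"routing": 0.91, "extraction": 0.88, "tone": 0.84},
    CANDIDATE: {"routing": 0.93, "extraction": 0.85, "tone": 0.86},
}))  # extraction dropped 0.03 > 0.02, so the pin holds this quarter
```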
This also explains why "GPT wrapper" is sometimes a slur and sometimes a sustainable business. The wrapper that does no eval, no fine-tuning, no domain-specific scaffolding — the one that anyone could rebuild in a weekend — is correctly read as a placeholder. The wrapper that has years of eval data, a captive distribution channel, and a workflow that captures customer-specific context is upgrade-resilient because the surface area that matters is not the model but everything around it. The two clocks problem is mostly a problem for products that have not yet built that surface area.
The leadership realization
The roadmap as a planning artifact assumes a stable platform. AI is not a stable platform, and pretending otherwise is the most expensive form of optimism on the team's books. The teams that ship well in this environment are not the ones that predict model releases — nobody can — but the ones that have wired the planning process so that a release can be absorbed in a week instead of being argued about for a quarter.
The work is not glamorous. A capability-assumption column on the roadmap. A two-hour quarterly review with real decisions. A deprecation calendar that lives next to the on-call calendar. A kill criterion attached at intake. A capability-floor/ceiling budget with an actual number. None of it is research-grade engineering. All of it is the difference between a team whose roadmap survives contact with the next release and one whose roadmap is a museum piece by mid-quarter.
If your team has been shipping AI for more than a year and has none of these in place, the next model release is going to make that visible whether you want it to or not. Better to add the second clock to your planning process than to let the model providers add it for you.
