
The AI Off-Switch That Doesn't Exist: Retiring Features After Users Co-Author the Archive

11 min read
Tian Pan
Software Engineer

Six months after you launched the AI writing assistant, you open the analytics dashboard and find the metric you wanted: 40% of user-generated documents on the platform now contain AI-authored prose. The board deck calls this an engagement lift. Three weeks later, the model provider raises prices, the unit economics flip, and someone asks the obvious question: can we turn it off? You go looking for the toggle and discover that it isn't a toggle. It's a migration with product, legal, and UX surfaces attached, and pulling it cleanly will take two quarters and burn political capital with three teams who didn't know they were stakeholders.

This is the part of the AI product lifecycle that nobody planned for. The launch playbook covered prompt engineering, rate limits, eval harnesses, and a kill switch for runaway costs. It did not cover what happens when users have spent half a year producing artifacts that only exist because the generator existed, and now the read path through your archive depends on a feature you want to retire. The "off switch" was conceptual: a flag in a config file. The actual decommissioning is a coordinated set of decisions about grandfathering, versioning, content provenance, and the uncomfortable conversation about whether the engagement lift was ever value or just dependency.

Most teams reach this point and freeze. They keep running the feature at a loss because the cost of pulling it cleanly looks higher than the cost of keeping it. That stasis compounds: every additional month of co-authored content increases the migration surface, and the eventual sunset gets harder, not easier. The way out is not improvising at the end. It's treating decommissioning as a discipline that starts at launch.

The "Toggle" Is a Read-Path Problem, Not a Write-Path Problem

The first surprise in retiring an AI feature is that turning off the generator is the easy half. The hard half is the archive. Every document, summary, draft, transcript, image, or response that the feature produced is still in the system, still readable, still being viewed by the user who created it and the colleagues they shared it with. If your sunset plan is "ship a flag that disables the writer," you have addressed the future and ignored the past.

The read path carries a few specific problems. The first is artifacts that store prompts, model outputs, or tool-call traces as metadata next to the user-visible content. Pulling the feature without a plan can leave dangling references: a UI that tries to render a "regenerate" button against an inference endpoint that no longer exists, or a transcript whose schema your new code doesn't recognize. The second is content that was synthesized but presented as if the user wrote it. The user genuinely did write it, in the sense that they curated, edited, and shipped it, but it carries hidden artifacts only the generator could have produced, and pulling the feature breaks the read path for those artifacts in ways the user can see.

The pattern that handles this cleanly is to separate the writer from the reader and decommission them on different timelines. The writer — the inference call, the cost center, the part you actually want to stop running — comes down first. The reader — the rendering code, the schema, the view layer that knows how to display historical artifacts — stays operational on a longer schedule, treated as a maintenance surface rather than a feature surface. This buys you the cost savings without breaking the archive, and it lets the read path die naturally as users replace or migrate old content.
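Here is the shape of that split in code. This is a minimal sketch, assuming a feature-flag service and an artifact store behind simple interfaces; the flag name, the schema label, and every helper here are hypothetical stand-ins for your own infrastructure.

```typescript
// Sketch: the writer and the reader retire on different timelines.
// All names are illustrative; substitute your own flag service and store.

interface Flags { isEnabled(flag: string, userId: string): boolean }
interface Artifact { id: string; schemaVersion: string; body: string }
interface ArtifactStore { load(id: string): Promise<Artifact> }

declare function callInference(prompt: string): Promise<string>;  // hypothetical
declare function renderLegacySchema(a: Artifact): string;         // hypothetical
declare function renderCurrentSchema(a: Artifact): string;        // hypothetical

class FeatureRetiredError extends Error {}

class AssistantService {
  constructor(private flags: Flags, private store: ArtifactStore) {}

  // Write path: the inference call, the cost center. Its flag goes dark first.
  async generateDraft(userId: string, prompt: string): Promise<string> {
    if (!this.flags.isEnabled("ai_writer_enabled", userId)) {
      throw new FeatureRetiredError("The writing assistant has been retired.");
    }
    return callInference(prompt);
  }

  // Read path: deliberately NOT gated on the writer flag. The archive must
  // stay readable after the writer is gone, on a longer maintenance timeline.
  async renderArtifact(id: string): Promise<string> {
    const artifact = await this.store.load(id);
    return artifact.schemaVersion === "writer-v2"
      ? renderLegacySchema(artifact)   // keeps historical transcripts rendering
      : renderCurrentSchema(artifact);
  }
}
```

The design choice that matters is the asymmetry: the write path checks a flag on every call, while the read path never consults it, so flipping the flag retires the cost center without touching a single historical document.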

Grandfather and Version Your Artifacts

The single most useful decision you can make at sunset is to stamp the existing archive with a version label. Treat every artifact produced by the deprecated feature as belonging to a generation: "generated 2025-Q3," "generated by writer-v2," "generated under the legacy assistant." This is not cosmetic. It is the metadata that lets you reason about the archive going forward — for legal review, for content audits, for the inevitable user question of why the new assistant produces different output than the old one did.

The practice has precedent in content provenance work. The C2PA Content Credentials standard, which OpenAI's DALL·E and Adobe Firefly embed in their generated images and which Google has committed to supporting, attaches a cryptographically signed manifest describing what tools produced an asset and when. The standard is designed for inter-organization trust, but the underlying idea, that every AI-generated artifact should carry a versioned provenance record, applies inside your platform too, especially at sunset. If you grandfather artifacts without versioning them, you lose the ability to distinguish "produced by the deprecated feature" from "produced by the current feature" three years later when a question comes up. With versioning, you can answer it in a query.

Versioning also gives you a clean unit of migration. If you decide to offer users a one-click rewrite of their old content under the new system, the version label is what you scope the migration to. If a regulator asks how much of your platform's content was produced under the model behavior in effect during a specific window, the version label is the answer. If a user wants to delete or annotate everything the deprecated feature touched, the version label is the predicate.
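In code, the stamp is a small record written once at generation time and never mutated. A minimal sketch, with field names that are my assumptions rather than anything from the C2PA manifest format:

```typescript
// Sketch: a generation stamp attached to every artifact at write time.
// Field names and values are illustrative placeholders.

interface GenerationStamp {
  feature: string;        // e.g. "legacy-assistant"
  writerVersion: string;  // e.g. "writer-v2"
  model: string;          // e.g. "provider/model-2025-03"
  generatedAt: string;    // ISO 8601 timestamp
}

interface StampedArtifact {
  id: string;
  body: string;
  stamp?: GenerationStamp; // absent means no generator was involved
}

// The version label as a predicate: migration scope, a regulator's
// window query, and a user's bulk delete all reduce to the same filter.
function touchedByDeprecatedWriter(a: StampedArtifact): boolean {
  return a.stamp?.writerVersion === "writer-v2";
}

declare const allArtifacts: StampedArtifact[]; // stand-in for a real query
const toMigrate = allArtifacts.filter(touchedByDeprecatedWriter);
```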

Stop Cutting Features by Calendar Date, Start Cutting by Cohort

The temptation when sunsetting a feature is to pick a date — November 1, end of Q4, the next major release — and announce that the feature goes dark on that day. This is the wrong shape of plan for AI features specifically, because the user impact is non-uniform. A heavy user of the writing assistant whose entire workflow depends on it experiences the cutoff completely differently from a user who tried it once and forgot about it. A blanket date treats them identically.

The cohort approach is to migrate users off the feature in waves keyed to behavior, not calendar. Cohort one is the never-users and tried-once users — they get the feature removed silently, no notification needed, because they will not notice. Cohort two is the moderate users who can adapt to the alternative with a brief in-product nudge. Cohort three is the power users whose workflows depend on the feature; they get a long deprecation window, hands-on migration support, and direct outreach. Cohort four, if you have one, is the customers whose contracts or integrations include the feature explicitly, and they get a contractual renegotiation rather than a sunset notice.

Running this in waves does two things that a single cutoff date cannot. First, it lets you learn from each wave and adjust the migration tooling for the next one: the never-users surface no signal, but cohort two surfaces the questions that cohort three will ask louder, and you can prepare for them. Second, it limits blast radius. If the migration tooling has a bug, you find it on cohort one or two, not on the power users whose loss of trust is expensive.

The instrumentation that makes this work is straightforward but not free: you need usage telemetry per user that distinguishes incidental from essential use, you need feature-flag infrastructure that can target cohorts rather than just percentages, and you need a way to measure migration success that goes beyond "did they stop hitting the endpoint." Without that instrumentation, the cohort plan collapses back into a date plan. Building it during sunset is doable but slow. Building it at launch, as a routine part of feature instrumentation, is what separates teams that can retire AI features cleanly from teams that can't.
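What that telemetry feeds is a cohort assignment that flags can target. A sketch, assuming you already collect per-user usage counts; the thresholds are placeholders to tune against your own usage distribution:

```typescript
// Sketch: assign sunset cohorts from per-user behavior, not a calendar date.
// Thresholds are placeholders; calibrate them from your usage distribution.

interface UsageTelemetry {
  userId: string;
  sessionsWithFeature90d: number;   // incidental vs. essential use
  docsContainingAiProse: number;    // dependency on the archive
  contractIncludesFeature: boolean; // contractual entitlement
}

type Cohort =
  | "wave1-silent-removal"       // never-users and tried-once users
  | "wave2-nudge"                // moderate users, in-product nudge
  | "wave3-supported-migration"  // power users, long window and outreach
  | "wave4-contractual";         // feature named in the contract

function assignCohort(u: UsageTelemetry): Cohort {
  if (u.contractIncludesFeature) return "wave4-contractual";
  if (u.sessionsWithFeature90d >= 20 || u.docsContainingAiProse >= 10) {
    return "wave3-supported-migration";
  }
  if (u.sessionsWithFeature90d >= 3) return "wave2-nudge";
  return "wave1-silent-removal";
}
```

The flag system then targets cohorts rather than raw percentages, which is exactly what lets a wave-one bug surface before any power user sees it.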

Plan Exit Criteria at Launch, Not at Sunset

The sunset literature outside AI has a clear consensus: exit criteria belong in the launch plan, not in the post-mortem. The same logic applies to AI features, with one extra wrinkle. AI features are uniquely prone to silent dependency creep — users start relying on outputs they didn't realize were AI-generated, content accumulates that only the AI could have produced, and the "engagement lift" reported on the dashboard is a function of the feature's existence rather than its value. This makes after-the-fact sunset judgment especially treacherous, because the metric you'd use to justify keeping the feature is structurally biased toward keeping it.

Concrete exit criteria look like: an adoption-quality threshold (not just adoption, but adoption with a stickiness or task-completion signal), a unit-economics threshold (cost per outcome below a stated value, with the outcome defined upfront), a competitive threshold (the feature still beats the user's available alternatives by a stated margin), and a strategic-fit threshold (the feature is still in the part of the product the company wants to invest in). Any one of these going red is a trigger for a sunset review. None of them needs to be a hard auto-kill — the trigger initiates a decision, not the decision itself — but the trigger has to be defined in advance, because otherwise the people closest to the feature will rationalize the failing metric.

The hardest of these to set is the unit-economics threshold, because token prices and model performance both move under your feet. Set it as a ratio rather than an absolute number — cost-per-outcome relative to revenue-per-user in the cohort that uses the feature, for instance — so the threshold survives a 50% drop in token prices without becoming meaningless.
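Written down as data, the four criteria and their review trigger might look like the sketch below. Every number is a placeholder; the point is that the thresholds exist in the launch plan and that the unit-economics check is a ratio rather than an absolute cost:

```typescript
// Sketch: exit criteria defined at launch, evaluated quarterly.
// A red criterion triggers a sunset *review*, never an automatic kill.
// All thresholds are placeholders.

interface QuarterlyMetrics {
  adoptionWithTaskCompletion: number; // adoption-quality signal, 0..1
  costPerOutcome: number;             // dollars per stated outcome
  revenuePerUserInCohort: number;     // dollars per feature-using user
  marginOverAlternatives: number;     // lead over the user's alternatives, 0..1
  strategicFit: boolean;              // set in planning, not telemetry
}

function sunsetReviewTriggers(m: QuarterlyMetrics): string[] {
  const triggers: string[] = [];
  if (m.adoptionWithTaskCompletion < 0.15) triggers.push("adoption-quality");
  // A ratio, not an absolute number: survives a 50% drop in token prices.
  if (m.costPerOutcome / m.revenuePerUserInCohort > 0.3) {
    triggers.push("unit-economics");
  }
  if (m.marginOverAlternatives < 0.1) triggers.push("competitive");
  if (!m.strategicFit) triggers.push("strategic-fit");
  return triggers; // non-empty list => schedule the sunset review
}
```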

The Engagement-Lift-as-Dependency Conversation

The most uncomfortable part of retiring an AI feature is the moment the product team realizes that the lift was dependency, not value. This is a specific failure mode: the feature shows great engagement, great retention contribution, great usage growth, all the right curves on all the right dashboards, and when you propose pulling it the team panics about churn. Then someone runs the experiment of pulling it for a 10% holdout cohort, and the holdout cohort doesn't churn — they just adapt, and a quarter later their satisfaction scores are indistinguishable from the control group. The lift was real. The value was much smaller than the lift implied.

This is the AI version of a generic measurement problem in product work, and Goodhart's law captures the shape of it: when a metric becomes the target, it stops being a good measure. Engagement metrics are particularly bad at distinguishing "users find this valuable" from "users have built routines around this and would build routines around something else just as easily." The way to fight this is not to ignore engagement, but to pair every engagement metric with a holdout-cohort experiment that measures counterfactual value at least once a quarter for any AI feature, and to require that experiment as part of the renewal of investment in the feature, not as a kill criterion.
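One way to keep the quarterly holdout cheap is deterministic assignment, so a user stays in the same arm for the whole quarter without any stored state. A sketch, assuming Node's built-in crypto module; the 10% share and the flag helper are my assumptions:

```typescript
// Sketch: deterministic quarterly holdout for counterfactual measurement.
import { createHash } from "node:crypto";

declare function flagEnabled(flag: string, userId: string): boolean; // hypothetical

// Hash (quarter, userId) to a stable value in [0, 1); below the share
// means the user is in the holdout for that quarter.
function inHoldout(userId: string, quarter: string, share = 0.1): boolean {
  const digest = createHash("sha256").update(`${quarter}:${userId}`).digest();
  return digest.readUInt32BE(0) / 0xffffffff < share;
}

// The availability check consults the holdout before the flag, so the
// holdout cohort experiences the feature as already retired.
function writerAvailable(userId: string, quarter: string): boolean {
  return !inHoldout(userId, quarter)
    && flagEnabled("ai_writer_enabled", userId);
}
```

A quarter later, compare the outcome you actually care about, task completion or retention or satisfaction, between the holdout and the control. If the arms are indistinguishable, the lift was dependency.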

The output of that experiment is also the script for the conversation with leadership. "We pulled the feature for cohort X and observed Y impact on the outcome we actually care about" is a defensible argument either direction. "Engagement is up 40%, please don't make us turn it off" is not a sunset plan, it's a rationalization, and a discipline that requires the counterfactual data forces the rationalization to come out into the open.

What This Means for the Next AI Feature You Ship

The takeaway is structural. Every AI feature you ship should be designed with a sunset plan that is real, written, and binding: versioning of artifacts at the moment of generation so the archive carries provenance, separation of writer and reader so they can be decommissioned independently, cohort-based migration tooling so the eventual retirement does not have to be a calendar event, exit criteria pinned to outcomes rather than engagement, and a quarterly counterfactual experiment that distinguishes value from dependency.

None of this is novel product work. The novelty is that AI features make the failures of the generic playbook expensive in ways previous feature retirements were not — because the artifacts persist, because the engagement is sticky in ways that mask value, because the cost of running the feature is variable and rising, and because the regulatory surface around generative content is growing rather than shrinking. The teams that will retire AI features cleanly in 2027 are the ones that wrote the sunset plan in 2025, when the feature was launching, and the discipline they bring to launch is what determines whether the off switch exists at all.

The off switch you want at sunset is the one you built into the design at launch. If you didn't, you do not have an off switch — you have a migration, and the longer you wait the more expensive it gets.
