
The Disable Switch Is the Real Product: Designing the Non-AI Fallback Path

10 min read
Tian Pan
Software Engineer

Every AI feature ships with a moment its team hasn't planned for: the moment it has to be turned off. A model regression lands during the morning standup. A cost spike from a marketing campaign nobody told engineering about doubles the bill in twelve hours. A privacy review flags a prompt-context leak. The provider goes down for ninety minutes. A compliance team waves a flag at noon and the feature has to disappear before the close of business.

The disable switch most teams ship for that moment is "the feature returns an error" — a spinner that never resolves, a banner that says "AI assistant unavailable, try again later." That is a strictly worse user experience than the pre-AI status quo, which is exactly what users will compare you to the moment AI degrades. The status quo had a button. Now they get an apology.

The disable switch isn't a control panel for engineers. It is a product surface — possibly the most important one — and the team that treats it as a YAML toggle has built a feature that becomes useless on its worst day. That worst day is the day users decide whether the AI is actually trustworthy, because trust is built when things break, not when they work.

The off-state is a separate product, with its own roadmap

The first move that distinguishes mature AI teams is treating the disabled path as a real product, with a PM, a design owner, and an eval suite. Not a try/catch branch buried in a controller. A coherent experience that handles the use case the AI was layered on top of, and names what changed without breaking trust.

Concretely, this means three commitments most teams skip (a code sketch follows the list):

  • A deterministic path that handles the 70% case. The AI was usually a wrapper around something simpler — a search box, a template, a rules engine, a human queue. That underlying capability still exists in your codebase, or it should. The off-state is when it gets to do its job.
  • A graceful narrative that names the gap. "We've temporarily disabled smart suggestions while we investigate an issue. You can still create the report manually here." Specific, scoped, and honest. Not "Something went wrong." The user should know whether to wait or work around.
  • An owner. When the AI version ships, the deterministic path almost always loses its product owner. It rots. Six months later, when the disable switch flips, the fallback serves an experience built for 2024 traffic shapes, not 2026 ones.
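
Here is a minimal sketch of how the first two commitments compose. All names (`Draft`, `generateDraftWithAI`, `buildDraftFromTemplate`) are hypothetical, illustrating the pattern rather than any particular codebase:

```typescript
// Hypothetical names throughout; a sketch of the pattern, not a real codebase.
type Draft = { body: string; notice?: string };

// The AI enhancement: stubbed here, normally a model call.
async function generateDraftWithAI(prompt: string): Promise<Draft> {
  throw new Error("model call stubbed out in this sketch");
}

// The deterministic 70% path: a template the product could ship on its own.
function buildDraftFromTemplate(prompt: string): Draft {
  return { body: `Report: ${prompt}\n\n(Sections to fill in manually.)` };
}

async function createDraft(prompt: string, aiEnabled: boolean): Promise<Draft> {
  if (aiEnabled) {
    try {
      return await generateDraftWithAI(prompt);
    } catch {
      // Fall through to the deterministic path on any model failure.
    }
  }
  const draft = buildDraftFromTemplate(prompt);
  // The graceful narrative: specific, scoped, and honest.
  draft.notice =
    "We've temporarily disabled smart suggestions while we investigate " +
    "an issue. You can still create the report manually below.";
  return draft;
}
```

Note that the deterministic path is a real function with a real output, not an error page: the off-state returns something the user can work with.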

The teams that get this right write the fallback's design doc first. They scope the AI feature as an enhancement on top of a path they're already willing to ship. The teams that get it wrong scope the AI feature as the path, and then panic-build the fallback the week before launch.

Failure isn't rare — it's a Tuesday

The argument against investing in the off-state usually sounds like "we'd be optimizing for an edge case." That argument was never true and is now demonstrably false. Major model providers post status pages that read like flight delay boards. ChatGPT had 61 incidents in the last 90 days at the time of writing — one major outage and 60 minor ones, with a median duration close to two hours. Claude went down worldwide on March 2, 2026, in an outage attributed to unprecedented demand triggered by a same-day feature launch. Grok has spent most of 2026 visibly scaling against demand it can't quite catch.

And that's just provider-side outages. The other failure modes are more frequent and less visible:

  • A model update silently changes refusal behavior, and your customer support assistant starts declining the most common ticket type.
  • A cost dashboard alarm fires Friday afternoon because a new tenant onboarded with a workload pattern your unit economics didn't anticipate.
  • A region-specific data residency regulation lands and you have 48 hours to disable the feature in three countries.
  • A red-team report shows the model leaks prior-conversation context under a specific prompt, and legal wants the surface dark by tomorrow.

Each of these is a "Tuesday" event for any team running AI in production at modest scale. Designing for them isn't paranoid; it's the baseline. The eight-week post-launch sequence of operational issues — cost spikes, hallucination escalations, eval drift, latency tail, a quiet provider TOS change — runs whether you've planned for it or not. The choice is whether you run it with a runbook or with a series of all-hands.

Granularity: per-capability, not per-feature

The most common architectural mistake in disable-switch design is feature-level granularity. The AI assistant feature has one flag. Flip it off and everything goes dark, including the parts that were fine. This is the kind of decision you make at sprint-planning convenience and pay for during an incident at 2 a.m.

The correct unit is the capability — the bounded thing the AI is doing inside a feature. A single product surface like an AI-assisted document editor might have ten of them: autocomplete suggestion, grammar correction, tone rewrite, table generation, image generation, summarization, translation, citation lookup, format conversion, and template selection. Each of these has its own:

  • failure mode (hallucination vs. cost vs. latency vs. policy violation),
  • dependency (which model, which retrieval pipeline, which tool),
  • fallback (a static template, a deterministic linter, a pre-built component, or simply "hide the button").

When the failure is "image generation is producing copyrighted material," you do not need to disable the entire editor. You disable the image generation capability and leave the other nine working. Per-capability flags also let you ramp back up surgically: if the model regression that affected tone rewrite is fixed but the retrieval pipeline for citation lookup is still degraded, you flip one back on without rolling the dice on the other.
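In code, per-capability granularity can be as simple as a registry keyed by capability rather than by feature. A minimal sketch, with hypothetical names mirroring the editor example above:

```typescript
// Hypothetical capability registry for the AI-assisted editor example.
// Each capability carries its own flag and its own fallback behavior.
type Capability =
  | "autocomplete" | "grammar" | "toneRewrite" | "tableGen" | "imageGen"
  | "summarize" | "translate" | "citations" | "formatConvert" | "templates";

type Fallback = "deterministic" | "hideButton";

const registry: Record<Capability, { enabled: boolean; fallback: Fallback }> = {
  autocomplete:  { enabled: true, fallback: "deterministic" }, // static snippets
  grammar:       { enabled: true, fallback: "deterministic" }, // rule-based linter
  toneRewrite:   { enabled: true, fallback: "hideButton" },
  tableGen:      { enabled: true, fallback: "deterministic" }, // pre-built component
  imageGen:      { enabled: true, fallback: "hideButton" },
  summarize:     { enabled: true, fallback: "hideButton" },
  translate:     { enabled: true, fallback: "hideButton" },
  citations:     { enabled: true, fallback: "deterministic" }, // plain search lookup
  formatConvert: { enabled: true, fallback: "deterministic" },
  templates:     { enabled: true, fallback: "deterministic" }, // static picker
};

// Incident response: disable exactly one capability, leave the other nine alone.
function disable(cap: Capability): void {
  registry[cap].enabled = false;
}

disable("imageGen"); // "image generation is producing copyrighted material"
```
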

This requires upfront work that feels excessive when you're shipping. It feels essential during the incident. The retrofit is painful — capability boundaries are a property of how the code is structured, not how the YAML is shaped.

Load shedding before user-facing timeouts

Even when nothing is "broken," the disable switch should fire automatically. Provider rate limits, queue saturation, and budget burn are continuous signals, not binary ones. The pattern that distinguishes a resilient AI feature from a fragile one is active load shedding — dropping AI calls before user-visible pages start timing out.

Concretely, this is a circuit breaker between the request handler and the model call, parameterized on three signals (sketched in code after the list):

  • Provider error rate over a rolling window. When 429s and 5xxs cross a threshold (commonly 20% over 60 seconds), the breaker trips open and the next request gets the deterministic path. Closing happens after a probationary half-open window.
  • Queue depth or P95 latency on the model call. If the call is taking long enough that the user is going to bail, you may as well give them the deterministic path now and let them get on with it.
  • Cost-per-cohort budget burn. If the spend on a particular tenant or query class is on a trajectory to exceed its envelope by Tuesday, that capability should auto-disable for that cohort while still serving everyone else.
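
A minimal breaker sketch over those three signals. The thresholds and class shape are illustrative assumptions, and the cost signal is passed in as a precomputed burn rate rather than derived here:

```typescript
// Illustrative thresholds; tune per capability. All names are hypothetical.
const ERROR_RATE_TRIP = 0.2;        // 20% provider errors over the window
const WINDOW_MS = 60_000;           // 60-second rolling window
const P95_LATENCY_TRIP_MS = 4_000;  // past the point users bail
const HALF_OPEN_AFTER_MS = 30_000;  // probationary window before retrying

type State = "closed" | "open" | "halfOpen";

class ModelCallBreaker {
  private state: State = "closed";
  private openedAt = 0;
  private samples: { at: number; error: boolean; latencyMs: number }[] = [];

  // Called after every model call with its outcome.
  record(error: boolean, latencyMs: number): void {
    const now = Date.now();
    this.samples.push({ at: now, error, latencyMs });
    this.samples = this.samples.filter((s) => now - s.at <= WINDOW_MS);

    const errorRate =
      this.samples.filter((s) => s.error).length / this.samples.length;
    const sorted = this.samples.map((s) => s.latencyMs).sort((a, b) => a - b);
    const p95 = sorted[Math.floor(sorted.length * 0.95)] ?? 0;

    if (errorRate >= ERROR_RATE_TRIP || p95 >= P95_LATENCY_TRIP_MS) {
      this.state = "open";
      this.openedAt = now;
    } else if (this.state === "halfOpen") {
      this.state = "closed"; // probe succeeded; resume normal service
    }
  }

  // True => call the model; false => take the deterministic path.
  allowModelCall(budgetBurnRate: number): boolean {
    if (budgetBurnRate >= 1.0) return false; // cohort on pace to blow its envelope
    if (this.state === "open") {
      if (Date.now() - this.openedAt >= HALF_OPEN_AFTER_MS) {
        this.state = "halfOpen"; // let one probe through
        return true;
      }
      return false;
    }
    return true;
  }
}
```
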

The mental model: model availability isn't binary; it's a continuous resource the system rations. The disable switch is the bottom of the rationing curve, not a separate circuit. When you build it that way, the worst-case incident is a smooth transition along a curve the team has already debugged in production at smaller scales — not a cliff the team falls off at full traffic.

Evaluate the disabled path as carefully as the enabled one

The single most important discipline that separates teams that survive AI incidents from teams that don't: they test the disabled path. Continuously. With the same eval rigor they apply to the enabled path.

Most teams have a model eval suite. Most teams do not have a fallback eval suite. So when the disable switch flips, nobody has run the deterministic path against real traffic in months. The static templates have stale strings, the rules engine doesn't understand the new schema, the "manual" form has fields that haven't been validated against the new product surface, the analytics events for the off-state were never instrumented, and the on-call engineer learns all of this by watching the support queue fill up.

The eval discipline that actually works:

  1. Synthetic traffic on the deterministic path, weekly. A small canary that runs the off-state for 1% of users — or a shadow run that compares both — catches drift before incidents do (sketched in code after this list).
  2. Snapshot the off-state UX in design review. When the AI feature ships a new sub-capability, the disabled state for that sub-capability is part of the design hand-off, not an afterthought.
  3. Game-day drills. Schedule the disable switch. Once a quarter, flip it during business hours in a controlled window. The team learns what breaks; the runbook gets updated; the org muscle for "operating without the AI" stays alive.
  4. Telemetry parity. The disabled path emits the same observability signals as the enabled path. You can answer "how often did the off-state run, and was it healthy?" with a query, not an archeology project.
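
A sketch of items 1 and 4 together, assuming a hypothetical `emit` telemetry sink and a crude body-comparison drift check:

```typescript
// Hypothetical shadow run: serve the enabled path, exercise the off-state
// in the background, and emit the same telemetry for both.
type PathResult = { ok: boolean; latencyMs: number; body: string };

function emit(event: string, fields: Record<string, unknown>): void {
  console.log(JSON.stringify({ event, ...fields })); // stand-in telemetry sink
}

async function handleRequest(
  input: string,
  aiPath: (q: string) => Promise<PathResult>,
  deterministicPath: (q: string) => Promise<PathResult>,
): Promise<PathResult> {
  const served = await aiPath(input);
  emit("capability.served", {
    path: "ai", ok: served.ok, latencyMs: served.latencyMs,
  });

  // Shadow the off-state on a 1% sample so drift shows up before incidents do.
  if (Math.random() < 0.01) {
    deterministicPath(input)
      .then((shadow) =>
        emit("capability.shadow", {
          path: "deterministic",
          ok: shadow.ok,
          latencyMs: shadow.latencyMs,
          divergent: shadow.body !== served.body, // crude drift signal
        }),
      )
      .catch((err) => emit("capability.shadow_error", { message: String(err) }));
  }
  return served;
}
```
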

A team that runs all four can flip the switch on a bad day and trust the result. A team that skips them is hoping.

The org pattern that decides whether the off-state survives

The technical recommendations above are useless without the organizational discipline to maintain them. The recurring failure pattern is straightforward and almost universal: the PM and engineer who ship the AI version are promoted off the feature into the next AI initiative; the deterministic path silently transfers to a "platform" team that doesn't understand its product context; six months later the fallback is unowned, the eval suite is broken, and the disable switch fires into a void.

The mitigation isn't more process. It's a single explicit commitment in the product's lifecycle doc: the AI feature and its fallback have the same owner, for the same duration, and the fallback's health is a quarterly review item with the same standing as the AI's eval scores. If a team can't commit to that, they should not be shipping the AI version yet — they're shipping one product pretending to be two, and the trench coat will fall open.

What to actually build first

If you're standing up an AI feature now, the order of operations that gets the off-state right:

  1. Write the fallback design doc before the AI design doc. What does this feature do without the model? Ship that first if you haven't already.
  2. Define capability boundaries and a flag per capability. Not per feature.
  3. Wire the circuit breaker on signals you actually have. Provider error rate, latency, cost burn — at least one, and add the others as the feature matures.
  4. Build the eval harness for the off-state. Synthetic traffic, telemetry parity, and a calendar entry for the next game-day drill.
  5. Name the owner. Singular. Not a Slack channel, a person.

The AI feature you ship is two products in a trench coat. The one users see during an incident is the one most teams treat as an afterthought, and it's the one that decides whether your AI feature survives its first bad month. Build the off-state with the same rigor as the on-state. The disable switch isn't infrastructure. It's the product, the day it matters most.
