The AI Capability Ratchet: How One Smart Feature Breaks Your Entire Product
Your AI-powered search just shipped. It's fast, conversational, and handles nuanced queries in ways your old keyword search never could. The feature review was glowing. The launch post got shared. And then, two weeks later, the support tickets start — not about search, but about the customer support widget, the help documentation, and the notification center. Nobody changed any of those things. But users are suddenly furious.
Welcome to the AI capability ratchet. The moment you ship one demonstrably intelligent feature, you have permanently recalibrated what users consider acceptable across your entire product. The ratchet clicks up. It does not click back down.
This pattern is one of the least-discussed failure modes in AI product development. Teams celebrate individual feature launches without accounting for the expectation debt they are distributing to every team that didn't ship anything.
The Reference Point Problem
The behavioral economics term for what's happening is reference point shift. Daniel Kahneman's prospect theory establishes that people evaluate outcomes relative to a reference point, not in absolute terms. Software users are no different. They don't experience features in isolation — they experience them against an internal model of what "good software" feels like.
When that model updates, everything gets re-graded.
Before your AI search launched, users evaluated the help documentation by comparing it to help documentation they'd seen elsewhere — maybe decent, maybe frustrating, but graded on that curve. After your AI search launched, their reference point for "the product being smart" updated. Now the help documentation gets graded on the new curve. It didn't get worse. The curve moved past it.
Adaptation level theory, developed by Harry Helson, explains why the shift is one-way: humans adapt to new stimuli and treat their adapted level as neutral. Once users have experienced AI-quality interaction anywhere in your product, that level becomes the baseline. Anything below it registers not as average but as actively broken.
This is the ratchet. You can only tighten it. You cannot loosen it by reverting the feature that caused the shift — that would just add "you took away something good" to the list of complaints.
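A toy model makes the mechanic concrete. This is an illustrative sketch, not something from the article's sources; it just encodes the two claims above in code: users grade each surface against a reference level, and that level only moves up.

```python
# Toy model of the ratchet (illustrative only): users grade every surface
# against the best experience they've had anywhere in the product, and that
# reference level only moves up.

def perceived_quality(actual_quality: float, reference: float) -> float:
    """How a surface reads to the user: its real quality minus the baseline."""
    return actual_quality - reference

reference = 0.6          # pre-launch baseline set by ordinary software
help_docs = 0.65         # the docs sit slightly above that baseline

print(perceived_quality(help_docs, reference))   # +0.05 -> "fine"

# Ship an impressive AI search (quality 0.9). Adaptation is one-way:
# the reference ratchets up to the best experience and never back down.
reference = max(reference, 0.9)

print(perceived_quality(help_docs, reference))   # -0.25 -> "broken"
# The docs did not change. The curve moved past them.
```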
What This Looks Like in Production
The best-documented example of the ratchet in action is Google Photos. When Google introduced Ask Photos — a Gemini-powered natural language search — it worked well enough on some queries and poorly on others. But the consequential finding wasn't the AI search's accuracy. It was that after users experienced Ask Photos, classic search also felt inadequate, even on queries where classic search had been good enough for years. Two features that independently might have scored acceptable reviews were both perceived as broken because the AI experience had moved the reference point.
By March 2026, Google had added a visible toggle to let users switch back to classic search, an acknowledgment that the AI feature had made classic search feel worse even though classic search itself had not changed.
Microsoft ran the same experiment at OS scale. Embedding Copilot buttons across Notepad, Paint, File Explorer, and the Windows notification center created expectations of intelligent behavior across the entire OS. When standard Windows UI couldn't match those expectations, the non-AI features felt more broken than before any AI was added. Microsoft eventually began removing Copilot integrations from in-box apps, an admission that the capability ratchet had moved faster than the product could follow.
The customer service version of this pattern is particularly damaging. Klarna deployed an AI assistant that reduced issue resolution from eleven minutes to two minutes. The speed gain worked. But it created an expectation transfer problem: customers then expected human agents to resolve complex issues at similar speeds. When the AI failed on hard queries and escalated to humans, the handoff felt catastrophic — not because human resolution time had changed, but because users were now timing it against a two-minute baseline that hadn't previously existed.
The Jagged Frontier Makes It Worse
AI capability is uneven across task types. A model that handles nuanced reasoning can still fail at tasks humans find trivial. Researchers call this the jagged frontier — a capability boundary that bulges outward in surprising directions and falls short in others.
Users don't know where the frontier is. They observe AI succeeding at something hard and conclude it should be able to do something easy. When you ship an AI feature that excels at complex analysis, users assume the product is uniformly capable. The features that fall below the frontier — including features that were never designed to be intelligent — now appear broken by comparison, even when the hard reasoning success and the easy failure coexist in the same model.
This creates a specific kind of organizational trap: your AI feature's impressiveness becomes a liability for adjacent features that are merely adequate. The smarter the AI feature looks, the wider the gap it creates.
The Halo Effect, Running Backwards
Nielsen Norman Group's research on halo effects in product UX documents this precisely. When users develop a strong impression of one part of a product, they transfer it to adjacent features. But when a high-performing feature creates a favorable impression that adjacent features then violate, the halo runs in reverse. Users who expected the product to be smart start consciously cataloging every place where it isn't.
One NNGroup example: poor search results cause users to conclude that the entire company doesn't have its act together. Replace "poor search results" with "search results that don't match the AI-quality experience I just had in another part of this product" and you have a good description of what the capability ratchet produces at scale.
Importantly, the reverse halo tends to amplify over time. Users who've been primed by one excellent AI feature don't quietly accept the gap — they seek out confirmation of it. The expectation debt compounds.
Three Responses, and When Each One Works
When a team realizes its AI feature has moved the expectation baseline for the product, there are three defensible strategies.
Upgrade everywhere means treating AI parity across the product as a priority constraint. Shopify took this position explicitly: CEO Tobias Lütke declared AI usage non-optional company-wide in 2025, requiring teams to prove they'd exhausted AI approaches before requesting additional headcount. The logic is that once the reference point has moved, only moving the whole product forward prevents the gap from widening. The risk is that not all product surfaces can meaningfully benefit from AI at the same maturity level. Teams will ship AI for AI's sake, creating new capability ratchet problems in surfaces where the AI is worse than the baseline it replaced.
Contain and flag means making capability boundaries explicit and user-controllable. The Google Photos toggle is the clearest example. When users can choose between AI and classic behavior via a visible control, they're managing their own reference points rather than having the product silently move the baseline on them. This works best when the AI feature replaces existing behavior rather than adding to it. The failure mode is when the toggle is buried: if users can't easily find the escape hatch, the expectation debt still accumulates, and users simply don't know where to go when they want the old behavior.
Communicate expectations means being explicit about what each surface is and isn't designed to do, both to users and internally to teams that will absorb the expectation overflow. Only about a third of organizations actively communicate AI's limitations to users — the majority just launch. The practical implication is that support teams inherit a wave of tickets about features they didn't change, from users with a new reference point nobody warned them they'd be applying to other surfaces.
None of these strategies eliminates the ratchet. They manage it. The ratchet clicks whether you prepared for it or not.
The Organizational Debt Nobody Tracks
The deepest problem with the capability ratchet is where the expectation debt lands. The team that ships the AI feature gets the launch celebration. The teams that inherit the expectation overflow — the team owning the help documentation, the team owning the notification center, the team owning the onboarding flow — get the support tickets and the churn signals.
This isn't an edge case. It's structural. When one feature moves the reference point product-wide, every team that didn't ship anything is now working against a harder benchmark with the same resources they had before.
Deloitte's research on AI team structure found that cross-functional teams are 30% more likely to report significant efficiency gains from AI — not because the technology works better in cross-functional settings, but because siloed teams systematically underdeliver on expectations that span functions. The expectation ratchet is a coordination problem as much as a product problem.
The concrete symptom is that complaints about non-AI features start spiking after an AI feature launches. If your support ticket volume for the help docs went up two weeks after you launched AI search, the two events are probably connected. Without an explicit tracking mechanism — a way to attribute expectation-driven tickets to the feature that moved the baseline — teams will spend months optimizing for problems they didn't cause and can't solve by improving the feature they own.
What Actually Helps
A few practices reduce the organizational damage without preventing the capability gains.
First, before launching an AI feature, explicitly map the features users will compare it against. Ask: if a user experiences this and then immediately uses [adjacent feature], what gap will they perceive? That gap is going to show up in your support queue whether you plan for it or not.
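One lightweight way to do that mapping, sketched below with hypothetical surface and team names, is to write the comparison down as a pre-launch artifact rather than leaving it implicit, so each owning team sees the gap before the tickets do.

```python
# A minimal sketch of a pre-launch "expectation map" (all names hypothetical):
# for each surface users will touch right after the new AI feature, record who
# owns it and the gap users are likely to perceive.

from dataclasses import dataclass

@dataclass
class AdjacentSurface:
    name: str
    owner: str
    likely_gap: str        # what users will notice when they switch over

EXPECTATION_MAP = [
    AdjacentSurface("help_docs", "docs-team",
                    "keyword-only lookup right after conversational search"),
    AdjacentSurface("support_widget", "support-platform",
                    "canned replies right after free-form answers"),
    AdjacentSurface("notification_center", "growth-team",
                    "static alerts right after context-aware results"),
]

for surface in EXPECTATION_MAP:
    print(f"Brief {surface.owner}: {surface.name} -> {surface.likely_gap}")
```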
Second, build escape hatches into AI features that replace existing behavior. The toggle isn't a failure — it's the product acknowledging that it moved a reference point users may not have consented to. A visible toggle also gives you a signal: if 40% of users switch back to classic behavior, you know the AI feature hasn't actually cleared the expectation bar, even if your launch metrics looked good.
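As a sketch of that signal, with hypothetical event names, the reversion rate can be as simple as counting how many users who tried the AI behavior flipped the toggle back within the first week.

```python
# Hypothetical events: ("ai_search_used", "toggled_to_classic"), each a
# (user_id, event_name, timestamp) tuple with datetime timestamps.

from datetime import timedelta

def reversion_rate(events, window=timedelta(days=7)):
    """Fraction of users who tried the AI behavior and switched back to
    classic within the window. A high rate means the AI feature hasn't
    cleared the expectation bar it set, whatever the launch metrics say."""
    tried_ai = {}       # user_id -> first time they used the AI behavior
    reverted = set()
    for user_id, name, ts in sorted(events, key=lambda e: e[2]):
        if name == "ai_search_used" and user_id not in tried_ai:
            tried_ai[user_id] = ts
        elif name == "toggled_to_classic" and user_id in tried_ai:
            if ts - tried_ai[user_id] <= window:
                reverted.add(user_id)
    return len(reverted) / len(tried_ai) if tried_ai else 0.0
```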
Third, track expectation-driven tickets as a separate category. When a support ticket is fundamentally "why doesn't X work as well as Y," that's a different problem than "X is broken." The former needs a product response. The latter needs an engineering response. Conflating them means neither gets solved.
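A minimal way to keep the two queues apart, again with hypothetical labels, is to tag every ticket with a category and, for expectation-driven ones, the launch that moved the baseline, so the volume can be attributed to the feature that caused it rather than the team that owns the surface.

```python
# "defect" means the surface is broken and needs engineering;
# "expectation_gap" means the surface is fine but is being graded against a
# baseline some other feature moved, which needs a product response.

from collections import Counter
from dataclasses import dataclass

@dataclass
class Ticket:
    surface: str                 # e.g. "help_docs"
    category: str                # "defect" or "expectation_gap"
    baseline_feature: str = ""   # which launch moved the bar, if known

def triage_report(tickets):
    """Count tickets per (surface, category, baseline feature)."""
    return Counter((t.surface, t.category, t.baseline_feature) for t in tickets)

tickets = [
    Ticket("help_docs", "expectation_gap", "ai_search"),
    Ticket("help_docs", "defect"),
    Ticket("notification_center", "expectation_gap", "ai_search"),
]
for key, count in triage_report(tickets).items():
    print(key, count)
```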
Fourth, when a feature ships that will move the baseline, brief every team that owns a surface users might compare it against. This sounds like obvious communication. It almost never happens. Product launches are coordinated within the team that owns the feature, not with teams that own adjacent features that users will now evaluate on a new curve.
The Ratchet Is Not Inherently Bad
The expectation ratchet isn't a reason to avoid shipping good AI features. Users holding products to higher standards is, in aggregate, the right direction for software to move. The ratchet is a problem when teams treat each feature as an independent launch without accounting for the expectation externalities it creates across the product.
The mental model shift that actually helps: think of each AI feature you ship as a contract with the user about the quality level of the whole product, not just the feature itself. That contract will be enforced by users whether you intended it or not. The teams that plan for it create better products. The teams that don't end up with support queues, churn, and a trail of demoralized engineers who can't figure out why users hate features nobody changed.
The ratchet clicks. Plan for where it clicks to.
- https://www.reforge.com/blog/the-expectation-reset
- https://www.reforge.com/blog/product-market-fit-collapse
- https://www.nngroup.com/articles/halo-effect/
- https://www.nngroup.com/articles/ai-changing-search-behaviors/
- https://gigacatalyst.com/blog/ai-features-making-products-worse
- https://techcrunch.com/2026/03/10/google-gives-in-to-users-complaints-over-ai-powered-ask-photos-search-feature/
- https://www.nber.org/papers/w31161
- https://www.figma.com/blog/figma-2025-ai-report-perspectives/
- https://mitsloan.mit.edu/ideas-made-to-matter/working-definitions/what-is-jagged-ai-frontier
- https://thedecisionlab.com/reference-guide/economics/reference-point
- https://www.businesswire.com/news/home/20250520435394/en/Cell-Phone-Satisfaction-Hits-Decade-Low-Mark-as-AI-Promises-Fall-Short-ACSI-Data-Show
- https://www.gartner.com/en/newsroom/press-releases/2024-07-09-gartner-survey-finds-64-percent-of-customers-would-prefer-that-companies-didnt-use-ai-for-customer-service
- https://www.zendesk.com/newsroom/articles/2025-cx-trends-report/
- https://jakobnielsenphd.substack.com/p/ai-use
- https://www.cnbc.com/2026/04/01/ai-chatbot-customer-service-complaints-refunds.html
