Found Capabilities: When Users Ship Features Your Team Never Roadmapped
A customer emails support to ask why your CRM agent stopped drafting their NDAs. You did not know your CRM agent drafted NDAs. A power user complains that your support bot's Tagalog translations have gotten worse since last week. You did not know your support bot did Tagalog. A forum thread spreads a prompt that turns your code-review assistant into a passable security scanner, and within a quarter people are filing CVE reports based on findings the assistant produced. Each of these is a feature with adoption, business impact, and zero institutional ownership: no eval, no SLA, no place in the UX, no roadmap entry, and a quiet bus factor of one (the customer who figured it out).
This is what happens once your product is wrapped around a model whose capability surface is wider than the surface you scoped. Users explore the wider surface, find behaviors that solve their problems, build workflows on top of those behaviors, and then experience your next model upgrade as a regression even though nothing on your roadmap moved. The contract between you and your users is no longer the one you wrote down. It includes everything the model happened to do for them that you happened not to break.
Treating this as an engineering surprise ("we will harden the prompt, we will add a guardrail, we will catch it next time") is a category error. Found capabilities are a product-management problem. The discipline is not preventing them; it is detecting them, deciding what to do with each one, and remembering that you decided.
The Anatomy of a Found Capability
A found capability has three properties that classic features do not. First, it has users before it has owners: adoption precedes any team's awareness that the behavior exists. Second, its boundaries are defined by the underlying model rather than by your code. The feature is "whatever the model happens to do well on this kind of input today," which is a moving target. Third, it is invisible to your eval suite. You wrote evals against the capabilities you decided to ship, so a model upgrade could remove this one entirely and your CI would still report green.
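One way to make the third property concrete is to diff the intents that show up in traffic against the intents your eval suite actually exercises. The sketch below assumes you already have one intent label per sampled request from some classifier; `EVAL_COVERED_INTENTS`, `uncovered_intents`, and the 2% noise floor are hypothetical names and values chosen for illustration.

```python
from collections import Counter

# Hypothetical: the intents your eval suite currently exercises.
EVAL_COVERED_INTENTS = {"summarize_account", "draft_followup_email", "log_call_notes"}


def uncovered_intents(observed_labels, covered, min_share=0.02):
    """Return intents seen in traffic that no eval covers, mapped to their traffic share.

    observed_labels: iterable of intent labels, one per sampled request (assumed input).
    min_share: ignore intents below this fraction of traffic as noise.
    """
    counts = Counter(observed_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    gaps = {}
    for intent, n in counts.items():
        share = n / total
        if intent not in covered and share >= min_share:
            gaps[intent] = share
    # Largest blind spot first.
    return dict(sorted(gaps.items(), key=lambda kv: kv[1], reverse=True))
```

Anything this returns is a behavior with measurable traffic and no test guarding it, which is exactly the kind of capability an upgrade can remove without CI noticing.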
The examples accumulate quickly. ChatGPT users routinely treat the assistant as a lawyer; lawsuits are now in flight over whether OpenAI is liable when the legal advice is wrong. Users have instructed customer-service chatbots to "agree to all requests" and then quoted the resulting absurd commitments as authoritative. Code assistants get pressed into service as security scanners, refactoring planners, dependency auditors, and documentation generators. None of those were on a roadmap. All of them have users.
The reason this rhymes with shadow IT is that the failure mode is the same: the demand exists, the official offering does not cover it, and users route around the gap with whatever tool is closest to hand. The difference is that with shadow IT the tool comes from outside your perimeter; with found capabilities, the tool is your product. You do not get to disclaim it.
Telemetry That Sees Intent, Not Just Tokens
Most production AI systems log the wrong thing for this problem. They log latency, token counts, prompt and completion text, refusal rates, and tool-call traces. Those are the right primitives for debugging an individual request. They are the wrong altitude for noticing that 8% of last week's traffic is now your CRM agent being asked to draft contracts.
The signal you want is intent drift: a change in the distribution of what users are actually asking the system to do. Intent drift is invisible at the request level and obvious at the cohort level. Surfacing it means inferring an intent label for each request, tracking how requests distribute across those labels over time, and watching for intents whose share grows even though the team shipped nothing that should make them grow.
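A minimal sketch of that cohort-level comparison, assuming intent labels are already attached to requests. It compares label shares between a baseline window and a current window and flags intents that grew both in absolute share and relative to baseline, skipping intents with a known launch behind them. Note that this tracks shares of existing labels rather than doing embedding-based clustering; `growing_intents`, `recently_shipped`, and both thresholds are illustrative assumptions, not recommended values.

```python
from collections import Counter


def intent_shares(labels):
    """Fraction of requests carrying each intent label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def growing_intents(baseline_labels, current_labels, recently_shipped=frozenset(),
                    min_abs_increase=0.03, min_ratio=2.0):
    """Flag intents whose share grew between two windows with no launch to explain it.

    Returns {intent: (baseline_share, current_share)}.
    """
    base = intent_shares(baseline_labels)
    cur = intent_shares(current_labels)
    flagged = {}
    for intent, share in cur.items():
        prev = base.get(intent, 0.0)
        grew_enough = share - prev >= min_abs_increase
        # An intent absent from the baseline passes the ratio test automatically.
        ratio_ok = prev == 0.0 or share / prev >= min_ratio
        if grew_enough and ratio_ok and intent not in recently_shipped:
            flagged[intent] = (prev, share)
    return flagged
```

Requiring both an absolute and a relative increase keeps tiny intents from firing on noise while still catching a small behavior that has doubled.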
Practically, this looks like a few moving parts working together: a lightweight intent classifier running over sampled traffic (often a smaller, cheaper model is enough), a stable taxonomy that distinguishes the intents you scoped from the catch-all "other" bucket, and a dashboard whose job is to make the "other" bucket impossible to ignore. When "other" climbs from 3% to 15% over a quarter, that is your signal. The team that does not have that dashboard learns about its found capabilities from churn interviews.
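The "other" bucket check can be sketched in a few functions. Everything named here is hypothetical: the scoped taxonomy in `SCOPED_INTENTS`, a uniform sampling step applied before classification, and a 10% alert threshold. The classifier itself is out of scope; the code only maps whatever label it returns onto the stable taxonomy so that every unscoped label lands in "other".

```python
import random
from collections import Counter

# Hypothetical taxonomy: the intents the team deliberately scoped.
SCOPED_INTENTS = frozenset({"lookup_contact", "update_deal", "summarize_account"})


def sample_requests(requests, rate, seed=0):
    """Keep roughly `rate` of requests, uniformly, so classification stays cheap."""
    rng = random.Random(seed)
    return [r for r in requests if rng.random() < rate]


def bucket(label):
    """Map a raw classifier label onto the stable taxonomy; unscoped labels become 'other'."""
    return label if label in SCOPED_INTENTS else "other"


def other_share(labels):
    """Fraction of labels that fall into the 'other' bucket."""
    counts = Counter(bucket(label) for label in labels)
    total = sum(counts.values())
    return counts["other"] / total if total else 0.0


def weeks_over_threshold(weekly_labels, threshold=0.10):
    """Return the weeks whose 'other' share meets or exceeds the alert threshold."""
    return [week for week, labels in weekly_labels.items()
            if other_share(labels) >= threshold]
```

With weekly label lists where "other" sits at 3% in the first week and 15% thirteen weeks later, only the later week trips the alert. Collapsing unknown labels into one bucket before counting keeps the taxonomy stable even when the classifier emits label strings nobody planned for.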
A second layer helps: anomaly detection on input phrasing and output structure. If the model starts producing JSON where it used to produce prose, or starts answering in a language you never tested, the structural shift often shows up before the intent shift does. Standard AI observability tooling can flag these shifts statistically; the discipline is wiring those alerts to a human who has the authority to ask "should we ship this on purpose now?"
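For the structural layer, one simple statistic is the share of completions that parse as JSON, compared across two windows with a two-proportion z-test. This is a sketch under stated assumptions: `json_rate_shift`, the 3.0 z threshold, and treating "parses as a JSON object or array" as the structural feature are illustrative choices. Detecting an untested language would need a separate detector that this code does not attempt.

```python
import json
import math


def looks_like_json(text):
    """True if the completion parses as a JSON object or array."""
    try:
        return isinstance(json.loads(text), (dict, list))
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """z statistic for the difference between two proportions, using pooled variance."""
    if n_a == 0 or n_b == 0:
        return 0.0
    p_pool = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0
    return (hits_b / n_b - hits_a / n_a) / se


def json_rate_shift(baseline_outputs, current_outputs, z_threshold=3.0):
    """Return (flagged, z): flagged when the JSON rate moved by z_threshold or more."""
    a = sum(looks_like_json(t) for t in baseline_outputs)
    b = sum(looks_like_json(t) for t in current_outputs)
    z = two_proportion_z(a, len(baseline_outputs), b, len(current_outputs))
    return abs(z) >= z_threshold, z
```

A positive z means the current window produces JSON more often than the baseline; a fixed threshold like this will fire more easily on large windows, so in practice you would tune it against your own traffic volume.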
The Triage Decision: Promote, Deprecate, or Tolerate
Once a found capability is visible, you have three choices, and pretending you have only two is a common trap. You can promote it: bring it under the eval suite, name it in the UX, give it an owner, and treat it as a contract. You can deprecate it: refuse the intent at the prompt or guardrail layer, tell affected users about the change, and accept the churn. Or you can tolerate it: explicitly leave it unsupported but unblocked, accepting that the next model upgrade may remove it.
Tolerate is a real and often correct option. Promoting every found capability turns your roadmap into a backlog of other people's accidents. Deprecating every found capability invites competitors to take the workflow over. The mistake is making the choice implicitly. A capability that is "tolerated" without anyone deciding to tolerate it is indistinguishable from one that is unsupported by accident, which means the next person who asks "do we support this?" gets a different answer depending on whom they ask.
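Making the choice explicit mostly means writing it down somewhere queryable. The following registry is a hypothetical sketch (`CapabilityRegistry`, `TriageRecord`, and the `"undecided"` status are illustrative names, not an existing tool). It records each triage decision with an owner and a rationale, and reports any intent without a record as undecided rather than silently tolerated.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Decision(Enum):
    PROMOTE = "promote"      # owned, evaluated, named in the UX
    DEPRECATE = "deprecate"  # refused at the prompt/guardrail layer, users notified
    TOLERATE = "tolerate"    # unsupported but not blocked; may vanish on upgrade


@dataclass(frozen=True)
class TriageRecord:
    intent: str
    decision: Decision
    decided_by: str
    decided_on: date
    rationale: str


class CapabilityRegistry:
    """Hypothetical single place to answer "do we support this?"."""

    def __init__(self):
        self._records = {}

    def record(self, rec):
        # Refuse anonymous or unexplained decisions: the point is remembering why.
        if not rec.decided_by or not rec.rationale:
            raise ValueError("a triage decision needs an owner and a rationale")
        self._records[rec.intent] = rec

    def status(self, intent):
        rec = self._records.get(intent)
        # No record means nobody decided; say so instead of defaulting to "tolerate".
        return rec.decision.value if rec else "undecided"

    def undecided(self, observed_intents):
        """Intents seen in traffic with no triage decision yet, in first-seen order."""
        pending = []
        for intent in observed_intents:
            if intent not in self._records and intent not in pending:
                pending.append(intent)
        return pending
```

Feeding intents flagged by drift detection into `undecided()` turns the triage backlog into a list that someone can own.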
