User-percentage feature flags spread the hard 5% of queries evenly across cohorts, hiding tail regressions until the rollout hits 100%. Ramp by difficulty, token length, query slice, or tool-call depth instead — that is the axis where AI blast radius actually lives.
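A minimal sketch of what difficulty-sliced ramping could look like. The bucket names, thresholds, and per-slice ramp fractions are illustrative assumptions, not a real rollout schedule:

```python
# Hypothetical sketch: ramp a flag per difficulty slice, not per user percentage.
# Thresholds and ramp fractions below are made-up illustrations.
import hashlib

RAMP = {"easy": 1.0, "medium": 0.5, "hard": 0.05}  # fraction enabled per slice

def difficulty(token_count: int, tool_depth: int) -> str:
    # Long contexts and deep tool chains are where regressions hide.
    if tool_depth >= 3 or token_count > 4000:
        return "hard"
    if tool_depth >= 1 or token_count > 1000:
        return "medium"
    return "easy"

def enabled(query_id: str, token_count: int, tool_depth: int) -> bool:
    slice_ = difficulty(token_count, tool_depth)
    # Stable hash keeps a given query id in the same cohort across calls.
    h = int(hashlib.sha256(query_id.encode()).hexdigest(), 16) % 10_000
    return (h / 10_000) < RAMP[slice_]
```

The point of the shape: the "hard" slice stays at 5% until it has earned its own evidence, instead of being diluted into a fleet-wide average.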
Production AI features cluster around one engineer's calendar, and the resulting bottleneck stays invisible to every dashboard until the expert quits. Here is how to detect and dismantle it.
Per-seat unlimited AI tiers are a naked short on token volatility. Vendor repricing, power-user drift, and model-mix creep compress margins overnight — unless attribution, caps, and tier gradients are built in before the page goes live.
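The margin compression is simple arithmetic, which makes it worth modeling before the pricing page ships. A toy sketch with made-up numbers (seat price, token volume, and vendor rate are all assumptions):

```python
# Hypothetical sketch: per-seat gross margin as a function of token volume
# and vendor price. All inputs below are illustrative, not real prices.

def seat_margin(seat_price_usd: float, tokens_per_seat: float,
                vendor_cost_per_mtok: float) -> float:
    """Gross margin fraction for one seat in one billing period."""
    cost = tokens_per_seat / 1e6 * vendor_cost_per_mtok
    return (seat_price_usd - cost) / seat_price_usd

# A power user doubling token volume, or a vendor repricing,
# moves margin linearly with no action on your side.
baseline = seat_margin(20.0, 2e6, 3.0)   # 70% margin
power_user = seat_margin(20.0, 6e6, 3.0)  # margin collapses to 10%
```

Running the same function across a distribution of per-seat volumes, rather than the average, is where the "naked short" shows up.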
Your enterprise risk register has rows for cyber, vendor, regulatory — but no row for the autonomous agent that just took an action under your credentials and produced a customer-visible loss. Here are the five columns the CRO will ask for the next morning.
Shadow LLM proxies bypass cost attribution, audit logs, and DPAs because the platform gateway loses to product deadlines. The fix is a paved road that beats the side-channel on time-to-first-token, capability parity, and developer experience.
When the model invents an argument value, the cheapest hypothesis isn't 'the model failed' — it's 'the description you gave the model no longer matches the API on the other side of the wire.'
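One cheap way to test that hypothesis is to diff the schema the model sees against the live function signature. A minimal sketch; the tool names, schema shape, and `refund` example are assumptions for illustration:

```python
# Hypothetical sketch: detect drift between the tool schema shown to the
# model and the actual callable behind it. Names here are illustrative.
import inspect

def schema_drift(tool_schema: dict, fn) -> set:
    """Parameter names present in the schema or the function but not both --
    the usual origin of 'invented' argument values."""
    declared = set(tool_schema.get("parameters", {}).get("properties", {}))
    actual = set(inspect.signature(fn).parameters)
    return declared ^ actual  # symmetric difference

# Server-side the parameter was renamed, but the model still sees the old name.
def refund(order_id: str, amount_cents: int): ...

schema = {"parameters": {"properties": {"order_id": {}, "amount": {}}}}
```

Here `schema_drift(schema, refund)` flags both `amount` and `amount_cents`: the model was never wrong about the description, the description was wrong about the wire.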
Static bias audits pass in CI and fail in production because input distributions drift. Continuous fairness monitoring with per-cohort SLOs and drift-aware release gates is the fix.
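A per-cohort SLO gate can be very small. This sketch assumes a single positive-outcome-rate metric per cohort; the SLO targets and the max-gap parity check are illustrative assumptions:

```python
# Hypothetical sketch: a release gate on per-cohort outcome rates.
# Targets below are illustrative, not recommended thresholds.

SLO = {"min_positive_rate": 0.80, "max_cohort_gap": 0.05}

def gate(per_cohort_positive_rate: dict) -> bool:
    """Pass only if every cohort clears the floor AND no cohort lags
    the best cohort by more than the allowed gap."""
    rates = list(per_cohort_positive_rate.values())
    worst = min(rates)
    gap = max(rates) - worst
    return worst >= SLO["min_positive_rate"] and gap <= SLO["max_cohort_gap"]
```

Run continuously on production traffic, this catches the drift case a one-time CI audit cannot: the model is unchanged, but the cohort mix arriving at it is not.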
When every quality regression on your team gets routed to 'let's try the bigger tier,' you're paying capacity to mask an upstream bug. Here is the discipline to break the reflex, and the gate to put in front of it.
Browser-native AI is not a faster TensorFlow.js. It is a different runtime with a four-axis trade-off — latency floor, privacy, device fragmentation, capability ceiling — that does not collapse into a single answer.
A 0.87 confidence badge changes no user behavior. A natural-language hedge that names what the model didn't check changes a lot. Why probability scores are the wrong shape of signal, and how to ship uncertainty as content instead of a UI overlay.
Token spend is the numerator. Eval-graded outcomes are the denominator. Tracking only the bill is how cheap-tier migrations silently regress quality and inflate downstream support cost.
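The ratio itself fits in a few lines. A minimal sketch, assuming a boolean pass/fail grade per eval case (the grading scheme is an assumption for illustration):

```python
# Hypothetical sketch: cost per eval-graded success, the metric that exposes
# a cheap-tier migration whose bill drops while quality drops faster.

def cost_per_success(token_cost_usd: float, graded: list) -> float:
    """Dollars spent per task the eval suite actually graded as passing."""
    successes = sum(1 for g in graded if g)
    if successes == 0:
        return float("inf")
    return token_cost_usd / successes

# Cheaper tier, lower bill, but each *successful* outcome now costs more:
old_tier = cost_per_success(10.0, [True] * 8 + [False] * 2)  # 1.25 per success
new_tier = cost_per_success(6.0, [True] * 4 + [False] * 6)   # 1.50 per success
```

Tracking only `token_cost_usd` would report the migration as a 40% saving; the ratio reports it as a regression.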
When agents call agents across team boundaries, individual SLOs stop predicting end-to-end behavior. The four pieces that have to land before the composition math eats your reliability budget.
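The composition math in question is just multiplication over serial hops, which is exactly why it is easy to underestimate. A minimal sketch (hop counts and SLO values are illustrative):

```python
# Hypothetical sketch: serial agent-to-agent calls multiply their SLOs.
# Five hops at 99.9% each is only about 99.5% end to end.
import math

def end_to_end_availability(hop_slos: list) -> float:
    """Success probability of a chain of independent serial calls."""
    return math.prod(hop_slos)
```

Every team in the chain can be green against its own SLO while the composed path quietly burns five times the error budget any one of them signed up for.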