When your AI bill crosses seven figures, token quota stops being a finance number and starts behaving like an authorization surface. Why allocation needs IAM-style discipline, not dashboard sliders.
A vendor model bump can leave the API byte-stable while quietly swapping the tokenizer underneath — silently breaking context budgets, stop sequences, and few-shot prompts. Here is how to audit, pin, and survive tokenizer churn.
Binary tool approval breaks under load: a single confirm dialog cannot gate a draft save and an outbound payment without training users to click through both. A six-class risk taxonomy fixes the conflation.
Production tool usage follows a power law, but most agent frameworks treat the catalog as flat — and pay for it in token bloat, accuracy collapse past 100 tools, and silent long-tail regressions. A field guide to hot/cold partitioning.
Per-tool security review clears nodes, but agents run trajectories. The composition graph of an agent's tool catalog is a permission set the security team never enumerated, and confused-deputy exploits live on the edges.
AI agents stall at the autonomy ceiling — the level above which users start checking, intervening, or abandoning the feature. Treat it as a measurable product variable, not a model problem.
A single confidence threshold collapses two distinct decisions — abstain and escalate — into one number, and that compromise is why your trust metric keeps sliding even when accuracy looks fine.
When a user invokes their right to erasure, deleting the source text doesn't delete the embedding. Most teams never modeled the vector store as a third copy of user data — and the inversion-attack literature says they should have.
Behavioral portability across LLM providers decays the moment you stop funding it. A breakdown of the quarterly burn rate — eval subscriptions, prompt-as-function-of-model routing, contract leverage — that turns 'we can swap models' from a slide into a real option.
A vendor's 99.9% availability is measured per call; your agent makes 12 per task. The arithmetic, the missing contract clauses, and the divergence alarm that catches incidents before users do.
Why voice agents feel rude: a four-stage latency budget, hybrid turn detection, full-duplex audio, and a preemption contract that protects state.
An agent fans out 80,000 emails before breakfast and the password-reset domain reputation is gone for six weeks. The subdomain, DKIM, and rate-limit discipline you need before the first send.