Most teams pick where their system prompt lives by accident, then fight the consequences for years. The choice between code, config, and data storage cascades into deploy cadence, eval scope, and tenant flexibility — here is the framework to apply before MVP.
Prompt taste, eval taste, and guardrail taste are three separate intuitions that the AI engineer job title hides. Hire and promote as if they were one skill and you ship lopsided systems where every artifact is green and the user is leaving.
Flat-rate pricing for token-billed AI products produces a power-law usage distribution where a tiny minority of whales destroys margins. The standard fixes — caps, throttles, fair-use clauses — alienate the engaged users who would pay more if you let them. Here is the tier architecture, metering pre-work, and unit-economics discipline that actually fits how token costs behave.
Most prompt-injection threat models focus on data exfiltration. The quieter attack class is bill amplification — a $0.01 request becomes a $40 inference invoice. Here is the defense discipline that stops it.
When your AI bill crosses seven figures, token quota stops being a finance number and starts behaving like an authorization surface. Why allocation needs IAM-style discipline, not dashboard sliders.
A vendor model bump can leave the API byte-stable while quietly swapping the tokenizer underneath — silently breaking context budgets, stop sequences, and few-shot prompts. Here is how to audit, pin, and survive tokenizer churn.
Binary tool approval breaks under load: a single confirm dialog cannot gate a draft save and an outbound payment without training users to click through both. A six-class risk taxonomy fixes the conflation.
Production tool usage follows a power law, but most agent frameworks treat the catalog as flat — and pay for it in token bloat, accuracy collapse past 100 tools, and silent long-tail regressions. A field guide to hot/cold partitioning.
Per-tool security review clears nodes, but agents run trajectories. The composition graph of an agent's tool catalog is a permission set the security team never enumerated, and confused-deputy exploits live on the edges.
AI agents stall at the autonomy ceiling — the level above which users start checking, intervening, or abandoning the feature. Treat it as a measurable product variable, not a model problem.
A single confidence threshold collapses two distinct decisions — abstain and escalate — into one number, and that compromise is why your trust metric keeps sliding even when accuracy looks fine.
When a user invokes their right to erasure, deleting the source text doesn't delete the embedding. Most teams never modeled the vector store as a third copy of user data — and the inversion-attack literature says they should have.