Shipping one agent persona to a cohort-spanning customer base costs renewals quietly. The fix is overlays, not forks — and slice-level evals that catch the regressions an aggregate score will hide.
PRDs written for deterministic software have no field for what the model gets wrong, what eval score gates ship, or who owns the system prompt. Here are four sections that close those gaps.
Most agent teams discover the absence of a blast radius inventory during their first incident. Here's the artifact, the columns it needs, and how to make it a CI merge gate so it stays accurate.
Most teams profile the LLM call and declare victory. The real throughput killer is everything before the model sees a single token — parsing, chunking, embedding, and enrichment that quietly dominate end-to-end latency.
How AI engineers whose work is prevention — eval regressions caught, costs that didn't materialize, incidents that never happened — should write the promotion packet a calibration committee can actually score.
Vector similarity retrieval treats two versions of the same document as nearly identical, so any RAG system on evolving documents silently hallucinates 'nothing changed' on delta queries. Here is why the failure is structural and what an index that knows about change actually looks like.
A 45 ms audio-video offset is the threshold where humans flag a talking-head AI as fake. Inside the real-time engineering: viseme schedules, audio master clocks, and the failure modes that never show up in offline eval.
Agents have no muscle memory for Reply All. When a send tool's recipient field accepts a distribution list indistinguishably from an individual, the planner picks the loudest door. Four practices to keep blast radius bounded.
Removing a function from your agent's tool catalog isn't a syntactic change — it's a behavioral migration. The staged retirement pattern that prevents fallback hallucinations and silent regressions.
Coupling prompt changes to your deploy pipeline is a self-imposed constraint. The runtime hot-reload pattern, its safety primitives, and the failure modes nobody plans for.
Blocking employees from personal ChatGPT or Claude accounts doesn't stop AI use — it makes it invisible. Here's how to survey shadow AI, build sanctioned channels, and avoid governance theater.
Shipping an AI feature multiplies your audit-log volume 10–50x. The SIEM renewal arrives later, with broken detection rules and a legal-hold problem nobody scoped.