Borrowed credentials make agents look like the humans who launched them in every audit log — and that thin film is why a prompt injection in 2026 becomes an unattributable breach.
Agent workloads break smooth-curve capacity planning. Plan in tokens, treat fanout as a first-class metric, and reserve for cliffs you know will land.
Generated explanations next to LLM outputs often have no causal link to the actual computation. Why post-hoc rationalization erodes user trust faster than admitted uncertainty, and the design patterns that don't fake explainability.
Time-to-first-token and total completion time both pass SLO while users complain the AI 'froze' mid-response. The metric your dashboard hides is the gap between consecutive tokens — and smoothing it is a UX problem, not a throughput problem.
At fifty engineers, every team rebuilds the same LLM gateway badly. Why the pattern keeps emerging, what to centralize vs leave at the edge, and how the political fight gets settled.
Most agent products put the model in charge of planning and the user in charge of approving. For high-stakes work, that polarity is exactly backwards — and the fix is a different product, not a better prompt.
Every major LLM provider ships JSON mode under the same name and a different contract. The day your fallback router activates is the day you find out which differences your parser couldn't survive.
When the LLM grading your evals gets sharper, your scores drop on a system that didn't change. Here's how to tell judge drift from model regression — and stop debugging the wrong instrument.
Every LLM has a knowledge cutoff and every product silently lies about it. Treat freshness as a designed UX surface — not a footnote — or users will calibrate trust against an answer the model should have refused.
Vector indexes degrade gracefully but knowledge graphs fail discontinuously — running them behind one CDC pipeline ships silently wrong answers on multi-hop queries.
LLM-as-judge has length, position, and format biases that silently turn prompt iteration into a Goodhart machine. Three audits and a versioned judge fix it.
The SRE postmortem template was built for code changes and infrastructure faults. For LLM incidents, the variables that actually moved are missing: prompt revision, model selection slice, judge config, retrieval index state, tool schemas, traffic mix. Here are the template fields and incident-class taxonomy that close the gap.