Swap a 200ms search call for a 4-second agent loop and the latency budget does not vanish — it migrates from infrastructure to UX, and the team that does not catch the handoff ships a worse product with a better metric.
Coding agents confidently emit code that compiles against the wrong version of your dependencies. The model isn't hallucinating — it's remembering a library that no longer exists.
Multi-agent workflows with human approval queues recreate every classical deadlock condition. The cycle hides across two queues and two calendars until a customer notices.
When an agent posts a polite summary in the incident channel and the commander reads it as ownership, the escalation chain has quietly acquired a transition nobody wrote down. Patterns to close the gap.
Most incident templates have no row for what an AI agent inferred — so the action items chase a deterministic fix for a probabilistic failure, and the same outage class keeps recurring.
When an agent's prompt becomes the only authoritative description of a business process, you have shipped a runbook in the cloud that nobody can audit, version, or hand off.
Per-user context at the top of the system prompt is the silent way to triple your inference bill. Here is why the cliff is invisible until billing close, and how to defend the cache boundary in code, in review, and in CI.
Adding a secondary LLM provider does not make your system redundant. It makes it twice as expensive to maintain — and brittle if you skip the prompt-engineering work that follows.
Rate limits calibrated for human-paced traffic collapse the first time an agent points a planning loop at the endpoint. Treat the limit as a split contract — throughput budget plus abuse ceiling — keyed on tenant and workload class.
When an agent spins up a recurring task and walks away, the schedule outlives its owner — and the orphan rate compounds quietly until somebody runs the audit.
Streaming LLMs surface partial reasoning users read as final answers. Why contradictions inside a single response break UX and evals, and four patterns that put a commit boundary back in.
A rising task-completion rate after a model upgrade can mean the agent got better — or that it stopped attempting the hard cases. Decompose your success metric or watch churn pay the bill.