An internal endpoint stayed safe for a decade because nobody could find it. Then an agent indexed the wiki. Here is what changes when obscurity stops being a time tax.
When you put a human between an agent and an irreversible action, you have not added a safety primitive. You have added a queue with throughput limits, an availability profile, and a quality-versus-load curve. Here is how that becomes the P0 nobody scoped.
Between 200 and 429 lies a dead zone where every LLM client overshoots in lockstep. The missing load-pressure header is a protocol gap, not a client-side bug.
Coding agents reason against a snapshot of git state that goes stale silently. Worktrees, turn-preludes, branch pinning, and snapshot-and-restore turn that silent drift into a loud signal.
Computer-use agents that memorize CSS paths are signing up for silent failure. A look at selector decay, semantic anchors, vision-grounded fallback, and why a stored selector is a bet on a rendering decision the agent does not control.
Fixed-size token chunking cuts critical clauses in half, and neither fragment retrieves alone. A look at why the failure is invisible to standard evals and which chunking strategies actually close the gap.
Coding agents inherit your laptop's warm environment and ship diffs that pass locally then fail in CI. The fix is a repo-level environment contract, a pre-flight parity check, and grading on dev-vs-CI divergence.
Chat UIs let users edit and regenerate messages while backends quietly append every revision to a linear log — so the model answers the conversation the user thought they took back.
Overconfident AI agents do not lose renewals at the survey — they lose them six weeks later at the renewal call. Treating confidence display as a versioned product surface, not a stylistic accident.
Recorded sales demos become liability surfaces as model snapshots, prompts, tool catalogs, and retrieval substrates drift underneath them. A demo manifest plus a nightly eval suite turns recordings into testable behavioral commitments.
How few-shot retrieval and a shared CSV quietly turn a careful eval bank into the in-context examples you serve — and the storage-layer separation that stops it.
Every inference API returns a stop signal alongside the text. Ignoring it is the same shape of bug as ignoring HTTP status codes — and your dashboard cannot see the failures it causes.