Fixed-size token chunking cuts critical clauses in half, and neither fragment retrieves alone. A look at why the failure is invisible to standard evals and which chunking strategies actually close the gap.
Coding agents inherit your laptop's warm environment and ship diffs that pass locally then fail in CI. The fix is a repo-level environment contract, a pre-flight parity check, and grading on dev-vs-CI divergence.
Chat UIs let users edit and regenerate messages while backends quietly append every revision to a linear log — so the model answers the conversation the user thought they took back.
Overconfident AI agents do not lose renewals at the survey — they lose them six weeks later at the renewal call. Treating confidence display as a versioned product surface, not a stylistic accident.
Recorded sales demos become liability surfaces as model snapshots, prompts, tool catalogs, and retrieval substrates drift underneath them. A demo manifest plus a nightly eval suite turns recordings into testable behavioral commitments.
How few-shot retrieval and a shared CSV quietly turn a careful eval bank into the in-context examples you serve — and the storage-layer separation that stops it.
Every inference API returns a stop signal alongside the text. Ignoring it is the same shape of bug as ignoring HTTP status codes — and your dashboard cannot see the failures it causes.
Trial caps sized for human attention break the moment a programmatic signup points an agent at the endpoint. A field guide to redesigning quotas, detection, and exhaustion UX for the population actually signing up.
When coding agents write faster than your dev server's watcher debounce, the HMR overlay becomes a self-exciting oscillator that the agent reads back into its own reasoning.
Coding agents ship code that compiles, tests pass, and reviewers approve — but uses idioms your codebase does not speak. Here is why, and what to do about the drift.
Coding agents accelerate diffs and decelerate cognition. The hidden cost is the engineer's mental model — and the practices that keep it alive.
Retiring a deterministic feature in favor of an agent silently hands the predecessor's SLO to a system that cannot meet it — and the gap shows up the morning your inference provider throttles.