Most production agents are background jobs cosplaying as chats. Here's why scheduled triggers, checkpointed state, and bounded envelopes outperform conversational loops on cost, reliability, and operability.
Provider model bumps carry no behavioral compatibility guarantee, so every version change should run through the same staged rollout as a database migration — pinned eval, shadow traffic, canary, and a real rollback path.
Putting 'I don't know' in the system prompt makes abstention untestable, unowned, and unscalable. Move it to the router and you get an SLO, an eval, and a real escalation path.
Agents inherited the broadest OAuth scopes the platform would issue, then drifted on a prompt, bringing back the privileged service account the security org spent a decade killing. A field guide to per-tool scoping, JIT credentials, action-level audit, and the IAM owner accountable for the join.
Most production agents already have a degraded-mode spec. It just lives in scattered, untested catch blocks, and the customer writes the public version of it on the next bad day.
Agent runtimes hide state in places your DR runbook never named. The fix: name the state surface, generate idempotency keys at task scope, checkpoint before every tool call, and default to fail-safe abort over fail-forward replay.
When an agent issues a wrong refund, your CRO will ask what produced it — and the answer requires a captured-at-write-time tuple of prompt, model id, decode config, tool results, and conversation history. Here is the discipline that makes 'we can reconstruct it' a true statement.
AI threat models usually stop at the model and treat output as safe content. Indirect prompt injection turns rendered markdown, structured output, generated code, and tool-call arguments into attack payloads — and the boundary worth defending is downstream of the model.
A permission prompt is a security control with a measurable half-life. Track per-user approval rate, tier friction by blast radius, and stop letting a 100% click-through rate carry your safety story.
Every agent release ships a bundle of system-prompt, model, tool, rubric, and retriever changes — and a file-diff changelog tells integrators nothing about the behavior shifts they will actually parse, budget against, or get paged on.
Request-level sampling policies break for agent traces. A per-tier policy — always-trace failures, head-sample successes, tail-sample by cost percentile — turns the trace store from a budget hole into an incident-response tool.
A four-line bug fix gets three rounds of code review. A forty-line system-prompt edit ships with a single LGTM. A field guide to closing the discipline gap on AI artifacts before that gap ships your next regression.
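
As one illustration of the per-tier trace sampling policy mentioned above (always-trace failures, head-sample successes, tail-sample by cost percentile), here is a minimal Python sketch. The `Trace` fields, the `decide` function, the 5% sample rate, and the 95th-percentile cutoff are all hypothetical choices made for the example, not taken from any of the articles. The sketch also assumes the decision is evaluated once a trace has finished, with a fixed-rate coin flip standing in for head sampling.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Trace:
    # Hypothetical trace summary; field names are assumptions for this sketch.
    trace_id: str
    failed: bool
    cost_usd: float


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, with pct in 1..100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def decide(trace, recent_costs, head_rate=0.05, cost_pct=95, rng=random.random):
    """Return the reason a trace is kept, or None if it is dropped."""
    # Tier 1: failures are always kept.
    if trace.failed:
        return "failure"
    # Tier 2: keep successes whose cost is above the chosen percentile
    # of recently observed costs.
    if recent_costs and trace.cost_usd > percentile(recent_costs, cost_pct):
        return "cost_tail"
    # Tier 3: keep a fixed fraction of the remaining successes.
    if rng() < head_rate:
        return "head_sample"
    return None
```

Returning the reason rather than a bare boolean lets the trace store record why each trace was retained, which may help when auditing how the storage budget is spent per tier.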