Sunsetting an AI feature is not like sunsetting an API. The contract is the model's observed behavior, and users build invisible scaffolding on top of it that breaks on cutover.
Quarterly OKRs were calibrated for deterministic software. AI features have launch curves and sustain curves, and the template that treats them as deliverables produces demos that decay between planning cycles.
Every production AI feature has four artifact owners and zero owners for the integrated user experience. That gap is where seam bugs live — and the org-design fix that closes it.
Most demos work. A meaningful fraction of shipped AI features are still task-shape mismatched — stochastic engines wired into deterministic-required outputs. A pre-build checklist and the roadmap pathway you need to redirect ideas that are not model-shaped.
Standard engineering interview loops select for deterministic-systems skills and miss the cluster — eval design, cost intuition, prompt debugging, recovery-mindedness — that predicts who ships LLM products. The fix is loop redesign, not another bolted-on AI round.
Pages that say the model started lying do not fit a runbook designed for restart the service. Here is the five-surface triage tree, freeze button, and replay harness that make AI on-call its own discipline.
Provider batch APIs cut inference cost in half but reshape the engineering contract: job-level idempotency, freshness boundaries, late-result observability, and a tier-aware decision matrix that reroutes 30–50% of LLM spend on workloads the user was never waiting on.
A hosted moderation API turns your safety control into a synchronous external dependency — the build-vs-buy decision, fail-open vs fail-closed tradeoff, and integration discipline that keeps a vendor on the safety-critical path from owning your incident response.
Every default in the LLM stack — pretraining, RLHF, judge LLMs, user feedback — pushes the model toward confident wrong answers. Calibrated abstention only ships if you build the eval, rubric, and UI that pay for it.
Stop is a UI affordance, not a system guarantee. A practitioner's playbook for cancel-safe agents: durable side-effect ledgers, scoped authorization, compensating actions, and what the cancel UI should actually display.
A field guide to attributing cost across compound AI systems — per-span ledgers, on-behalf-of headers, settlement-currency mismatches, and the political surface that decides who pays for tool calls.
Treating conversation history as scrollback is why agents lose the thread by turn 8 and why context bills scale superlinearly. The fix is to call it what it is — a read-heavy database — and design accordingly.