Once users build workflows around an AI feature, removing it costs more than launching it did. Why kill switches go unused and how to design reversibility in from launch.
Pricing on AI features is an architecture input, not a finance afterthought. What to put in the PRD so engineering doesn't patch unit-economics leaks at midnight.
Multi-surface AI agents fragment memory across chat, email, SMS, and voice — leaving users with contradictory answers. A look at unified identity, write-through stores, and contextual privacy.
Frontier model latency follows a daily curve set by other people's traffic. Hour-of-day cohorting, batch routing, and load-aware failover turn a phantom regression into a scheduling problem.
Fuzzy PRD adjectives like 'helpful' and 'concise' don't survive contact with a model — the eval suite is where those decisions actually get made. Treat the eval as the spec, not as instrumentation.
The fallback you wrote nine months ago is silently broken. How AI graceful-degradation paths bit-rot, why integration tests miss it, and the failure-injection discipline that keeps degraded mode shippable.
Aligned LLMs quietly round unusual requests toward the training distribution mode. Here is why standard evals miss it and the off-mode discipline that catches it.
A 4,000-token system prompt nobody dares edit is not stability, it's debt. How prompts freeze, why iteration collapses around them, and the archaeology and eval discipline that thaws them.
Major LLM SDKs ship with two automatic retries by default. Stack a caller-side retry on top and a single request fans out to nine inference calls during a provider blip — invisible in your traces, visible only on the bill.
Customer-facing AI features absorb most of the budget, but the highest-leverage AI investment in your company is the internal Slack bot nobody is staffing. Here's the math, the failure pattern, and the discipline that captures the value.
Every 'do not' clause in a production system prompt is a patch on a behavioral mismatch. Track negative-prompt density, refactor each negative into a positive specification, and use the residue as a signal that prompt engineering is the wrong tool for the job.
MCP standardized how agents get tokens for tool servers, but left the harder question — how those servers thread user identity to downstream APIs — to implementers. Here is what holds up under audit.