AI codebases carry a hidden domain-knowledge tax that turns three-week ramps into three-month ones. The fix is decision history, not architecture diagrams.
Hiding AI cost from users ships silent throttling and surprise downgrades. Treating the token budget as a real product surface — preview, caps, model selection — turns the cost ceiling from a churn driver into a monetization lever.
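A minimal sketch of that cost-preview surface, assuming hypothetical per-1K-token prices and a monthly spend cap (all names and numbers here are illustrative, not any provider's real pricing):

```python
from dataclasses import dataclass

# Hypothetical per-1K-token prices; real prices vary by provider and model.
PRICE_PER_1K = {"small": 0.002, "large": 0.03}

@dataclass
class BudgetPreview:
    model: str
    est_tokens: int
    est_cost: float
    within_cap: bool

def preview_request(prompt_tokens: int, max_output_tokens: int,
                    model: str, monthly_cap: float, spent: float) -> BudgetPreview:
    """Show the user an estimated cost *before* the call, instead of
    silently throttling or downgrading the model after the budget runs out."""
    est_tokens = prompt_tokens + max_output_tokens
    est_cost = est_tokens / 1000 * PRICE_PER_1K[model]
    return BudgetPreview(model, est_tokens, est_cost,
                         within_cap=(spent + est_cost) <= monthly_cap)

p = preview_request(prompt_tokens=1200, max_output_tokens=800,
                    model="large", monthly_cap=10.0, spent=9.95)
```

When `within_cap` is false, the product can offer the cheaper model or a top-up instead of failing silently.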
Six months into a human eval program, the inter-rater agreement number is a weighted average of three different implicit rubrics. The model didn't drift — the measurement instrument did.
Agent stacks emit four logs that don't agree. The fix isn't more logging — it's a transaction ID minted at the user-action boundary, a unified audit record, and retention sized to the compliance question, not the subsystem.
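The mint-once, thread-everywhere pattern can be sketched in a few lines; the record shape and field names below are illustrative assumptions, not a standard:

```python
import json
import time
import uuid

def mint_txn_id() -> str:
    # Minted exactly once, at the user-action boundary (e.g. the HTTP
    # handler), then threaded through every subsystem's log lines.
    return uuid.uuid4().hex

def audit(txn_id: str, subsystem: str, event: str, **fields) -> str:
    # One unified record shape for all subsystems, so the four logs
    # can be joined instead of reconciled.
    record = {"txn": txn_id, "ts": time.time(),
              "subsystem": subsystem, "event": event, **fields}
    return json.dumps(record)

txn = mint_txn_id()
lines = [
    audit(txn, "gateway", "request_received", user="u_123"),
    audit(txn, "planner", "tool_selected", tool="search"),
    audit(txn, "executor", "tool_result", status="ok"),
]
parsed = [json.loads(line) for line in lines]
```

Every line now joins on the same `txn`, and retention policy can be set per compliance question rather than per subsystem.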
Production AI refusal policies collapse under three-word framings — 'hypothetically,' 'for educational purposes,' 'for a story.' How to detect and defend against the bypass vocabulary your users pick up from social platforms.
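A naive first-pass detector for those framings might look like this; the seed list is a hypothetical starting point — in practice you would mine it from logged bypass attempts and route flagged prompts to a stricter policy check rather than refusing outright:

```python
import re

# Hypothetical seed list; real deployments grow this from observed traffic.
BYPASS_FRAMINGS = [
    r"\bhypothetically\b",
    r"\bfor educational purposes\b",
    r"\bfor a story\b",
]
_pattern = re.compile("|".join(BYPASS_FRAMINGS), re.IGNORECASE)

def flag_bypass_framing(prompt: str) -> bool:
    """Cheap lexical filter: catches verbatim framings only, so it's a
    routing signal for deeper review, not a defense on its own."""
    return bool(_pattern.search(prompt))
```

Substring matching is trivially evadable; the point of the sketch is the routing decision, not the match quality.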
Compliance reviewers spot LLM failure modes engineering evals systematically miss. Move them out of document-review gates and into the regression suite — legal sign-off becomes a statement about pinned test cases that run every commit.
Long-running agent sessions silently leak tokens — quadratic cost growth and quality degradation hide inside conversation history. How to instrument, prune, and compact it.
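A minimal pruning sketch, assuming a crude characters-per-token estimate (swap in your model's real tokenizer) and a drop-oldest policy; real compaction would summarize the dropped turns instead of discarding them:

```python
def estimate_tokens(text: str) -> int:
    # Crude ~4-chars-per-token heuristic; replace with a real tokenizer.
    return max(1, len(text) // 4)

def prune_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the system message plus the most recent turns that fit
    under `budget` tokens; older turns fall off (or get summarized)."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    for m in reversed(rest):  # walk newest-first
        cost = estimate_tokens(m["content"])
        if used + cost > budget:
            break
        kept.append(m)
        used += cost
    return system + list(reversed(kept))

history = [
    {"role": "system", "content": "be brief"},
    {"role": "user", "content": "x" * 400},
    {"role": "assistant", "content": "y" * 400},
    {"role": "user", "content": "z" * 40},
]
pruned = prune_history(history, budget=150)
```

Without a step like this, each turn resends the whole history, so total tokens grow quadratically with conversation length.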
Chat is a great input modality and a terrible output one. The moment your agent returns more than three results, the right answer is to render UI — not to keep talking.
Once users build workflows around an AI feature, removing it costs more than launching it did. Why kill switches go unused and how to design reversibility in from launch.
Pricing on AI features is an architecture input, not a finance afterthought. What to put in the PRD so engineering doesn't patch unit-economics leaks at midnight.
Multi-surface AI agents fragment memory across chat, email, SMS, and voice — leaving users with contradictory answers. A look at unified identity, write-through stores, and contextual privacy.
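The write-through idea can be sketched with an in-memory stand-in for the shared store; `UnifiedMemory` and its methods are hypothetical names, and contextual privacy (which surfaces may read which facts) is deliberately left out of this sketch:

```python
class UnifiedMemory:
    """Write-through store keyed by one unified user id, so chat,
    email, SMS, and voice all read and write the same facts."""

    def __init__(self):
        self._store = {}          # stands in for a durable shared DB
        self._surface_cache = {}  # per-surface read caches

    def write(self, user_id: str, surface: str, key: str, value) -> None:
        # Write-through: the shared store is updated first, then the
        # writing surface's cache, so no surface ever holds a value
        # newer than the store.
        self._store.setdefault(user_id, {})[key] = value
        self._surface_cache.setdefault((user_id, surface), {})[key] = value

    def read(self, user_id: str, surface: str, key: str):
        cache = self._surface_cache.setdefault((user_id, surface), {})
        if key not in cache:
            cache[key] = self._store.get(user_id, {}).get(key)
        return cache[key]

mem = UnifiedMemory()
mem.write("u1", "chat", "timezone", "UTC+2")
```

A fact written in chat is immediately visible from SMS, which is exactly the contradiction the fragmented design fails to prevent.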
Frontier model latency follows a daily curve set by other people's traffic. Hour-of-day cohorting, batch routing, and load-aware failover turn a phantom regression into a scheduling problem.
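Hour-of-day cohorting is a small amount of code; this sketch (function names and the 1.5x threshold are illustrative assumptions) compares today's per-hour median against the same hour in a baseline window rather than a whole-day average:

```python
from collections import defaultdict
from statistics import median

def cohort_by_hour(samples):
    """samples: iterable of (hour_of_day, latency_ms) pairs.
    Returns {hour: median latency}, so comparisons happen within
    the same point on the provider's daily load curve."""
    buckets = defaultdict(list)
    for hour, latency in samples:
        buckets[hour].append(latency)
    return {h: median(v) for h, v in buckets.items()}

def is_regression(today, baseline, hour, threshold=1.5):
    # Flag only when this hour is slow relative to the *same hour*
    # historically — a 9am spike is not a regression if 9am is
    # always the provider's peak.
    return today[hour] > threshold * baseline[hour]

baseline = cohort_by_hour([(9, 400), (9, 420), (14, 900)])
today = cohort_by_hour([(9, 950), (14, 980)])
```

With per-hour baselines in place, batch routing and load-aware failover can target only the genuinely anomalous hours.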