Blog

Page 63

12 articles

Why AI Feature Flags Are Not Regular Feature Flags
Standard canary analysis breaks when you deploy AI models — error rates stay flat while quality silently degrades. Here's what to instrument instead, and how to build rollback triggers that actually work for probabilistic systems.
insider, ai-engineering
Apr 19, 11 min
The AI Feature Lifecycle Decay Problem: How to Catch Degradation Before Users Do
91% of ML models degrade over time, but most teams only find out from user complaints. Here's how to instrument your AI features to catch distribution shift before it becomes a crisis.
mlops, monitoring
Apr 19, 10 min
The AI Feature Sunset Playbook: How to Retire Underperforming AI Without Burning Trust
Teams are better at launching AI features than killing them. A framework for diagnosing when to retire vs. fix underperforming AI, overcoming sunk-cost bias, and deprecating gracefully.
ai, engineering
Apr 19, 10 min
AI Incident Response Playbooks: Why Your On-Call Runbook Doesn't Work for LLMs
Conventional on-call runbooks break for AI systems because failures are non-deterministic, quality degradation has no error code, and root cause triage requires a fundamentally different framework. Here's what actually works.
insider, ai
Apr 19, 10 min
AI Incident Retrospectives: When 'The Model Did It' Is the Root Cause
Classical 5-why analysis stalls when the failure is stochastic. Here's how to write useful post-mortems for AI incidents, what telemetry to capture at inference time, and how to build runbooks that go beyond 'monitor more carefully.'
insider, ai-engineering
Apr 19, 10 min
The Alignment Tax: When Safety Features Make Your AI Product Worse
Safety guardrails and overly conservative refusals reduce user satisfaction on entirely benign queries. Here's how to measure your false-positive rate and calibrate thresholds for your actual deployment context.
safety, guardrails
Apr 19, 9 min
Amortizing Context: Persistent Agent Memory vs. Long-Context Windows
Long-context models tempt you to dump everything in — but that costs 15x more and produces worse answers. Here's the decision framework for what to remember in external memory, what to re-fetch, and what to keep in-window, with compaction patterns that make memory-augmented agents cheaper and more accurate at scale.
insider, ai-engineering
Apr 19, 9 min
Behavioral Signals That Actually Measure User Satisfaction in AI Products
Thumbs up/down rates are noise. Here's the instrumentation schema for the implicit behavioral signals — retry rates, copy-without-edit events, downstream action completion — that actually predict whether users find your AI product valuable.
ai-engineering, product-metrics
Apr 19, 9 min
Bias Monitoring Infrastructure for Production AI: Beyond the Pre-Launch Audit
Static fairness testing catches known problems against known datasets. Here's how to build the live monitoring infrastructure that catches the ones you didn't know to look for.
ai, machine-learning
Apr 19, 10 min
Cache Invalidation for AI: Why Every Cache Layer Gets Harder When the Answer Can Change
Traditional TTL and tag-based cache invalidation breaks down in AI systems. A breakdown of each cache tier — semantic caches, RAG knowledge bases, prompt caches, and embedding indexes — the failure modes specific to each, and the design patterns that keep them consistent in production.
insider, ai-engineering
Apr 19, 10 min
Canary Deploys for LLM Upgrades: Why Model Rollouts Break Differently Than Code Deployments
Swapping an LLM version isn't a code deploy. Output semantics shift, downstream parsers break on subtly different schemas, and by the time your monitoring fires, thousands of users have already absorbed the failure. Here's the engineering discipline that makes model upgrades predictable.
insider, llm
Apr 19, 11 min
The CAP Theorem for AI Agents: Choosing Consistency or Availability When Your LLM Is the Bottleneck
When an AI agent's tool call fails or the LLM times out, you face the same tradeoff distributed systems engineers know from the CAP theorem. Most agent frameworks silently choose availability — and pay for it in production.
ai-engineering, agents
Apr 19, 10 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
