Blog

Page 98

12 articles

Pipeline Attribution in Compound AI Systems: Finding the Weakest Link Before It Finds You
When retrieval, reranking, generation, and validation compose into a single AI pipeline, degraded output quality is nearly impossible to blame on any single component. Here's the attribution methodology that actually works.
ai-engineering, compound-ai
Apr 19 · 10 min
The Precision-Recall Tradeoff Hiding Inside Your AI Safety Filter
Most teams ship AI safety classifiers with default thresholds and never measure the false-positive cost. Here's why that silently blocks legitimate users at scale—and the calibration practices that surface the tradeoff before it becomes a support crisis.
insider, ai-safety
Apr 19 · 10 min
Privacy-Preserving Inference in Practice: The Spectrum Between Cloud APIs and On-Prem
Navigating LLM privacy isn't a binary choice between cloud APIs and on-prem. Learn the four-layer spectrum of controls—PII redaction, sensitivity routing, differential privacy, and TEEs—with the real engineering cost and risk reduction each provides.
privacy, security
Apr 19 · 9 min
The Production Distribution Gap: Why Your Internal Testers Can't Find the Bugs Users Do
Why AI systems pass internal testing but break in production — the systematic mismatch between dev/staging workloads and real user traffic, and the instrumentation patterns that close it.
insider, ai-engineering
Apr 19 · 11 min
Prompt Cache Hit Rate: The Production Metric Your Cost Dashboard Is Missing
Cache hit rate is the most impactful LLM cost lever most teams never monitor. Here's what silently destroys it and how to defend against it in production.
llm, prompt-caching
Apr 19 · 10 min
Your Prompt Is a Liability with No Type System
Every prompt you ship is mutable global state. Prompt regressions are invisible to CI, changes can't be rolled back atomically, and drift accumulates faster than documentation. Here's the versioning and governance architecture that treats prompts as first-class deployable artifacts.
prompt-engineering, production-ai
Apr 19 · 10 min
Prompt Versioning Done Right: Treating LLM Instructions as Production Software
Most teams treat prompts like config files — until a three-word edit tanks a revenue-generating workflow. Here's the engineering discipline that prevents it.
llm, prompt-engineering
Apr 19 · 8 min
Zero-Shot, Few-Shot, or Chain-of-Thought: A Production Decision Framework
Most teams pick prompting strategies by convention. Here are the evidence-based criteria—task complexity, model scale, token budget, output structure—that predict which approach wins on your specific task.
llm, prompting
Apr 19 · 10 min
RAG Knowledge Base Freshness: The Staleness Problem Teams Solve Last
Chunking and embedding quality dominate RAG architecture discussions, but index freshness silently determines your system's reliability over time. Here's how to detect, measure, and fix it.
insider, rag
Apr 19 · 11 min
RAG Position Bias: Why Chunk Order Changes Your Answers
Retrieval correctness isn't enough — where your chunks appear in the prompt determines which ones the model actually uses. How position bias works in production RAG systems and what to do about it.
insider, rag
Apr 19 · 8 min
Testing the Retrieval-Generation Seam: The Integration Test Gap in RAG Systems
Unit tests for your retriever and generator can both pass while your RAG system silently fails. Here's how to test the seam between them and localize blame when it breaks.
insider, rag
Apr 19 · 11 min
RBAC Is Not Enough for AI Agents: A Practical Authorization Model
Static role-based access control breaks when agents shift permissions mid-task. Here is how to build an authorization model that actually holds: narrow tool scopes, short-lived credentials, ABAC runtime policies, and audit trails anchored to agent identity.
insider, ai-agents
Apr 19 · 11 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
