Blog

Page 105

12 articles

Foundation Model Vendor Strategy: What Enterprise SLAs Actually Guarantee
Enterprise teams pick LLM vendors based on benchmarks and demos. Then they hit production and discover what the SLA actually says — which is usually much less than they assumed.
insider, ai-engineering
Apr 18 · 12 min
The Evaluation Paradox: How Goodhart's Law Breaks AI Benchmarks
When AI teams optimize for benchmark scores instead of real capabilities, scores climb while quality degrades. Here's how the evaluation paradox works and what structural changes actually make evals resistant to gaming.
insider, ai
Apr 18 · 10 min
GraphRAG vs. Vector RAG: The Architecture Decision Teams Make Too Late
Vector RAG hits a mathematical ceiling on relational queries. This article covers the migration path from pure vector to hybrid graph-vector retrieval, and the query patterns that reveal you've outgrown dense-only search.
RAG, GraphRAG
Apr 18 · 12 min
Hallucination Is Not a Root Cause: A Debugging Methodology for AI in Production
Moving beyond 'the model hallucinated' to systematic root cause analysis: retrieval failure, conflicting context, prompt ambiguity, and knowledge boundary violations each require different fixes.
insider, llm
Apr 18 · 10 min
Why Hallucination Rate Is the Wrong Primary Metric for Production LLM Systems
Hallucination rate is easy to measure but weakly correlated with user outcomes. A framework for choosing behavioral metrics that actually reflect whether your AI feature is working.
evaluation, observability
Apr 18 · 8 min
The Idempotency Problem in Agentic Tool Calling
Why agent retry logic causes duplicate charges, double-sent emails, and inconsistent state — and how saga patterns, idempotency keys, and structured error signals fix the problem at the architecture level.
insider, ai-engineering
Apr 18 · 11 min
The Inference Optimization Trap: Why Making One Model Faster Can Slow Down Your System
Swapping a model component for a faster version often increases end-to-end latency and cost. Here's why—and the profiling discipline that prevents it.
insider, ai-engineering
Apr 18 · 9 min
What Your Inference Provider Is Hiding From You: KV Cache, Batching, and the Latency Floor
The decisions made inside LLM inference infrastructure—KV cache eviction, continuous batching, chunked prefill—set your application's performance envelope before you write a line of code. Here's what's actually happening and the few knobs you control.
llm, inference
Apr 18 · 11 min
Invisible Model Drift: How Silent Provider Updates Break Production AI
LLM providers update models without changelogs. Your prompt regressions are real, they're silent, and they're your problem to detect. Here's how.
insider, llm
Apr 18 · 10 min
Knowledge Distillation for Production: Teaching Small Models to Do Big Model Tasks
How to use frontier model outputs as a supervision signal to build task-specific small models, covering the dataset curation pipeline, quality collapse detection, and the benchmarking methodology that tells you when the distilled model is ready for production.
ai-engineering, llms
Apr 18 · 9 min
Knowledge Distillation Without Fine-Tuning: Extracting Frontier Model Capabilities Into Cheaper Inference Paths
A practical decision framework for AI engineers on when distilling frontier model capabilities into smaller student models actually pays off—and when it silently fails on out-of-distribution inputs.
ai-engineering, llm
Apr 18 · 10 min
The Latent Capability Ceiling: When a Bigger Model Won't Fix Your Problem
Frontier models plateau on domain-specific tasks well before teams expect it. Here's how to diagnose whether you've hit a true capability ceiling or a prompt, eval, or data problem — and which technique actually breaks through.
llm, fine-tuning
Apr 18 · 10 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
