Blog

Page 152

12 articles

The Feedback Flywheel Stall: Why Most AI Products Stop Improving After Month Three
Most AI products hit a plateau around month three when the data flywheel quietly stalls. Three failure modes — diminishing data value, user-driven distribution shift, and annotation fatigue — explain why, and targeted interventions can restart the cycle.
insider, ai-engineering
Apr 11 · 9 min
GraphRAG in Production: When Vector Search Fails at Multi-Hop Reasoning
Vector search fails when queries require connecting entities across documents. GraphRAG uses knowledge graphs to enable multi-hop reasoning — but the cost, entity resolution challenges, and maintenance burden demand careful architectural trade-offs.
graphrag, knowledge-graph
Apr 11 · 9 min
Human Feedback Latency: The 30-Day Gap Killing Your AI Improvement Loop
Explicit feedback rates top out at 1-3%, meaning most teams wait 30+ days before accumulating enough signal to detect quality changes. Here's the behavioral proxy architecture that gives you statistically valid signal on day 1.
ai-engineering, evaluation
Apr 11 · 10 min
Hybrid Search in Production: Why BM25 Still Wins on the Queries That Matter
Pure dense retrieval fails silently on exact identifiers, code, and rare terms. Here's the score fusion architecture, reranking strategy, and diagnostic methodology that production RAG systems actually use.
retrieval, rag
Apr 11 · 11 min
LLM Content Moderation at Scale: Why It's Not Just Another Classifier
Content moderation at production scale requires a cascade of fast classifiers, LLM judgment, and human escalation — not a single model. Here's the architecture, adversarial failure modes, and the false-positive threshold that drives users away.
insider, ai-engineering
Apr 11 · 10 min
LLM Output as API Contract: Versioning Structured Responses for Downstream Consumers
When multiple services depend on LLM-structured output, model upgrades silently break downstream consumers. Here's how schema drift and behavioral drift happen, and the versioning and contract-testing patterns that catch breakage before deployment.
insider, llm
Apr 11 · 10 min
LLM-Powered Test Generation: Using AI to Find Bugs in Your Software, Not Just Write It
How LLM-powered test generation catches bugs that hand-written suites miss — covering the oracle problem, mutation-guided approaches, hybrid architectures, and CI integration patterns that keep your build deterministic.
ai-testing, llm
Apr 11 · 9 min
LLMs as Universal Protocol Translators: The Middleware Pattern Nobody Planned For
Teams are using LLMs as runtime protocol translators to bridge incompatible APIs and legacy formats. Here's the architecture that makes it safe, the failure modes that make it dangerous, and a decision framework for when it actually makes sense.
insider, llm
Apr 11 · 11 min
Model Merging in Production: Weight Averaging Your Way to a Multi-Task Specialist
A technical deep dive into model merging techniques—weight averaging, SLERP, task arithmetic, TIES, and DARE—covering when merging beats ensembles, common failure modes, and how to deploy merged LLMs in production.
insider, machine-learning
Apr 11 · 13 min
Multimodal RAG in Production: When You Need to Search Images, Audio, and Text Together
A practitioner's guide to multimodal RAG: embedding alignment across modalities, cross-modal reranking strategies, cost and latency tradeoffs, and the failure modes that only surface at production scale.
insider, rag
Apr 11 · 12 min
The On-Call Burden Shift: How AI Features Break Your Incident Response Playbook
AI features introduce failure modes — silent degradation, provider-side changes, prompt injection — that traditional monitoring cannot detect. A practical guide to rebuilding on-call practices for non-deterministic systems.
insider, ai-engineering
Apr 11 · 9 min
PII in LLM Pipelines: The Leaks You Don't Know About Until It's Too Late
How personal data silently leaks through prompt templates, context windows, observability tools, and RAG pipelines — and the engineering patterns that actually stop it.
ai-engineering, privacy
Apr 11 · 10 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
