Blog

Page 78

12 articles

Stateful Multi-Turn Conversation Infrastructure: Beyond Passing the Full History
Why 'pass the full conversation history' fails at p99 scale, and the session store designs, compression strategies, and operational patterns that actually hold up in production.
insider, ai-engineering
Apr 18 · 11 min
Structured Output Is Not Structured Thinking: The Semantic Validation Layer Most Teams Skip
JSON mode guarantees your LLM output matches a schema. It does not guarantee the output makes sense. The semantic validation layer catches contradictory fields, impossible date ranges, and domain constraint violations before they silently corrupt your data.
llm, structured-outputs
Apr 18 · 11 min
What Structured Outputs Actually Cost You: The JSON Mode Quality Tax
Constrained decoding guarantees valid JSON but extracts a hidden quality cost. Here's how to measure the tax on your workload and decide when it's worth paying.
insider, llm
Apr 18 · 9 min
Synthetic Seed Data: Bootstrapping Fine-Tuning Before Your First Thousand Users
AI personalization and task-specific fine-tuning hit a cold-start wall when there's no behavioral data. Learn how to generate 500–1,000 high-quality synthetic examples and the failure modes that can silently poison your model.
insider, fine-tuning
Apr 18 · 9 min
The Quality Tax of Over-Specified System Prompts
Bloated system prompts don't just cost more — they make your model dumber. Here's how to measure prompt obesity and trim without regression.
prompt-engineering, llm
Apr 18 · 9 min
Your RAG Knows the Docs. It Doesn't Know What Your Engineers Know.
Most enterprise RAG systems only index written documents, missing the tacit knowledge that actually drives decisions. Here's how to build systems that capture what your engineers know before they walk out the door.
insider, rag
Apr 18 · 10 min
Temperature Is a Product Decision, Not a Model Knob
LLM temperature controls output variance — and that variance directly shapes user trust, engagement, and behavior. Most teams treat it as a technical default. It isn't.
llm, production-ai
Apr 18 · 9 min
Text-to-SQL at Scale: What Nobody Tells You Before Production
Text-to-SQL demos are easy; production deployments are not. Schema ambiguity, privilege escalation, and the 80% benchmark gap expose the engineering layer most teams skip.
ai-engineering, sql
Apr 18 · 11 min
Adding AI to Systems You Don't Own: The Third-Party Model Integration Playbook
Building on external model APIs means rate limits, behavioral drift, and cost shocks are imposed on you. Here's the architecture that survives provider changes, outages, and silent model updates.
ai-engineering, llmops
Apr 18 · 12 min
The Transcript Layer Lie: Why Your Multimodal Pipeline Hallucinates Downstream
Treating ASR and OCR output as ground-truth text silently poisons downstream LLM reasoning — and the fix isn't better models, it's keeping confidence scores alive through the pipeline.
ai-engineering, multimodal
Apr 18 · 9 min
The User Adaptation Trap: Why Rolling Back an AI Model Can Break Things Twice
When a model update introduces subtly wrong behavior, users adapt their workflows around it. By the time you catch it and roll back, you may have two groups of broken users instead of one.
insider, ai-engineering
Apr 18 · 9 min
The Vanishing Blame Problem in AI Incident Post-Mortems
When an AI system degrades, blame diffuses across model, prompt, retrieval, eval, and infrastructure simultaneously. Here is the attribution framework that pins incidents to a specific layer before your post-mortem devolves into 'the model just changed.'
ai-engineering, observability
Apr 18 · 9 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
