Blog

Page 123

12 articles

Data Quality Gates for Agentic Write Paths: Garbage In, Irreversible Actions Out
Agents with write-access tools translate upstream data quality failures directly into real-world side effects. Here's the validation architecture that prevents them.
insider, ai-engineering
Apr 15 · 11 min
Debugging AI at 3am: Incident Response for LLM-Powered Systems
A 500 error has a stack trace. A bad generation has a probability distribution. Here's how to triage, debug, and post-mortem AI incidents before they wreck your week.
insider, observability
Apr 15 · 10 min
The Dependency Injection Pattern for AI Applications: Writing Code That Survives Model Swaps
Coupling business logic directly to OpenAI or Anthropic SDKs turns every model deprecation into a month-long refactor. Here's how to apply dependency injection to AI components so model swaps become configuration changes.
ai-engineering, architecture
Apr 15 · 9 min
Dependency Injection for AI: Mocking Model Calls Without Losing Test Fidelity
Mocking LLM calls in tests looks like a clean abstraction, but naïve stubs silently rot into lies about production behavior. A layered fixture architecture — stub fakes, recorded cassettes, live calls — plus deliberate seam design restores test fidelity without burning money on every commit.
insider, llm-testing
Apr 15 · 10 min
Documenting Probabilistic Features: The Missing Layer Between Model Behavior and Developer Onboarding
AI-powered features have no stable input-output contract to document. Here's how to write API docs, changelogs, and runbooks for features that behave differently every time — using behavioral envelopes, versioning discipline, and observability as living documentation.
llm, documentation
Apr 15 · 10 min
The Embedding Drift Problem: How Your Semantic Search Silently Degrades
Embedding models freeze language at training time. As new terminology emerges, your semantic search quietly loses accuracy — no error fires, no alert triggers. Here's how to detect it and what to do.
embeddings, vector-search
Apr 15 · 9 min
The Eval Smell Catalog: Anti-Patterns That Make Your LLM Eval Suite Worse Than No Evals At All
A field guide to the anti-patterns that poison LLM eval suites — contamination, brittle assertions, eval rot, judge collusion, vanity aggregates — and the refactoring patterns that restore signal without rewriting the whole harness.
insider, ai-engineering
Apr 15 · 12 min
Building LLM Evals from Sparse Annotations: You Don't Need 10,000 Examples
Most teams delay eval investment waiting for enough labeled data. The evidence shows 50–200 carefully chosen examples, built with active learning, weak supervision, and LLM-bootstrapped labeling, produce reliable signal. Here's how to build trustworthy evals before you have a large dataset.
evaluation, llm
Apr 15 · 12 min
The Few-Shot Saturation Curve: Why Adding More Examples Eventually Hurts
Adding more few-shot examples to your prompts seems like a free win — it isn't. Here's the empirical evidence for where the curve turns against you, why it happens, and what to do instead.
insider, prompt-engineering
Apr 15 · 9 min
Fine-Tuning Dataset Provenance: The Audit Question You Can't Answer Six Months Later
Most fine-tuned production models have no reliable answer to 'where did this training example come from.' Here's the provenance registry schema and audit workflow that gives you one before the regulator asks.
insider, fine-tuning
Apr 15 · 10 min
Graceful AI Feature Sunset: How to Deprecate a Model-Powered Feature Without Breaking User Trust
Deprecating an AI feature isn't like removing a button — users build workflows around model personality, output structure, and behavioral quirks. A four-phase lifecycle for retiring model-powered features without triggering churn.
ai-engineering, llm
Apr 15 · 11 min
Grammar-Constrained Generation: The Output Reliability Technique Most Teams Skip
Constrained decoding guarantees schema-valid LLM outputs at the token level — eliminating the validate-retry loop entirely. Here's how it works, why most teams skip it, and when it actually hurts you.
llm, structured-outputs
Apr 15 · 10 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
