Blog

Page 111

12 articles

Knowledge Graph vs. Vector Store: Choosing Your Retrieval Primitive
Vector similarity and graph traversal answer different questions. Learn when vector stores fail on multi-hop reasoning, when knowledge graphs win on structured queries, and how to build hybrid retrieval that handles both.
insider, rag
Apr 17 · 9 min
The LLM Local Development Loop: Fast Iteration Without Burning Your API Budget
How to build a fast inner loop for LLM applications using record-replay patterns, deterministic fixtures, and a layered test strategy — without burning your API budget on every code change.
insider, llm
Apr 17 · 10 min
The LLM Pipeline Monolith vs. Chain Trade-off: When Task Decomposition Helps and When It Hurts
Most teams default to chaining LLM calls without measuring whether it beats a single large-context call. Here's what the empirical evidence actually says about when to chain and when to go monolith.
llm, engineering
Apr 17 · 8 min
Model Deprecation Readiness: Auditing Your Behavioral Dependency Before the 90-Day Countdown
When a model gets deprecated, the hard part isn't updating the API call — it's discovering all the invisible behavioral contracts your system assumed. Here's how to audit them before the clock runs out.
insider, llm
Apr 17 · 8 min
Model Routing in Production: When the Router Costs More Than It Saves
Most teams deploy model routers expecting automatic cost savings. The counterintuitive reality: a poorly designed router can cost more than sending every request to the expensive model. Here's the decision framework that actually works.
insider, llm
Apr 17 · 10 min
How to Pick the Right LLM Before You Write a Single Prompt
Public benchmarks have saturated and can't tell you which LLM will work in your system. A practical framework for evaluating models on the dimensions that actually matter: function-call reliability, structured output compliance, refusal rate on your domain, and latency under real concurrency.
llm, model-selection
Apr 17 · 10 min
Preference Data on a Budget: Capturing RLHF Signal Without a Research Team
How to collect pairwise preference signal from real users using implicit behavioral telemetry, inline editing, and A/B prompts — plus the minimum viable reward model setup that works without PPO infrastructure.
rlhf, fine-tuning
Apr 17 · 11 min
Prompt Injection at Scale: Defending Agentic Pipelines Against Hostile Content
Prompt injection is the #1 vulnerability in production AI agents. Here's the attack surface, why instruction-level defenses fail, and the architecture that keeps systems useful under adversarial pressure.
security, ai-agents
Apr 17 · 10 min
Prompt Regression Tests That Actually Block PRs
Most teams claim to test their prompts. Almost none have CI gates that will fail a build. Here's the lightweight harness that changes that without burning your API budget.
insider, ai-engineering
Apr 17 · 10 min
Retrieval Debt: Why Your RAG Pipeline Degrades Silently Over Time
Your RAG pipeline was working fine at launch. Now answers feel slightly off and nobody can explain why. Here's how retrieval debt accumulates through stale embeddings, tombstoned chunks, and encoder drift — and how to stop it before users notice.
insider, rag
Apr 17 · 10 min
Sampling Parameters in Production: The Tuning Decisions Nobody Explains
Temperature, top-p, and top-k silently shape your LLM's output quality. Here's what engineers actually need to know about tuning them in production—including why temperature=0 isn't deterministic and how top-p and temperature interact.
llm, production
Apr 17 · 11 min
Structured Outputs Are Not a Solved Problem: JSON Mode Failure Modes in Production
JSON mode feels like a solved problem until you hit deeply nested schemas, enum-heavy types, or long completions that truncate silently. A complete failure taxonomy and the validation patterns that catch breakage before it reaches users.
llm, structured-outputs
Apr 17 · 12 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
