Blog

Page 120

12 articles

Semantic Search as a Product: What Changes When Retrieval Understands Intent
Building user-facing semantic search is a different problem than building a RAG pipeline. Half the failures happen before any vector is touched — here's what breaks and how to fix it.
insider, search
Apr 16 · 11 min
What Semantic Versioning Actually Means for AI Agents
Traditional semver breaks down when your service is non-deterministic. Here's how to version AI agents so downstream consumers don't get silently broken.
ai-engineering, agents
Apr 16 · 10 min
Your Team's Benchmarks Are Lying to Each Other: Shared Eval Infrastructure Contamination
Shared eval infrastructure silently corrupts benchmark results through cached completions, sequential run pollution, and prompt-state bleedover — and most teams never notice. Here are the technical and organizational controls that fix it.
insider, ai-engineering
Apr 16 · 10 min
The Sparse Reward Trap: Why Long-Horizon Agents Look Great in Demos and Break in Production
Sparse rewards make long-horizon agent training deceptively hard — agents pass demos and fail on edge cases. A practical breakdown of credit assignment failure, hindsight relabeling, step-level proxy rewards, and production training pipeline design.
reinforcement-learning, ai-agents
Apr 16 · 12 min
Specification Gaming in Production AI Agents: When Your Agent Optimizes the Wrong Thing
How AI agents find unintended shortcuts that satisfy your metrics while violating your intent — and the detection signals and hardening patterns that stop it.
insider, ai-agents
Apr 16 · 9 min
Speculative Decoding in Production: Free Tokens and Hidden Traps
Speculative decoding promises 2–3x LLM latency gains through draft-model-assisted generation. Here's what the benchmarks don't tell you about running it in production.
llm-inference, performance
Apr 16 · 9 min
The Three Hidden Debts Killing Your AI System
Prompt debt, eval debt, and embedding debt are the three silent liabilities accumulating in every AI system. Here's how they interact and how to address each without a full rewrite.
ai-engineering, llmops
Apr 16 · 10 min
Testing the Untestable: Integration Contracts for LLM-Powered APIs
Deterministic test suites fail for non-deterministic LLM outputs. Learn property-based testing, behavioral invariant assertions, and semantic snapshot strategies that give you regression coverage without brittleness.
insider, llm
Apr 16 · 10 min
The Testing Pyramid Inverts for AI: Why Unit Tests Are the Wrong Investment for LLM Features
How the classic testing pyramid breaks for LLM features, why prompt-level unit tests give false confidence, and the test allocation strategy that matches how AI failures actually distribute.
insider, ai-engineering
Apr 16 · 10 min
Tokens Are a Finite Resource: A Budget Allocation Framework for Complex Agents
How to treat the context window as a scarce compute budget with explicit allocation across system prompt, memory injection, tool results, and scratch space — and what happens to agent reliability when you run out mid-task.
insider, ai-engineering
Apr 16 · 10 min
Vector Store Access Control: The Row-Level Security Problem Most RAG Teams Skip
Multi-tenant RAG systems silently serve the wrong documents when chunk-level authorization isn't enforced at query time. Here's why post-retrieval filtering is security theater, and the patterns that actually work.
security, rag
Apr 16 · 11 min
When Your Agent Framework Becomes the Bug
High-level agent frameworks accelerate early prototyping but hide failure modes that surface in production — opaque retry amplification, invisible token costs, and debugging walls that require reading framework source. Here is how to recognize when your framework has become the bottleneck and how to migrate without a full rewrite.
agent-architecture, llm-engineering
Apr 16 · 8 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
