Blog

Page 124

12 articles

Hiring for LLM Engineering: What the Interview Actually Needs to Test
Standard coding screens and ML math questions fail to predict LLM engineering success. Here's what practical interview exercises actually reveal about a candidate's ability to ship AI products.
llm, hiring
Apr 15 · 10 min
Hot-Path vs. Cold-Path AI: The Architectural Decision That Decides Your p99
A decision framework for which AI work belongs in the request path, which belongs in a queue, and how to migrate across the boundary once traffic shape changes.
ai-architecture, latency
Apr 15 · 10 min
The Implicit API Contract: What Your LLM Provider Doesn't Document
LLM providers guarantee uptime and latency SLAs. They don't guarantee that your prompts will produce the same output next month. Here's what engineers need to know about the implicit behavioral contract — and how to test against it.
llm, production
Apr 15 · 10 min
The Intent Classification Layer Most Agent Routers Skip
Most agent routers load every tool schema on every request and let the LLM decide. At 417 tools, that approach collapses to 20% accuracy. Here's how an intent classification layer fixes it, and why skipping it quietly destroys both accuracy and cost at scale.
insider, ai-engineering
Apr 15 · 11 min
Judge Model Independence: Why Your Eval Breaks When the Grader Shares Blind Spots with the Graded
Using the same model family as both product and judge inflates scores by 8–16% because they share blind spots. Here's how to build evaluation systems that actually catch what your model misses.
insider, evaluation
Apr 15 · 9 min
Keeping Synthetic Eval Data Honest
Using LLMs to generate your own test cases creates a flattering but misleading feedback loop. Here's how adversarial seeding, human annotation triage, and diversity gap analysis fix the structural blind spots synthetic evals miss.
ai-engineering, evaluation
Apr 15 · 9 min
Knowledge Graphs as a RAG Alternative: When Structured Retrieval Beats Embeddings
Vector similarity search fails silently on multi-hop queries and schema-dependent facts. Here's when a property graph with traversal queries outperforms embedding lookup — and how to build the hybrid that covers both.
rag, knowledge-graphs
Apr 15 · 9 min
LLM Confidence Calibration in Production: Measuring and Fixing the Overconfidence Problem
LLMs that say 'I'm highly confident' are often wrong at that exact rate. How to measure calibration error, why RLHF makes it worse, and the production design patterns that actually help.
llm, production
Apr 15 · 10 min
The Provider Abstraction Tax: Building LLM Applications That Can Swap Models Without Rewrites
Teams that build directly on one LLM provider accumulate prompt idioms, tool schema conventions, and behavioral dependencies that become migration debt. Here's the abstraction layer design that makes switching providers a configuration change rather than a multi-month rewrite.
llm, engineering
Apr 15 · 10 min
LLMs in the Security Operations Center: Acceleration Without Liability
How to wire LLMs into security operations so they accelerate triage without quietly approving real intrusions — confidence thresholds, log-poisoning defenses, and the metrics that matter.
security, llm
Apr 15 · 11 min
The max_tokens Knob Nobody Tunes: Output Truncation as a Cost Lever
Most teams pad max_tokens to avoid mid-generation cutoffs and pay for the slack forever. Per-route calibration against real output distributions can cut output token spend 20–40% without quality loss.
insider, llm
Apr 15 · 11 min
Your AI Feature Should Lose to a Regex First
Before you invest in fine-tuning or RAG, your AI feature should be required to beat the simplest deterministic baseline you can build. Most teams skip this gate and pay for it.
ai-engineering, llm
Apr 15 · 9 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
