Blog

Page 61

12 articles

Semantic Cache Is a Safety Problem, Not a Perf Win
Semantic caches can serve another user's response in under a millisecond, and your hit-rate dashboard will turn green doing it. Here are the cache-key design, provenance envelope, and audit trail that prevent cross-user leaks by construction.
insider, llm
Apr 22 · 12 min
Semantic Diff for Prompts: Why Git Diff Lies About What Your Prompt Change Will Do
Text-level diffs have almost no correlation with how an LLM's behavior changes. A three-word edit can flip 30% of outputs while a fifty-line restructure changes nothing. Here is how to build a semantic diff toolkit that PR reviewers can actually trust.
prompt-engineering, evals
Apr 22 · 10 min
The Ship-and-Pin Trap: How Model Version Stability Becomes Deprecation Debt
Pinning a model version buys short-term stability and quietly accrues deprecation debt. Scheduled re-qualification, drift monitoring against the next tier, and a dual-track prompt portfolio turn migrations into routine operations instead of fire drills.
llm-ops, ai-engineering
Apr 22 · 9 min
Spec-First Agents: Why the Contract Has to Land Before the Prompt
Prompt-as-spec collapses under more than one author. A spec-first contract — inputs, outputs, invariants, errors, refusals, escalations — turns prompt edits into diffs, makes evals derivable, and shrinks owner onboarding from months to a week.
insider, agents
Apr 22 · 11 min
The Synthetic Preference Trap: How AI-Ranked RLHF Quietly Drifts Your Model Into the Teacher's Voice
Synthetic preference data feels like a free lunch — until your product quietly starts sounding exactly like the teacher model you trained it from. A field guide to spotting, measuring, and bounding RLHF flavor drift.
rlhf, fine-tuning
Apr 22 · 12 min
Token Spend Is a Security Signal Your SOC Isn't Watching
Anomalous LLM token spend is the earliest signal of a compromised API key, prompt injection, or data exfiltration — but billing owns the dashboard and security owns the response. Here is how to wire them together.
insider, security
Apr 22 · 11 min
Your Tool Descriptions Are Prompts, Not API Docs
Tool spec text is the prompt the model reads before deciding when to invoke. Treat it like a prompt — concrete use cases, negative examples, sibling disambiguation — not like OpenAPI docs.
ai-engineering, tool-use
Apr 22 · 10 min
Tool Hallucination Rate: The Probe Suite Your Agent Team Isn't Running
Most agent teams measure tool-call success but never measure tool hallucination. Split the rate into three — unknown-tool, shadow-call, hallucinated-argument — and build the probe suite that catches each before production does.
insider, ai-agents
Apr 22 · 9 min
Tool Manifest Lies: When Your Agent Trusts a Schema Your Backend No Longer Honors
The most dangerous bug in a production agent isn't the one that throws — it's the one where the tool description promises a field the backend renamed two sprints ago, and the model keeps reasoning as if nothing changed.
ai-agents, tool-use
Apr 22 · 10 min
Tool Outputs Are an Untrusted Channel Your Agent Treats as Trusted
Tool outputs share a token stream with the system prompt, so every read-tool is a prompt-injection surface. Here is the trust-boundary model, the four production patterns, and the eval harness that actually measures whether your defenses hold.
ai-security, llm-agents
Apr 22 · 11 min
Tool Schema Deprecation: Why You Can't Just Rename a Parameter
Agent tool schemas live in two places at once — the runtime spec and the model's in-context memory. Renaming a parameter breaks both in different ways. Here is the deprecation playbook.
mcp, agents
Apr 22 · 11 min
Time-to-First-Token Is the Latency SLO You Aren't Instrumenting
p50 and p99 total latency miss the single number that governs how your AI product feels: time to first token. Here is why reasoning models make it worse, what to measure, and how to route around it.
llm-ops, observability
Apr 22 · 11 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
