Traditional runbooks break when the symptom is 'outputs feel wrong.' A practical triage decision tree, escalation criteria, and postmortem format built specifically for AI systems in production.
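As a taste of what that decision tree looks like in code, here is a minimal sketch; the report fields and routing targets are illustrative stand-ins, not the article's actual taxonomy.

```python
def triage(report: dict) -> str:
    """Route a vague 'outputs feel wrong' report to a concrete owner.

    Branch order matters: cheap, high-signal checks come first.
    """
    if report.get("reproduces_with_fixed_seed_and_prompt"):
        return "code or prompt regression -> application team"
    if report.get("started_after_data_refresh"):
        return "upstream data change -> data platform team"
    if report.get("only_affects_new_users"):
        return "cold-start gap -> product + ML"
    # Nothing reproduces cleanly: gather evidence before escalating.
    return "collect 10 concrete examples, then page the model owner"


print(triage({"started_after_data_refresh": True}))
# upstream data change -> data platform team
```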
Latency and error rate cover less than 20% of the failure space for LLM-powered features. Here are the five production failure modes your APM dashboard silently ignores — and the signal hierarchy that actually catches them.
Picking the wrong AI interaction paradigm — chatbot, copilot, or agent — creates architectural debt you can't fix by tuning prompts. A breakdown of the trust models, context-window strategies, and error-recovery requirements that should drive the decision before you write a line of code.
New users have no history, your model has no context, and you're competing against the perception that AI doesn't know them. Here's the engineering playbook for bridging that gap.
A single accuracy number hides the errors that actually matter. Here's a four-dimension taxonomy — correct, recoverable, harmful, abstained — and a one-page format that gives non-technical stakeholders enough to make the right product, legal, and investment decisions.
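To make the taxonomy concrete, here is a minimal sketch of how eval results might be bucketed and rolled up into the rates a one-pager reports; the `EvalRecord` shape is an assumption for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    CORRECT = "correct"          # right answer, no intervention needed
    RECOVERABLE = "recoverable"  # wrong, but a user can spot and fix it
    HARMFUL = "harmful"          # wrong in a way that misleads or damages
    ABSTAINED = "abstained"      # model declined to answer


@dataclass
class EvalRecord:
    question_id: str
    outcome: Outcome


def summarize(records: list[EvalRecord]) -> dict[str, float]:
    """Collapse per-item outcomes into the four headline rates."""
    counts = Counter(r.outcome for r in records)
    total = len(records) or 1
    return {o.value: counts.get(o, 0) / total for o in Outcome}


records = [
    EvalRecord("q1", Outcome.CORRECT),
    EvalRecord("q2", Outcome.RECOVERABLE),
    EvalRecord("q3", Outcome.HARMFUL),
    EvalRecord("q4", Outcome.ABSTAINED),
    EvalRecord("q5", Outcome.CORRECT),
]
print(summarize(records))
# {'correct': 0.4, 'recoverable': 0.2, 'harmful': 0.2, 'abstained': 0.2}
```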
Most teams collect thumbs-up/down and call it a feedback loop. The real infrastructure is implicit signal extraction, weak supervision pipelines, and closed-loop architecture that routes production data back into training without drowning in annotation overhead.
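A minimal sketch of the implicit-signal half of that pipeline, assuming you log copy, edit, and retry events per response; the event fields and thresholds below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    output_id: str
    copied: bool        # user copied the response into their work
    edited_chars: int   # characters the user changed before using it
    retried: bool       # user immediately re-asked the same question


def weak_label(event: Interaction) -> float | None:
    """Turn implicit behavior into a noisy quality label in [0, 1].

    Return None when the evidence is ambiguous, so guesses never
    pollute the training set.
    """
    if event.retried:
        return 0.0   # strong negative: the answer was discarded
    if event.copied and event.edited_chars == 0:
        return 1.0   # strong positive: used verbatim
    if event.copied and event.edited_chars < 40:
        return 0.7   # weak positive: used after light edits
    return None


events = [
    Interaction("o1", copied=True, edited_chars=0, retried=False),
    Interaction("o2", copied=False, edited_chars=0, retried=True),
    Interaction("o3", copied=False, edited_chars=0, retried=False),
]
train_batch = [(e.output_id, y) for e in events if (y := weak_label(e)) is not None]
print(train_batch)  # [('o1', 1.0), ('o2', 0.0)]
```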
Why 'the model regressed' usually means 'the upstream data changed' — and the lineage graph patterns that let you trace production degradations to their data cause before wasting a week re-tuning prompts.
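One such pattern, sketched minimally below: record an upstream edge per artifact version, then walk the ancestors of the degraded output and intersect them with your change log. The node names are invented for the example.

```python
# Edges point from an artifact to the upstream artifacts it was built from.
# Real systems would key on dataset, prompt, and index versions.
upstream = {
    "answers_v7": ["prompt_v3", "retrieval_index_v12"],
    "retrieval_index_v12": ["docs_dump_2024_06_01"],
    "prompt_v3": [],
    "docs_dump_2024_06_01": [],
}

changed_since_last_good = {"docs_dump_2024_06_01"}  # from your change log


def blame(artifact: str) -> list[str]:
    """Walk the lineage graph upstream and return changed ancestors."""
    seen, stack, culprits = set(), [artifact], []
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in changed_since_last_good:
            culprits.append(node)
        stack.extend(upstream.get(node, []))
    return culprits


print(blame("answers_v7"))
# ['docs_dump_2024_06_01'] -> it's the data, not the prompt
```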
Thumbs-up ratings, click-through rates, and satisfaction scores are systematically biased toward confident-sounding AI outputs — not accurate ones. Here's why engagement metrics make AI worse over time, and which behavioral signals actually track quality.
Vector similarity and graph traversal answer different questions. Learn when vector stores fail on multi-hop reasoning, when knowledge graphs win on structured queries, and how to build hybrid retrieval that handles both.
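A toy sketch of the hybrid shape, with hand-rolled cosine similarity and a two-hop graph walk standing in for a real embedding index and knowledge-graph store; all of the data here is invented.

```python
import math

doc_vectors = {"doc_a": [0.9, 0.1], "doc_b": [0.2, 0.8]}
kg_edges = {"acme": [("acquired", "widgetco")], "widgetco": [("ceo", "j_smith")]}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def vector_search(query_vec, k=1):
    """Semantic lookup: good for fuzzy, single-hop 'find similar' queries."""
    ranked = sorted(doc_vectors, key=lambda d: cosine(doc_vectors[d], query_vec), reverse=True)
    return ranked[:k]


def graph_hops(entity, hops=2):
    """Structured lookup: answers multi-hop questions like
    'who runs the company Acme bought?' that similarity alone misses."""
    frontier, facts = [entity], []
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for rel, obj in kg_edges.get(node, []):
                facts.append((node, rel, obj))
                next_frontier.append(obj)
        frontier = next_frontier
    return facts


# Hybrid: run both, hand the union to the generator as context.
context = {"docs": vector_search([0.8, 0.2]), "facts": graph_hops("acme")}
print(context)
```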
How to build a fast inner loop for LLM applications using record-replay patterns, deterministic fixtures, and a layered test strategy — without burning your API budget on every code change.
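The record-replay core can be surprisingly small. A minimal sketch, assuming responses are keyed by a hash of the prompt and stored in a JSON "cassette"; `call_model` is a placeholder for whatever SDK call your app actually makes.

```python
import hashlib
import json
from pathlib import Path

CASSETTE = Path("fixtures/llm_cassette.json")


def call_model(prompt: str) -> str:
    """Placeholder for your real SDK call; only reached on cache misses,
    so a committed cassette makes CI fail loudly instead of spending budget."""
    raise RuntimeError("live API call attempted during replay-only test run")


def recorded_completion(prompt: str, live=call_model) -> str:
    """Record-replay wrapper: the first run hits the API and saves the
    response; every later run replays it, fast and deterministic."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    cassette = json.loads(CASSETTE.read_text()) if CASSETTE.exists() else {}
    if key not in cassette:
        cassette[key] = live(prompt)  # record: costs one API call
        CASSETTE.parent.mkdir(parents=True, exist_ok=True)
        CASSETTE.write_text(json.dumps(cassette, indent=2))
    return cassette[key]  # replay: free and deterministic
```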
Most teams default to chaining LLM calls without measuring whether the chain beats a single large-context call. Here's what the empirical evidence actually says about when to chain and when to go monolithic.
When a model gets deprecated, the hard part isn't updating the API call — it's discovering all the invisible behavioral contracts your system assumed. Here's how to audit them before the clock runs out.
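A minimal sketch of such an audit: pin each implicit assumption as an executable check, confirm the suite passes against the outgoing model, then run it against the replacement. The specific contracts below are examples, not a canonical list.

```python
import json

# Each entry pins one implicit assumption your downstream code makes.
CONTRACTS = [
    ("returns parseable JSON", lambda out: isinstance(json.loads(out), dict)),
    ("keys are lowercase", lambda out: all(k == k.lower() for k in json.loads(out))),
    ("stays under the UI length budget", lambda out: len(out) <= 2000),
]


def audit(generate, prompts):
    """Report which behavioral contracts each prompt's output violates."""
    failures = []
    for prompt in prompts:
        out = generate(prompt)
        for name, check in CONTRACTS:
            try:
                ok = check(out)
            except Exception:
                ok = False  # a crashing check is a broken contract too
            if not ok:
                failures.append((prompt, name))
    return failures


# Usage: audit(old_model, prompts) should be empty if the contracts are real;
# audit(new_model, prompts) shows exactly what the migration will break.
```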