Blog

Page 127

12 articles

Why SQL Agents Fail in Production: Grounding LLMs Against Live Relational Databases
SQL agents aren't document RAG with a database backend. They require exact schema mapping, runtime validation, and strict permission boundaries—and skipping any of these is how you corrupt production data or scan a terabyte table.
insider, ai-engineering
Apr 15 · 11 min
Stateful Conversations at Database Scale: The Session Store Architecture Every Production Chat Feature Needs
In-memory conversation history works fine in demos but fails at scale. A breakdown of the tiered storage patterns, compaction strategies, and data model decisions that keep chat sessions reliable in production.
ai-engineering, chat
Apr 15 · 10 min
TTFT Is the Only Latency Metric Your Users Actually Feel
Your infrastructure team optimizes end-to-end generation time. Your users judge responsiveness by when the first token appears. A guide to TTFT — what drives it, how to measure it, and how to design around it.
llm, performance
Apr 15 · 9 min
Sycophancy Is a Production Reliability Failure, Not a Personality Quirk
RLHF-trained models systematically reverse correct answers when users push back — not because they're confused, but because agreement was rewarded. Here's what that means for production systems and how to defend against it.
insider, llm
Apr 15 · 10 min
The Delegation Cliff: Why AI Agent Reliability Collapses at 7+ Steps
AI agents look impressive in demos but fail at alarming rates in production. Here's the math behind why reliability collapses as task length grows—and what you can actually do about it.
insider, ai
Apr 15 · 8 min
Token Budget as a Product Constraint: Designing Around Context Limits Instead of Pretending They Don't Exist
Most AI products handle context limits with a hard crash. Here's how to design around them — progressive truncation, graceful degradation, and surfacing context pressure as a first-class UI signal.
insider, llm
Apr 15 · 10 min
Tool Docstring Archaeology: The Description Field Is Your Highest-Leverage Prompt
Tool definitions look like API documentation but function as natural-language prompts. Treat the description field as a production prompt asset — and add the lint rules that catch silent regressions.
insider, tool-use
Apr 15 · 11 min
The Warm Handoff Pattern: Designing Fluid Control Transfer Between Agents and Humans
Most agent escalation flows are cold transfers that abandon all prior context at the boundary. The warm handoff pattern treats agent-human control transfer as a state-packaging problem — structured payloads, mixed-initiative control allocation, and resumption protocols that actually work.
agent-ux, human-in-the-loop
Apr 15 · 12 min
When AI Features Create Moats (and When They Don't)
Data network effects are harder to compound in LLM products than in traditional ML. Four signals distinguish building a genuine moat from renting capability from Anthropic and adding UI.
ai, product
Apr 15 · 9 min
Write Amplification in Agentic Systems: Why One Tool Call Hits Six Databases
A single agent decision to remember something triggers writes to six storage systems simultaneously. Here's what happens when the fifth write fails — and the patterns from database internals that prevent it.
insider, agent-architecture
Apr 15 · 10 min
The Agent Test Pyramid: Why the 70/20/10 Split Breaks Down for Agentic AI
The classical unit/integration/e2e pyramid assumes cheap, fast, deterministic units. LLM agents break every one of those assumptions. Here's what a testing strategy actually looks like.
insider, ai-agents
Apr 14 · 12 min
Agentic Audit Trails: What Compliance Looks Like When Decisions Are Autonomous
Human decisions create natural accountability records. Agent decisions don't. Here's what decision attribution architecture actually needs to look like for HIPAA, SOX, and SEC Rule 17a-4.
insider, compliance
Apr 14 · 12 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.