Blog

Page 153

12 articles

The Plausible Completion Trap: Why Code Agents Produce Convincingly Wrong Code
Code agents produce code that compiles, lints, and looks right but silently does the wrong thing. Here's why the training objective guarantees this, what the data shows, and how to build verification loops that actually catch it.
ai-engineering code-agents
Apr 11 · 10 min
Prompt Injection Surface Area Mapping: Find Every Attack Vector Before Attackers Do
A practitioner's methodology for enumerating every external data source that reaches your LLM prompt, risk-scoring each injection surface, and applying the right sanitization pattern without breaking model reasoning.
insider security
Apr 11 · 11 min
Property-Based Testing for LLM Systems: Invariants That Hold Even When Outputs Don't
Eval datasets tell you whether your LLM passes a fixed set of examples. Property-based testing tells you whether it obeys a contract across the entire input space. Here's how to apply it to non-deterministic systems.
llm testing
Apr 11 · 12 min
Provider Lock-In Anatomy: The Seven Coupling Points That Make Switching LLM Providers a 6-Month Project
Seven hidden coupling points — from prompt syntax and tool calling schemas to embedding spaces and billing models — explain why switching LLM providers takes months, not days. A practical audit framework for managing lock-in deliberately.
insider llm-ops
Apr 11 · 10 min
Race Conditions in Concurrent Agent Systems: The Bugs That Look Like Hallucinations
Parallel sub-agents silently corrupt shared state in ways that look exactly like model hallucination. Here's how read-modify-write races work in production agent systems, which distributed systems primitives fix them, and the instrumentation that tells a concurrency bug from a genuine model failure.
insider multi-agent
Apr 11 · 13 min
Coalesce Before You Call: The LLM Request Batching Pattern That Cuts Costs Without Slowing Users Down
Request coalescing is a layered architecture—in-flight deduplication, exact caching, and semantic batching—that cuts LLM inference costs 40–60% without degrading user experience. Here's how to implement it and where it breaks down.
llm cost-optimization
Apr 11 · 11 min
Schema-Driven Prompt Design: Letting Your Data Model Drive Your Prompt Structure
The shape of your entity schema directly determines LLM output reliability. Learn how normalization, nesting depth, field ordering, and enum constraints affect hallucination rates — and the refactoring patterns that make prompt-to-output mapping predictable.
insider llm
Apr 11 · 10 min
Simulation Environments for Agent Testing: Building Sandboxes Where Consequences Are Free
Staging environments that 'look like production' mislead more than they inform. Here's how to build simulation environments where agents can take real actions against fake infrastructure — and why the highest-ROI approach is simulating only the tools that can't be undone.
ai-agents testing
Apr 11 · 10 min
SLOs for Non-Deterministic Systems: Defining Reliability When Every Response Is Different
Traditional SLIs like latency and error rate miss the dominant failure mode of AI systems — correct execution, wrong answer. A practical framework for semantic SLOs, error budgets at 85% baselines, and alerting architectures that distinguish real degradation from normal variance.
reliability sre
Apr 11 · 8 min
Speculative Decoding in Practice: The Free Lunch That Isn't Quite Free
How speculative decoding cuts LLM inference latency 2-3x by drafting tokens with a small model and verifying in parallel — plus the draft model selection math, batch size tradeoffs, and production pitfalls that determine whether you get a speedup or a slowdown.
insider inference-optimization
Apr 11 · 10 min
Stateful vs. Stateless AI Features: The Architectural Decision That Shapes Everything Downstream
The choice between stateful and stateless AI features is made early and felt everywhere — in your storage layer, your debugging toolchain, your security posture, and your costs. Here's how to make it deliberately.
insider ai-architecture
Apr 11 · 12 min
Structured Outputs and Constrained Decoding: Eliminating Parsing Failures in Production LLMs
Constrained decoding guarantees schema-valid LLM output at the token level, removing retry logic and parsing heuristics from production pipelines — but research shows a 17% creativity cost that demands a clear decision framework.
insider structured-outputs
Apr 11 · 9 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
