83 posts tagged with "llm"

A Year of Building with LLMs: What the Field Has Actually Learned

· 9 min read
Tian Pan
Software Engineer

Most teams building with LLMs today are repeating mistakes that others made a year ago. The most expensive one is mistaking the model for the product.

After a year of LLM-powered systems shipping into production — codegen tools, document processors, customer-facing assistants, internal knowledge systems — practitioners have accumulated a body of hard-won knowledge that's very different from what the hype cycle suggests. The lessons aren't about which foundation model to choose or whether RAG beats finetuning. They're about the unglamorous work of building reliable systems: how to evaluate output, how to structure workflows, when to invest in infrastructure versus when to keep iterating on prompts, and how to think about differentiation.

This is a synthesis of what that field experience actually shows.

How AI Agents Actually Work: Architecture, Planning, and Failure Modes

· 10 min read
Tian Pan
Software Engineer

Most agent failures are architecture failures. The model gets blamed when a task goes sideways, but nine times out of ten, the real problem is that nobody thought hard enough about how planning, tool use, and reflection should fit together. You can swap in a better model and still get the same crashes — because the scaffolding around the model was never designed to handle what the model was being asked to do.

This post is a practical guide to how agents actually work under the hood: what the core components are, where plans go wrong, how reflection loops help (and when they hurt), and what multi-agent systems look like when you're building them for production rather than demos.

Hard-Won Lessons from Shipping LLM Systems to Production

· 7 min read
Tian Pan
Software Engineer

Most engineers building with LLMs share a common arc: a working demo in two days, production chaos six weeks later. The technology behaves differently under real load, with real users, against real data. The lessons that emerge aren't philosophical—they're operational.

After watching teams across companies ship (and sometimes abandon) LLM-powered products, a handful of patterns appear again and again. These aren't edge cases. They're the default experience.

Building LLM Applications for Production: What Actually Breaks

· 9 min read
Tian Pan
Software Engineer

Most LLM demos work. Most LLM applications in production don't, at least not reliably. The gap between a compelling prototype and something that survives real user traffic is wider than in any other software category I've worked with, and the failures are rarely where you expect them.

This is a guide to the parts that break: cost, consistency, composition, and evaluation. Not theory—the concrete problems that cause teams to quietly shelve projects three months after their first successful demo.

LLM-Powered Autonomous Agents: The Architecture Behind Real Autonomy

· 8 min read
Tian Pan
Software Engineer

Most teams that claim to have "agents in production" don't. Surveys report that around 57% of engineering organizations have deployed AI agents, but when you apply rigorous criteria (the LLM must plan, act, observe feedback, and adapt based on results), only 16% of enterprise deployments and 27% of startup deployments qualify as true agents. The rest are glorified chatbots with tool calls bolted on.

This gap isn't about model capability. It's about architecture. Genuine autonomous agents require three interlocking subsystems working in concert: planning, memory, and tool use. Most implementations get one right, partially implement a second, and ignore the third. The result is a system that works beautifully in demos and fails unpredictably in production.

Seven Patterns for Building LLM Systems That Actually Work in Production

· 10 min read
Tian Pan
Software Engineer

The demo always works. Prompt the model with a curated example, get a clean output, ship the screenshot to the stakeholder deck. Six weeks later, the system is in front of real users, and none of the demo examples appear in production traffic.

This is the gap every LLM product team eventually crosses: the jump from "it works on my inputs" to "it works on inputs I didn't anticipate." The patterns that close that gap aren't about model selection or prompt cleverness — they're about system design. Seven patterns account for most of what separates functional prototypes from reliable production systems.

Common Pitfalls When Building Generative AI Applications

· 10 min read
Tian Pan
Software Engineer

Most generative AI projects fail — not because the models are bad, but because teams make the same predictable mistakes at every layer of the stack. A 2025 industry analysis found that 42% of companies abandoned most of their AI initiatives, and 95% of generative AI pilots yielded no measurable business impact. These aren't model failures. They're engineering and product failures that teams could have avoided.

This post catalogs the pitfalls that kill AI projects most reliably — from problem selection through evaluation — with specific examples from production systems.

The Agent Evaluation Readiness Checklist

· 9 min read
Tian Pan
Software Engineer

Most teams building AI agents make the same mistake: they build the evaluation infrastructure before they understand what failure looks like. They instrument dashboards, choose metrics, and wire up graders, and then discover their evals are measuring the wrong things entirely. Six weeks in, they have a green scorecard and a broken agent.

The fix is not more tooling. It is a specific sequence of steps that grounds your evaluation in reality before you automate anything. Here is that sequence.

The Anatomy of an Agent Harness

· 9 min read
Tian Pan
Software Engineer

Most engineers building AI agents spend 80% of their time thinking about which model to use and 20% thinking about everything else. That ratio should be flipped. The model is almost interchangeable at this point — the harness is what determines whether your agent actually works in production.

The equation is simple: Agent = Model + Harness. If you're not the model, you're the harness. And the harness is where nearly all the real engineering lives.

Routines and Handoffs: The Two Primitives Behind Every Reliable Multi-Agent System

· 8 min read
Tian Pan
Software Engineer

Most multi-agent systems fail not because the models are wrong, but because the plumbing is leaky. Agents drop context mid-task, hand off to the wrong specialist, or loop indefinitely when they don't know how to exit. The underlying cause is almost always the same: the system was designed around what each agent can do, without clearly defining how work moves between them.

Two primitives fix most of this: routines and handoffs. They're deceptively simple, but getting them right is the difference between a demo that works and a system you can ship.

Building Effective AI Agents: Patterns That Actually Work in Production

· 9 min read
Tian Pan
Software Engineer

Most AI agent projects fail not because the models aren't capable enough, but because the engineers building them reach for complexity before they've earned it. After studying dozens of production deployments, a clear pattern emerges: the teams shipping reliable agents start with the simplest possible system and add complexity only when metrics demand it.

This is a guide to the mental models, patterns, and practical techniques that separate robust agentic systems from ones that hallucinate, loop, and fall apart under real workloads.