
27 posts tagged with "evaluation"


Hard-Won Lessons from Shipping LLM Systems to Production

Tian Pan · Software Engineer · 7 min read

Most engineers building with LLMs share a common arc: a working demo in two days, production chaos six weeks later. The technology behaves differently under real load, with real users, against real data. The lessons that emerge aren't philosophical—they're operational.

Having watched teams across companies ship (and sometimes abandon) LLM-powered products, I see the same handful of patterns again and again. These aren't edge cases. They're the default experience.

Building LLM Applications for Production: What Actually Breaks

Tian Pan · Software Engineer · 9 min read

Most LLM demos work. Most LLM applications in production don't—at least not reliably. The gap between a compelling prototype and something that survives real user traffic is wider than in any other software category I've worked with, and the failures are rarely where you expect them.

This is a guide to the parts that break: cost, consistency, composition, and evaluation. Not theory—the concrete problems that cause teams to quietly shelve projects three months after their first successful demo.

Common Pitfalls When Building Generative AI Applications

Tian Pan · Software Engineer · 10 min read

Most generative AI projects fail, not because the models are bad, but because teams make the same predictable mistakes at every layer of the stack. Industry analyses from 2025 found that 42% of companies abandoned most of their AI initiatives and that 95% of generative AI pilots yielded no measurable business impact. These aren't model failures. They're engineering and product failures that teams could have avoided.

This post catalogs the pitfalls that kill AI projects most reliably — from problem selection through evaluation — with specific examples from production systems.