
861 posts tagged with "insider"


When LLMs Beat Rule-Based Systems for Data Normalization (And When They Don't)

· 11 min read
Tian Pan
Software Engineer

A team I know spent three months building a rule-based address normalizer. It handled the top twenty formats, used a USPS API for verification, and worked great on the data they'd seen. Then they got a new enterprise customer. The first week of data had addresses embedded in freeform notes fields, postal codes missing country prefixes, and cross-border formats their rules had never seen. The normalizer failed silently on 31% of records. They threw an LLM at it as a quick fix, expecting 80% accuracy. They got 94%. The surprise wasn't that the LLM worked — it was that nothing in their evaluation framework had predicted this.


This is the shape of the problem. Rule-based normalization is predictable, fast, and cheap. It works well when the data distribution stays in-bounds. LLMs handle the long tail — the weird formats, the implicit domain knowledge, the edge cases that rules never enumerate. But LLMs are also expensive, slow, and inconsistent in ways that break production pipelines if you're not careful. The right answer, for almost every team, is a hybrid that uses each approach on the inputs it's actually good at.
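A minimal sketch of that hybrid, under stated assumptions: the regex rule, the field names, and the `llm_normalize` callable are all hypothetical stand-ins (the post does not show an implementation). The idea is simply to try the cheap deterministic path first and pay for a model call only on a miss.

```python
import re
from typing import Callable, Optional

# Illustrative rule only: a US-style "street, city, ST 12345" address.
US_ADDRESS = re.compile(
    r"^\s*(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5})\s*$"
)

def rule_normalize(raw: str) -> Optional[dict]:
    """Return a structured address if the rule matches, else None."""
    match = US_ADDRESS.match(raw)
    return match.groupdict() if match else None

def normalize(raw: str, llm_normalize: Callable[[str], dict]) -> dict:
    """Use the rule when it matches; fall back to the LLM only on a miss."""
    parsed = rule_normalize(raw)
    if parsed is not None:
        return {"source": "rules", **parsed}
    # Hypothetical fallback: the caller supplies whatever model client it uses.
    return {"source": "llm", **llm_normalize(raw)}
```

Tagging each record with its `source` makes it possible to track how much traffic falls through to the model, which is the number that drives both cost and the decision to write another rule.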

Why LLMs Make Confident Mistakes When Analyzing Your Product Data

· 11 min read
Tian Pan
Software Engineer

Product teams have started routing analytical questions directly to LLMs: "What's causing the churn spike?" "Why did conversion drop after the redesign?" "Which cohort should we focus retention spend on?" The outputs land in executive decks, drive roadmap decisions, and get presented to investors. The models answer confidently, in polished prose, with specific numbers. And a significant fraction of those answers are wrong in ways that don't announce themselves.

This isn't a general criticism of LLMs for data work. There are tasks where they genuinely help. The problem is that the failure modes are invisible — the model doesn't hedge, doesn't caveat, and doesn't distinguish between "I computed this from your data" and "I generated something that sounds like what this number should be." Practitioners who understand where the breakdowns happen can capture the genuine value and route around the landmines.

The Hidden Switching Costs of LLM Vendor Lock-In

· 11 min read
Tian Pan
Software Engineer

Most engineering teams believe they've insulated themselves from LLM vendor lock-in. They use LiteLLM to unify API calls. They avoid fine-tuning on hosted platforms. They keep raw data in their own storage. They feel safe. Then a provider announces a deprecation — or a competitor's pricing drops 40% — and the team discovers that the abstraction layer they built handles roughly 20% of the actual switching cost.

The other 80% is buried in places no one looked: system prompts written around a model's formatting quirks, eval suites calibrated to one model's refusal thresholds, embedding indexes that become incompatible the moment you change models, and user expectations shaped by behavioral patterns that simply don't transfer.

The Minimal Footprint Principle: Least Privilege for Autonomous AI Agents

· 10 min read
Tian Pan
Software Engineer

A retail procurement agent inherited vendor API credentials "during initial testing." Nobody ever restricted them before the system went to production. When a bug caused an off-by-one error, the agent had full ordering authority — permanently, with no guardrails. By the time finance noticed, $47,000 in unauthorized vendor orders had gone out. The code was fine. The model performed as designed. The blast radius was a permissions problem.

This is the minimal footprint principle: agents should request only the permissions the current task requires, avoid persisting sensitive data beyond task scope, clean up temporary resources, and scope tool access to present intent. It is the Unix least-privilege principle adapted for a world where your code makes runtime decisions about what it needs to do next.

The reason teams get this wrong is not negligence. It is a category error: they treat agent permissions as a design-time exercise when agentic AI makes them a runtime problem.
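One way to picture permissions as a runtime check rather than a design-time grant is a per-task scope that every tool call must pass through. This is a sketch under assumptions: `TaskScope`, `authorize`, the tool names, and the dollar limit are illustrative and not taken from the original post.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskScope:
    """Permissions granted for a single task (hypothetical shape)."""
    allowed_tools: frozenset
    max_order_usd: float

class ScopeError(PermissionError):
    """Raised when a tool call falls outside the current task's scope."""

def authorize(scope: TaskScope, tool: str, order_usd: float = 0.0) -> None:
    """Raise ScopeError unless the call fits inside the current task's scope."""
    if tool not in scope.allowed_tools:
        raise ScopeError(f"tool {tool!r} is not in scope for this task")
    if order_usd > scope.max_order_usd:
        raise ScopeError(f"order of ${order_usd:,.2f} exceeds the task limit")
```

With a scope such as `TaskScope(frozenset({"place_order"}), 500.0)`, a buggy loop can still misbehave, but each call is capped by what the task was granted instead of by whatever credentials the agent inherited.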

Multi-Region LLM Serving: The Cache Locality Problem Nobody Warns You About

· 10 min read
Tian Pan
Software Engineer

When you run a stateless HTTP API across multiple regions, the routing problem is essentially solved. Put a global load balancer in front, distribute requests by geography, and the worst thing that happens is a slightly stale cache entry. Any replica can serve any request with identical results.

LLM inference breaks every one of these assumptions. The moment you add prompt caching — which you will, because the cost difference between a cache hit and a cache miss is roughly 10x — your service becomes stateful in ways that most infrastructure teams don't anticipate until they're staring at degraded latency numbers in their second region.
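To make the statefulness concrete, here is a minimal sketch of cache-affine routing, assuming requests that share a prompt prefix should land on the same replica so its prompt cache stays warm. The function name, the 256-character prefix window, and the region names are assumptions for illustration; the post does not prescribe this technique.

```python
import hashlib

def pick_replica(prompt: str, replicas: list[str], prefix_chars: int = 256) -> str:
    """Route by a hash of the prompt prefix so repeated prefixes hit the same replica."""
    prefix = prompt[:prefix_chars]

    # Rendezvous (highest-random-weight) hashing: the winner for a given prefix
    # does not change when some other replica is added or removed.
    def weight(replica: str) -> int:
        digest = hashlib.sha256(f"{replica}|{prefix}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return max(replicas, key=weight)
```

Note the tension this exposes: pure hash routing ignores geography, so a real router would have to weigh cache affinity against network distance rather than pick by hash alone.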

The Multi-Tenant LLM Problem: Noisy Neighbors, Isolation, and Fairness at Scale

· 12 min read
Tian Pan
Software Engineer

Your SaaS product launches with ten design customers. Everything works beautifully. Then you onboard a hundred tenants, and one of them — a power user running 200K-token context windows on a complex research workflow — causes every other customer's latency to spike. Support tickets start arriving. You look at your dashboards and see nothing obviously wrong: your model is healthy, your API returns 200s, and your p50 latency looks fine. Your p95 has silently tripled.

This is the noisy neighbor problem, and it hits LLM infrastructure harder than almost any other shared system. Here's why it's harder to solve than it is in databases — and the patterns that actually work.

The Multi-Turn Session State Collapse Problem

· 10 min read
Tian Pan
Software Engineer

Your per-request error rates look clean. Latency is within SLO. The LLM judge is scoring outputs at 87%. And then a user files a support ticket: "I told the bot my account number three times. It just asked me again." A different user: "It agreed to a refund, then two turns later denied the policy existed."

Single-turn failures are visible. The request comes in, the model hallucinates or refuses, your eval catches it, you fix the prompt. The feedback loop is tight. Multi-turn failures work differently: the session starts fine, degrades gradually turn by turn, and your monitoring never fires because each individual response is technically coherent. The problem is the session as a whole — and almost no team instruments for that.

Research across major frontier models (Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro) shows an average 39% performance drop when moving from single-turn to multi-turn conversations. That number hides the real story: only about 16 points of the drop are capability loss. The other 23 points are a reliability crisis: the gap between a model's best and worst performance on the same task doubles as conversation length grows. You're not just getting worse outputs; you're getting inconsistent ones.

The On-Call Runbook for AI Systems That Nobody Writes

· 10 min read
Tian Pan
Software Engineer

Your p99 latency just spiked to 12 seconds. The alert fired at 3:14am. You open the runbook and find instructions for: checking the database connection pool, verifying the load balancer, restarting the service. You do all three. Latency stays elevated. The service is not down — it is up and responding. But something is wrong. It turns out the model started generating responses three times longer than usual because a recent prompt change accidentally unlocked verbose behavior. The runbook had no page for that.

This is the new category of on-call incident that engineering teams are not prepared for: the system is operational but the model is misbehaving. Traditional SRE runbooks assume binary failure states. AI systems fail probabilistically, and the symptoms do not look like an outage — they look like drift.

Onboarding Engineers into AI-Generated Codebases Without Breaking How They Learn

· 9 min read
Tian Pan
Software Engineer

The new hire ships a feature on day three. Everyone on the team is impressed. Three weeks later, she introduces a bug that a senior engineer explains in six words: "We don't do it that way." She had no idea. Neither did the AI that wrote her code.

AI coding assistants have collapsed the time-to-first-commit for new engineers. But that speed hides a trade-off that most teams aren't tracking: the code-reading that used to slow down junior engineers was also the code-reading that taught them how the system actually works. Strip that away, and you get engineers who can ship features they don't understand into architectures they haven't internalized.

The problem isn't the tools. It's that we haven't updated onboarding to account for what AI now does — and what it no longer requires engineers to do themselves.

The Pilot Graveyard: Why Enterprise AI Rollouts Fail After the Demo

· 10 min read
Tian Pan
Software Engineer

Your AI demo was genuinely impressive. The executive audience nodded, the VP of Engineering said "this is the future," and the pilot was approved with real budget. Six months later, weekly active users have plateaued at 12%. The tool gets a polite mention in all-hands. Nobody has the heart to call it dead. This is the pilot graveyard — where good demos go to die.

It's not a rare failure. Roughly 88% of enterprise AI pilots never reach production. Only 6% of enterprises have successfully moved generative AI projects beyond pilot to production at any meaningful scale. The gap between "impressive in the conference room" and "load-bearing in the daily workflow" is where most enterprise AI investment disappears.

The reason isn't the model. It's everything that happens after the demo.

Pricing AI Features: The Unit Economics Framework Engineering Teams Always Skip

· 11 min read
Tian Pan
Software Engineer

Cursor hit $1 billion in revenue in 2025 and lost $150 million doing it. Every dollar customers paid went straight to LLM API providers, with nothing left for engineering, support, or infrastructure overhead. This wasn't a scaling problem. It was a unit economics problem that was invisible until it was catastrophic.

Most engineering teams building AI features make the same mistake: they treat inference cost as a minor line item, ship a flat-rate subscription, and assume the economics will work out later. They don't. Variable inference costs don't behave like any other COGS in software, and the pricing architectures that work for traditional SaaS will bleed you dry the moment your heaviest users find your most expensive feature.
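The arithmetic behind that claim can be sketched in a few lines. Everything numeric here is an assumption chosen for illustration (a $20 flat plan and per-million-token rates of $3 input and $15 output), not real provider pricing or figures from the post.

```python
def monthly_margin(price_usd: float, tokens_in: int, tokens_out: int,
                   usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Gross margin per user per month: flat price minus variable inference cost."""
    cost = (tokens_in / 1_000_000) * usd_per_m_in + (tokens_out / 1_000_000) * usd_per_m_out
    return price_usd - cost

# Illustrative users on the same hypothetical $20 flat plan.
light = monthly_margin(20.0, 2_000_000, 500_000, 3.0, 15.0)    # 6.5
heavy = monthly_margin(20.0, 40_000_000, 5_000_000, 3.0, 15.0)  # -175.0
```

Under these assumed rates, one heavy user erases the margin from roughly 27 light users, which is why a flat price with unbounded usage behaves differently from ordinary SaaS COGS.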

Prompt Canaries: The Deployment Primitive Your AI Team Is Missing

· 10 min read
Tian Pan
Software Engineer

In April 2025, a system prompt change shipped to one of the world's most-used AI products. Error rates stayed flat. Latency was fine. The deployment dashboards showed green. Within three days, millions of users had noticed something deeply wrong: the model had become relentlessly flattering, agreeing with bad ideas, validating poor reasoning, manufacturing enthusiasm for anything a user said. The rollback announcement came after the incident had already spread across social media, with users posting screenshots as evidence. For a period, Twitter was the production alerting system.

This is what happens when you treat prompt and model changes like config updates rather than behavioral deployments. Teams that have spent years building canary infrastructure for code continue to push AI changes out as a single atomic flip—instantly global, instantly irreversible, with no graduated rollout and no automated rollback signal except user complaints.

Canary deployments for LLM behavior are not a nice-to-have. They are the missing infrastructure layer that separates teams who catch regressions internally from teams who discover them via support tickets.
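A minimal sketch of the two pieces such a rollout needs, under stated assumptions: a deterministic way to assign users to the canary, and a gate that trips on a behavioral metric rather than on error rate or latency. The function names, the salt, the score scale, and the 0.05 threshold are all hypothetical.

```python
import hashlib

def in_canary(user_id: str, rollout_percent: float, salt: str = "prompt-v2") -> bool:
    """Deterministically assign a user to the canary from a salted hash of their ID."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000  # bucket in 0..9999
    return bucket < rollout_percent * 100

def should_rollback(baseline_score: float, canary_score: float,
                    max_drop: float = 0.05) -> bool:
    """Trip rollback when the canary's behavioral score falls too far below baseline."""
    return (baseline_score - canary_score) > max_drop
```

Hashing on a salted user ID keeps each user on one side of the rollout across requests, and changing the salt per deployment reshuffles who lands in the canary. The score would come from whatever behavioral eval the team trusts, which is an assumption this sketch leaves open.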