
458 posts tagged with "ai-engineering"


The AI Team Topology Problem: Why Your Org Chart Determines Whether AI Ships

· 8 min read
Tian Pan
Software Engineer

Most AI features die in the gap between "works in notebook" and "works in production." Not because the model is bad, but because the team that built the model and the team that owns the product have never sat in the same room. The AI team topology problem — where AI engineers sit in your org chart — is quietly the biggest predictor of whether your AI investments ship or stall.

The numbers bear this out. Only about half of ML projects make it from prototype to production — at less mature organizations, the failure rate reaches 90%. Meanwhile, CircleCI's 2026 State of Software Delivery report found that AI-assisted code generation boosted feature branch throughput by 59%, yet production branch output actually declined 7% for median teams. Code is being written faster than ever. It's just not shipping.

CLAUDE.md as Codebase API: The Most Leveraged Documentation You'll Ever Write

· 9 min read
Tian Pan
Software Engineer

Most teams treat their CLAUDE.md the way they treat their README: write it once, forget it exists, wonder why nothing works. But a CLAUDE.md isn't documentation. It's an API contract between your codebase and every AI agent that touches it. Get it right, and every AI-assisted commit follows your architecture. Get it wrong — or worse, let it rot — and you're actively making your agent dumber with every session.

The AGENTbench study tested 138 real-world coding tasks across 12 repositories and found that auto-generated context files actually decreased agent success rates compared to having no context file at all. Three months of accumulated instructions, half describing a codebase that had moved on, don't guide an agent. They mislead it.
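To make the contract framing concrete, here is a minimal sketch of what such a file might contain. Every command, path, and rule below is a placeholder for whatever your repository actually enforces; the point is the shape, not the specifics.

```
# CLAUDE.md

## Commands (run these, not your guesses)
- Build: pnpm build
- Test one file: pnpm vitest run <path>

## Architecture invariants
- All database access goes through src/db/repository.ts; never import the driver directly.
- API handlers live in src/routes/ and validate input with the shared schemas.

## Out of bounds
- Never edit generated files under src/gen/.
- Flag any new dependency in the PR description.
```

A useful test: if deleting a line wouldn't change what the agent does, that line is documentation, not contract.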

The Death of the Glue Engineer: AI Is Absorbing the Work That Holds Systems Together

· 11 min read
Tian Pan
Software Engineer

Every engineering organization has them. They don't own a product. They don't ship features users see. But without them, nothing works. They're the engineers who write the ETL pipeline that moves data from the billing system to the analytics warehouse. The ones who build the webhook handler that keeps Salesforce in sync with the internal CRM. The ones who maintain the API adapter layer that lets the mobile app talk to three different backend services that were never designed to talk to each other.

They are the glue engineers, and their work is the first category of software engineering being fully absorbed by AI agents.
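To make "glue work" concrete, here is a hedged sketch of the kind of integration the post describes: a webhook handler that translates one system's schema into another's. The endpoints and field names are invented for illustration.

```python
# A typical piece of "glue": receive a Salesforce webhook, reshape the
# payload, and forward it to an internal CRM. Hypothetical fields and URLs;
# the point is the shape of the work, not the specifics.
from flask import Flask, request, jsonify
import requests

app = Flask(__name__)
INTERNAL_CRM_URL = "https://crm.internal.example/api/contacts"  # placeholder

@app.route("/webhooks/salesforce", methods=["POST"])
def salesforce_contact_changed():
    event = request.get_json(force=True)
    # Translate between two schemas that were never designed to match.
    contact = {
        "external_id": event["Id"],
        "email": event.get("Email"),
        "name": f'{event.get("FirstName", "")} {event.get("LastName", "")}'.strip(),
    }
    resp = requests.post(INTERNAL_CRM_URL, json=contact, timeout=5)
    resp.raise_for_status()
    return jsonify({"status": "synced"}), 200
```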

The AI-Legible Codebase: Why Your Code's Machine Readability Now Matters

· 8 min read
Tian Pan
Software Engineer

Every engineering team has a version of this story: the AI coding agent that produces flawless code in a greenfield project but stumbles through your production codebase like a tourist without a map. The agent isn't broken. Your codebase is illegible — not to humans, but to machines.

For decades, "readability" meant one thing: could a human developer scan this file and understand the intent? We optimized for that reader with conventions around naming, file size, documentation, and abstraction depth. But the fastest-growing consumer of your codebase is no longer a junior engineer onboarding in their first week. It's an LLM-powered agent that reads, reasons about, and modifies your code thousands of times a day.

Codebase structure is the single largest lever on AI-assisted development velocity — bigger than model choice, bigger than prompt engineering, bigger than which IDE plugin you use. Teams with well-structured codebases report 60–70% fewer iteration cycles when working with AI assistants. The question is no longer whether to optimize for machine readability, but how.
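To ground what machine legibility means at the function level, here is one small before-and-after sketch; the functions and domain are hypothetical, not drawn from the post.

```python
# Illegible to an agent: names, types, and intent all have to be inferred.
def proc(d, f=1.2):
    # What is d? What units? Why 1.2? The agent has to guess.
    return [x * f for x in d if x]

# Legible: intent, units, and edge cases are stated where the agent reads.
def apply_price_markup(prices_usd: list[float], markup: float = 1.2) -> list[float]:
    """Apply a multiplicative markup, skipping zeros.

    Zero means "price not yet set" upstream, so those entries are
    filtered rather than marked up.
    """
    return [p * markup for p in prices_usd if p != 0]
```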

The Agentic Deadlock: When AI Agents Wait for Each Other Forever

· 9 min read
Tian Pan
Software Engineer

Here is an uncomfortable fact about multi-agent AI systems: when you let two or more LLM-powered agents share resources and make decisions concurrently, they deadlock at rates between 25% and 95%. Not occasionally. Not under edge-case load. Under normal operating conditions with standard prompting, the moment agents must coordinate simultaneously, the system seizes up.

This is not a theoretical concern. Coordination breakdowns account for roughly 37% of multi-agent system failures in production, and systems without formal orchestration experience failure rates between 41% and 87%. The classic distributed systems failure modes — deadlock, livelock, priority inversion — are back, and they are wearing new clothes.
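Here is a minimal sketch of the wait cycle in ordinary Python threads, with the "agents" reduced to lock acquisition order; the resource names are invented.

```python
# Two "agents" grab shared resources in opposite order and wait on each
# other. The agent framing is illustrative; the mechanics are classic
# lock-ordering deadlock.
import threading
import time

search_index = threading.Lock()  # shared resource A
writer_queue = threading.Lock()  # shared resource B

def agent(first, second, name):
    with first:
        time.sleep(0.1)  # "thinking" long enough for the other agent to interleave
        if second.acquire(timeout=2):  # bounded wait instead of blocking forever
            second.release()
            print(f"{name}: finished")
        else:
            print(f"{name}: deadlocked, backing off")

# Opposite acquisition order creates the wait cycle.
t1 = threading.Thread(target=agent, args=(search_index, writer_queue, "agent-1"))
t2 = threading.Thread(target=agent, args=(writer_queue, search_index, "agent-2"))
t1.start(); t2.start(); t1.join(); t2.join()
# The standard fix applies here too: impose a global acquisition order
# (every agent takes A before B), or route requests through one orchestrator.
```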

AI Feature Billing Is an Engineering Problem Nobody Planned For

· 9 min read
Tian Pan
Software Engineer

Microsoft's Copilot launched with a clean story: $30/user/month, productivity multiplied. The actual math was uglier. Once you factored in the base enterprise license, compute costs per active user, and support overhead, Microsoft was losing over $20 per user per month on the feature. Finance didn't catch this immediately because the costs lived in the infrastructure budget, not the product P&L. Engineering knew the token bills were large. Nobody had connected the two lines.

This is the billing problem that most AI teams build into their products without realizing it. Not the pricing strategy problem — that's a product decision. The engineering problem: you have no infrastructure to measure what AI features actually cost per customer, per feature, and per request at the granularity required to make any pricing model work.
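As a sketch of the missing infrastructure, here is one way per-request cost metering might look. The token prices, record schema, and feature names are assumptions, not a real billing API.

```python
# Tag every model call with customer and feature, price the tokens, and
# emit a record that finance and engineering can both query.
from dataclasses import dataclass, asdict
import time

PRICE_PER_1K = {"input": 0.003, "output": 0.015}  # hypothetical $/1K tokens

@dataclass
class CostRecord:
    customer_id: str
    feature: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    ts: float

def record_cost(customer_id: str, feature: str,
                input_tokens: int, output_tokens: int) -> CostRecord:
    cost = (input_tokens * PRICE_PER_1K["input"]
            + output_tokens * PRICE_PER_1K["output"]) / 1000
    rec = CostRecord(customer_id, feature, input_tokens, output_tokens,
                     round(cost, 6), time.time())
    # In practice: emit to your metrics pipeline. Printing stands in here.
    print(asdict(rec))
    return rec

record_cost("acct_42", "email-summarize", input_tokens=1800, output_tokens=420)
```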

AI Product Metrics Nobody Uses: Beyond Accuracy to User Value Signals

· 9 min read
Tian Pan
Software Engineer

A contact center AI system achieved 90%+ accuracy on its validation benchmark. Supervisors still instructed agents to type notes manually. The product was killed 18 months later for "low adoption." This pattern plays out repeatedly across enterprise AI deployments — technically excellent systems that nobody uses, measured by metrics that couldn't see the failure coming.

The problem is a systematic mismatch between what teams measure and what predicts product success. Engineering organizations inherit their measurement instincts from classical ML: accuracy, precision/recall, BLEU scores, latency percentiles, eval pass rates. These describe model behavior in isolation. They tell you almost nothing about whether your AI is actually useful.
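As a sketch of what usage-derived signals look like next to model metrics, here is a toy event log and two of the measures the post has in mind; the event schema is invented.

```python
# Value signals computed from what users actually do with the AI output.
from collections import defaultdict

events = [  # {"user": ..., "action": "ai_suggested" | "accepted" | "overridden"}
    {"user": "u1", "action": "ai_suggested"}, {"user": "u1", "action": "accepted"},
    {"user": "u2", "action": "ai_suggested"}, {"user": "u2", "action": "overridden"},
    {"user": "u3", "action": "ai_suggested"}, {"user": "u3", "action": "overridden"},
]

counts = defaultdict(int)
for e in events:
    counts[e["action"]] += 1

suggested = counts["ai_suggested"]
# A high override rate alongside high benchmark accuracy is exactly the
# "technically excellent, nobody uses it" failure described above.
print("acceptance rate:", counts["accepted"] / suggested)   # 0.33
print("override rate:  ", counts["overridden"] / suggested) # 0.67
```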

AI Technical Debt: Four Categories That Never Show Up in Your Sprint Retro

· 11 min read
Tian Pan
Software Engineer

Your sprint retro covers the usual suspects: flaky tests, that migration someone keeps punting, the API endpoint held together with duct tape. But if you're shipping AI features, the most expensive debt in your codebase is the kind nobody puts on a sticky note.

Traditional technical debt accumulates linearly. You cut a corner, you pay interest on it later, you refactor when the pain gets bad enough. AI technical debt compounds. A prompt that degrades silently produces training signals that pollute your evals, which misguide your next round of prompt changes, which further erodes the quality your users experience. By the time someone notices, three layers of assumptions have rotted underneath you.

Building Multilingual AI Products: The Quality Cliff Nobody Measures

· 11 min read
Tian Pan
Software Engineer

Your AI product scores 82% on your eval suite. You ship to 40 countries. Three months later, French and German users report quality similar to English. Hindi and Arabic users quietly stop using the feature. Your aggregate satisfaction score barely budges — because English-speaking users dominate the metric pool. The cliff was always there. You just weren't measuring it.

This is the default story for most teams shipping multilingual AI products. The quality gap isn't subtle. A state-of-the-art model like QwQ-32B drops from 70.7% on English reasoning benchmarks to 32.8% on Swahili, a 54% relative performance collapse, and that was the best available model tested in 2025. The gap doesn't disappear as models get larger: it shrinks for high-resource languages and stays wide for everyone else.
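Here is a worked toy example of how the aggregate hides the cliff; all numbers are illustrative.

```python
# Per-language scores hidden by a usage-weighted average.
scores = {"en": 0.82, "fr": 0.78, "de": 0.77, "hi": 0.41, "ar": 0.38}
traffic = {"en": 0.70, "fr": 0.10, "de": 0.08, "hi": 0.07, "ar": 0.05}

aggregate = sum(scores[lang] * traffic[lang] for lang in scores)
print(f"aggregate satisfaction: {aggregate:.2f}")   # ~0.76: looks healthy
print("per-language floor:", min(scores.values()))  # 0.38: the cliff
# The fix: report per-language minimums alongside the mean, so
# low-resource collapse can't hide inside the average.
```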

Capability Elicitation: Getting Models to Use What They Already Know

· 8 min read
Tian Pan
Software Engineer

Most teams debugging a bad LLM output reach for the same fix: rewrite the prompt. Add more instructions. Clarify the format. Maybe throw in a few examples. This is prompt engineering in its most familiar form — making instructions clearer so the model understands what you want.

But there's a different failure mode that better instructions can't fix. Sometimes the model has the knowledge and can perform the reasoning, but your prompt doesn't activate it. The model isn't confused about your instructions — it's failing to retrieve and apply capabilities it demonstrably possesses.

This is the domain of capability elicitation. Understanding the difference between "the model can't do this" and "my prompt doesn't trigger it" will change how you debug every AI system you build.
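Here is a hedged sketch of that debugging move: hold the task fixed and vary only the elicitation. The `call_model` stub and the grading rule are stand-ins for your real model client, not an actual API.

```python
# If the baseline fails but a variant passes, the capability exists and
# your prompt wasn't triggering it. If everything fails, the model may
# genuinely lack the capability, and no rewording will fix that.
ELICITATIONS = {
    "baseline":     "{task}",
    "step_by_step": "{task}\n\nThink through this step by step before answering.",
    "few_shot":     "Example: Q: 17 * 24? A: 408\n\n{task}",
}

def call_model(prompt: str) -> str:
    # Stand-in for a real client; this fake "model" only reasons when nudged.
    return "408" if "step by step" in prompt or "Example:" in prompt else "I think 410"

def elicitation_sweep(task: str, grade) -> dict[str, bool]:
    """Run one task under each elicitation; `grade` maps output to pass/fail."""
    return {name: grade(call_model(tmpl.format(task=task)))
            for name, tmpl in ELICITATIONS.items()}

print(elicitation_sweep("Q: 17 * 24? A:", grade=lambda out: "408" in out))
# {'baseline': False, 'step_by_step': True, 'few_shot': True}
```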

Capability Elicitation vs. Prompt Engineering: Your Model Already Knows the Answer

· 9 min read
Tian Pan
Software Engineer

Most prompt engineering advice focuses on the wrong problem. Teams spend weeks refining instruction clarity — adding examples, adjusting tone, restructuring formats — when the actual bottleneck is that the model fails to activate knowledge it demonstrably possesses. The distinction matters: prompt engineering tells a model what to do, while capability elicitation gets a model to use what it already knows.

This isn't a semantic quibble. The UK's AI Safety Institute found that proper elicitation techniques can improve model performance by an amount equivalent to increasing training compute by five to twenty times. That's not a marginal gain from better wording. That's an entire capability tier sitting dormant inside models you're already paying for.

The Centralized AI Platform Trap: Why Shared ML Teams Kill Product Velocity

· 8 min read
Tian Pan
Software Engineer

Most engineering organizations discover the problem the same way: AI demos go well, leadership pushes for broader adoption, and someone decides the right answer is a dedicated team to own "AI infrastructure." The team gets headcount, a roadmap, and a mandate to accelerate AI across the organization.

Eighteen months later, product teams are filing tickets to get their prompts deployed. The platform team is overwhelmed. Features that took days to demo are taking quarters to ship. And the team originally created to speed up AI adoption has become its primary bottleneck.

This is the centralized AI platform trap — and it's surprisingly easy to fall into.