
678 posts tagged with "ai-engineering"


The Three Clocks Problem: Why Your AI System Is Living in Three Different Timelines

· 9 min read
Tian Pan
Software Engineer

Your AI system is confidently answering questions about a world that no longer exists. Not because the model is broken, not because retrieval failed, but because three independent clocks are ticking at different rates inside every production AI application — and nobody synchronized them.

This is the three clocks problem: wall clock, model clock, and data clock each operate on their own timeline. When they diverge, you get a system that's technically functioning but substantively wrong in ways that no error log will ever catch.
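To see the divergence concretely, here is a minimal sketch, assuming hypothetical timestamps for the three clocks (the names and dates are illustrative, not from the article):

```python
from datetime import datetime, timezone

# Illustrative only: the three timestamps every production AI system
# carries, whether or not anyone tracks them explicitly.
wall_clock = datetime.now(timezone.utc)                  # "now" as users experience it
model_clock = datetime(2024, 4, 1, tzinfo=timezone.utc)  # the model's training cutoff
data_clock = datetime(2025, 6, 15, tzinfo=timezone.utc)  # last refresh of the retrieval index

def lag_days(reference: datetime, clock: datetime) -> float:
    """How far a clock trails the reference, in days."""
    return (reference - clock).total_seconds() / 86400

for name, clock in [("model", model_clock), ("data", data_clock)]:
    print(f"{name} clock trails the wall clock by {lag_days(wall_clock, clock):.0f} days")
```

None of this shows up in an error log, which is the point: both lags are silent until someone compares the clocks.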

The AI Delegation Paradox: You Can't Evaluate Work You Can't Do Yourself

· 9 min read
Tian Pan
Software Engineer

Every engineer who has delegated a module to a contractor knows the feeling: the code comes back, the tests pass, the demo works — and you have no idea whether it's actually good. You didn't write it, you don't fully understand the decisions embedded in it, and the review you're about to do is more performance than practice. Now multiply that dynamic by every AI-assisted commit in your codebase.

The AI delegation paradox is simple to state and hard to escape: the skill you need most to evaluate AI-generated work is the same skill that atrophies fastest when you stop doing the work yourself. This isn't a future risk. It's happening now, measurably, across engineering organizations that have embraced AI coding tools.

AI Feature Decay: The Slow Rot That Metrics Don't Catch

· 9 min read
Tian Pan
Software Engineer

Your AI feature launched to applause. Three months later, users are quietly routing around it. Your dashboards still show green — latency is fine, error rates are flat, uptime is perfect. But satisfaction scores are sliding, support tickets mention "the AI is being weird," and the feature that once handled 70% of inquiries now barely manages 50%.

This is AI feature decay: the gradual degradation of an AI-powered feature not from model changes or code bugs, but from the world shifting underneath it. Unlike traditional software that fails with stack traces, AI features degrade silently. The system runs, the model responds, and the output is delivered — it's just no longer what users need.
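One way to catch the rot is to put a user-value signal next to the ops dashboard. A minimal sketch, assuming the inquiry-deflection metric from the example above (class name and thresholds are hypothetical):

```python
from collections import deque

# Hypothetical monitor: track the share of inquiries the AI resolves
# without human handoff, rather than only latency and error rate.
class DeflectionMonitor:
    def __init__(self, window: int = 1000, baseline: float = 0.70, tolerance: float = 0.10):
        self.outcomes = deque(maxlen=window)  # rolling window of recent inquiries
        self.baseline = baseline              # deflection rate at launch
        self.tolerance = tolerance            # acceptable drop before alerting

    def record(self, resolved_by_ai: bool) -> None:
        self.outcomes.append(resolved_by_ai)

    def check(self) -> str | None:
        if len(self.outcomes) < self.outcomes.maxlen:
            return None  # not enough traffic yet
        rate = sum(self.outcomes) / len(self.outcomes)
        if rate < self.baseline - self.tolerance:
            return f"deflection at {rate:.0%} vs {self.baseline:.0%} baseline: feature is decaying"
        return None
```

A green latency dashboard and a firing DeflectionMonitor describe the same feature; only one of them notices the decay.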

The AI Skills Inversion: When Junior Engineers Outperform Seniors on the Wrong Metrics

· 8 min read
Tian Pan
Software Engineer

A junior engineer on your team just shipped three features in a week. Your senior engineer shipped half of one. The dashboards say the junior is 6x more productive. The dashboards are lying.

This is the AI skills inversion — a measurement illusion where AI coding assistants make junior engineers look dramatically more productive on surface metrics while masking a deeper problem. The features ship faster, but the architecture degrades. The PRs multiply, but system coherence erodes. And organizations that trust their dashboards over their judgment are promoting the wrong behaviors and losing the wrong people.

The AI Team Topology Problem: Why Your Org Chart Determines Whether AI Ships

· 8 min read
Tian Pan
Software Engineer

Most AI features die in the gap between "works in notebook" and "works in production." Not because the model is bad, but because the team that built the model and the team that owns the product have never sat in the same room. The AI team topology problem — where AI engineers sit in your org chart — is quietly the biggest predictor of whether your AI investments ship or stall.

The numbers bear this out. Only about half of ML projects make it from prototype to production — at less mature organizations, the failure rate reaches 90%. Meanwhile, CircleCI's 2026 State of Software Delivery report found that AI-assisted code generation boosted feature branch throughput by 59%, yet production branch output actually declined 7% for median teams. Code is being written faster than ever. It's just not shipping.

CLAUDE.md as Codebase API: The Most Leveraged Documentation You'll Ever Write

· 9 min read
Tian Pan
Software Engineer

Most teams treat their CLAUDE.md the way they treat their README: write it once, forget it exists, wonder why nothing works. But a CLAUDE.md isn't documentation. It's an API contract between your codebase and every AI agent that touches it. Get it right, and every AI-assisted commit follows your architecture. Get it wrong — or worse, let it rot — and you're actively making your agent dumber with every session.

The AGENTbench study tested 138 real-world coding tasks across 12 repositories and found that auto-generated context files actually decreased agent success rates compared to having no context file at all. Three months of accumulated instructions, half describing a codebase that had moved on, don't guide an agent. They mislead it.
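A cheap defense is a CI check that fails when the context file references paths that no longer exist, one common way these files go stale. A minimal sketch, assuming backticked file references in CLAUDE.md (the regex and layout are assumptions, not anything from the study):

```python
import re
import sys
from pathlib import Path

# Hypothetical staleness check: flag CLAUDE.md lines that mention
# repository files which have since been moved or deleted.
PATH_PATTERN = re.compile(r"`([\w./-]+\.(?:py|ts|md|json|yaml))`")

def stale_references(claude_md: Path, repo_root: Path) -> list[str]:
    stale = []
    for lineno, line in enumerate(claude_md.read_text().splitlines(), start=1):
        for ref in PATH_PATTERN.findall(line):
            if not (repo_root / ref).exists():
                stale.append(f"{claude_md.name}:{lineno} references missing file {ref}")
    return stale

if __name__ == "__main__":
    problems = stale_references(Path("CLAUDE.md"), Path("."))
    print("\n".join(problems) or "CLAUDE.md references look fresh")
    sys.exit(1 if problems else 0)
```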

The Death of the Glue Engineer: AI Is Absorbing the Work That Holds Systems Together

· 11 min read
Tian Pan
Software Engineer

Every engineering organization has them. They don't own a product. They don't ship features users see. But without them, nothing works. They're the engineers who write the ETL pipeline that moves data from the billing system to the analytics warehouse. The ones who build the webhook handler that keeps Salesforce in sync with the internal CRM. The ones who maintain the API adapter layer that lets the mobile app talk to three different backend services that were never designed to talk to each other.

They are the glue engineers, and their work is the first category of software engineering being fully absorbed by AI agents.

The AI-Legible Codebase: Why Your Code's Machine Readability Now Matters

· 8 min read
Tian Pan
Software Engineer

Every engineering team has a version of this story: the AI coding agent that produces flawless code in a greenfield project but stumbles through your production codebase like a tourist without a map. The agent isn't broken. Your codebase is illegible — not to humans, but to machines.

For decades, "readability" meant one thing: could a human developer scan this file and understand the intent? We optimized for that reader with conventions around naming, file size, documentation, and abstraction depth. But the fastest-growing consumer of your codebase is no longer a junior engineer onboarding in their first week. It's an LLM-powered agent that reads, reasons about, and modifies your code thousands of times a day.

Codebase structure is the single largest lever on AI-assisted development velocity — bigger than model choice, bigger than prompt engineering, bigger than which IDE plugin you use. Teams with well-structured codebases report 60–70% fewer iteration cycles when working with AI assistants. The question is no longer whether to optimize for machine readability, but how.

The Agentic Deadlock: When AI Agents Wait for Each Other Forever

· 9 min read
Tian Pan
Software Engineer

Here is an uncomfortable fact about multi-agent AI systems: when you let two or more LLM-powered agents share resources and make decisions concurrently, they deadlock at rates between 25% and 95%. Not occasionally. Not under edge-case load. Under normal operating conditions with standard prompting, the moment agents must coordinate simultaneously, the system seizes up.

This is not a theoretical concern. Coordination breakdowns account for roughly 37% of multi-agent system failures in production, and systems without formal orchestration experience failure rates between 41% and 87%. The classic distributed systems failure modes — deadlock, livelock, priority inversion — are back, and they are wearing new clothes.
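The mechanism is the textbook circular wait, just wearing agent clothes. A toy reproduction in Python, with illustrative agent and tool names: two agents acquire the same two resources in opposite order and block on each other.

```python
import threading
import time

search_tool = threading.Lock()
write_tool = threading.Lock()

def agent(name: str, first: threading.Lock, second: threading.Lock) -> None:
    with first:
        time.sleep(0.1)  # enough delay that both agents end up holding one lock
        if second.acquire(timeout=2):  # without the timeout, this blocks forever
            second.release()
        else:
            print(f"{name}: deadlocked waiting for its second resource")

# Opposite acquisition order creates the circular wait.
planner = threading.Thread(target=agent, args=("planner", search_tool, write_tool))
executor = threading.Thread(target=agent, args=("executor", write_tool, search_tool))
planner.start(); executor.start(); planner.join(); executor.join()
```

The classic fix applies unchanged: impose a global acquisition order (every agent takes search_tool before write_tool) so the cycle can never form.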

AI Feature Billing Is an Engineering Problem Nobody Planned For

· 9 min read
Tian Pan
Software Engineer

Microsoft's Copilot launched with a clean story: $30/user/month, productivity multiplied. The actual math was uglier. Once you factored in the base enterprise license, compute costs per active user, and support overhead, Microsoft was losing over $20 per user per month on the feature. Finance didn't catch this immediately because the costs lived in the infrastructure budget, not the product P&L. Engineering knew the token bills were large. Nobody had connected the two lines.

This is the billing problem that most AI teams build into their products without realizing it. Not the pricing strategy problem — that's a product decision. The engineering problem: you have no infrastructure to measure what AI features actually cost per customer, per feature, and per request at the granularity required to make any pricing model work.
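The starting point is per-request attribution: tag every model call with customer and feature, then roll the token bill up along those dimensions. A minimal sketch with placeholder prices (not any vendor's actual rates):

```python
from collections import defaultdict

PRICE_PER_1K = {"input": 0.003, "output": 0.015}  # USD per 1K tokens, hypothetical

ledger: dict[tuple[str, str], float] = defaultdict(float)

def record_call(customer: str, feature: str, input_tokens: int, output_tokens: int) -> float:
    """Attribute one model call's cost to a (customer, feature) pair."""
    cost = (input_tokens / 1000) * PRICE_PER_1K["input"] \
         + (output_tokens / 1000) * PRICE_PER_1K["output"]
    ledger[(customer, feature)] += cost
    return cost

record_call("acme", "summarize", input_tokens=4200, output_tokens=800)
record_call("acme", "chat", input_tokens=1500, output_tokens=600)

for (customer, feature), total in sorted(ledger.items()):
    print(f"{customer}/{feature}: ${total:.4f}")
```

Until something like this ledger exists, the infrastructure budget and the product P&L will keep telling two different stories.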

AI Product Metrics Nobody Uses: Beyond Accuracy to User Value Signals

· 9 min read
Tian Pan
Software Engineer

A contact center AI system achieved 90%+ accuracy on its validation benchmark. Supervisors still instructed agents to type notes manually. The product was killed 18 months later for "low adoption." This pattern plays out repeatedly across enterprise AI deployments — technically excellent systems that nobody uses, measured by metrics that couldn't see the failure coming.

The problem is a systematic mismatch between what teams measure and what predicts product success. Engineering organizations inherit their measurement instincts from classical ML: accuracy, precision/recall, BLEU scores, latency percentiles, eval pass rates. These describe model behavior in isolation. They tell you almost nothing about whether your AI is actually useful.
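One concrete alternative is to instrument the signal the contact-center story was missing: whether agents kept the AI's output or quietly rewrote it. A sketch, with hypothetical metric names:

```python
import difflib

def keep_rate(ai_draft: str, final_note: str) -> float:
    """Similarity between the AI's draft and what actually shipped.
    Near 1.0 means the agent kept the draft; near 0 means a full rewrite."""
    return difflib.SequenceMatcher(None, ai_draft, final_note).ratio()

def override_rate(pairs: list[tuple[str, str]], threshold: float = 0.5) -> float:
    """Share of interactions where the human effectively discarded the AI output."""
    overrides = sum(1 for draft, final in pairs if keep_rate(draft, final) < threshold)
    return overrides / len(pairs)

# A system at 90%+ benchmark accuracy but a high override rate is failing
# on the signal that actually predicts adoption.
```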

AI Technical Debt: Four Categories That Never Show Up in Your Sprint Retro

· 11 min read
Tian Pan
Software Engineer

Your sprint retro covers the usual suspects: flaky tests, that migration someone keeps punting, the API endpoint held together with duct tape. But if you're shipping AI features, the most expensive debt in your codebase is the kind nobody puts on a sticky note.

Traditional technical debt accumulates linearly. You cut a corner, you pay interest on it later, you refactor when the pain gets bad enough. AI technical debt compounds. A prompt that degrades silently produces training signals that pollute your evals, which misguide your next round of prompt changes, which further erodes the quality your users experience. By the time someone notices, three layers of assumptions have rotted underneath you.