
24 posts tagged with "engineering-leadership"


The Rubber-Stamp Collapse: Why AI-Authored PRs Are Hollowing Out Code Review

· 10 min read
Tian Pan
Software Engineer

A senior engineer approves a 400-line PR in four minutes. The diff is clean. Names are sensible. Tests pass. Two weeks later the on-call engineer is paging through a query that returns the right shape of rows but from the wrong column — user.updated_at where user.created_at was meant — and the cohort analysis dashboard has been quietly lying to the CFO for nine days. The reviewer was competent. The code was well-structured. The bug was invisible in the diff because it wasn't a syntactic smell. It was a semantic one, and the reviewer had nothing to anchor against because no one had written down what the change was supposed to do.

This is the failure mode that shows up once the majority of diffs in your repo start life as model output. Reviewers stop asking "is this correct?" and start asking "does this look like code?" The answer is almost always yes. AI-authored code is grammatically fluent in a way that bypasses the review heuristics engineers spent a decade sharpening on human-written slop.

Communicating AI Limitations Across the Organization: A Framework for Engineering Leaders

· 11 min read
Tian Pan
Software Engineer

The demo worked perfectly. Legal had signed off. Sales was already promising customers the feature would ship next quarter. Then the first production failure happened — the model confidently drafted a clause that cited a contract term that didn't exist, sales forwarded it to a customer, and legal spent three weeks in damage control.

This is not a story about a bad model. It's a story about miscommunication. The engineering team knew the model could hallucinate. Legal assumed it wouldn't. Sales assumed any failure would be caught before reaching customers. Ops assumed someone else was monitoring for exactly this. Nobody was lying. Everyone was working from a different mental model of the same system.

The root cause of most AI project failures isn't the AI. According to RAND Corporation's analysis of failed AI initiatives, "misunderstood problem definition" (which includes miscommunication about capability limits) is the single most common cause. By commonly cited estimates, between 70% and 95% of enterprise AI initiatives fail to deliver their intended outcomes, and the technology is rarely the limiting factor. The limiting factor is that every team in your organization is quietly building a different theory of what your AI system does, and nobody has explicitly corrected any of them.

Board-Level AI Governance: The Five Decisions Only Executives Can Make

· 9 min read
Tian Pan
Software Engineer

A major insurer's AI system was denying coverage claims. When humans reviewed those decisions, 90% were found to be wrong. The insurer's engineering team had built a performant model. Their MLOps team had solid deployment pipelines. Their data scientists had rigorous evaluation metrics. None of that mattered, because no one at the board level had ever answered the question: what is our acceptable failure rate for AI decisions that affect whether a sick person gets treated?

That gap — between functional technical systems and missing executive decisions — is where AI governance most often breaks down in practice. The result is organizations that are simultaneously running AI in production and exposed to liability they've never formally acknowledged.

The Mental Model Shift That Separates Good AI Engineers from the Rest

· 10 min read
Tian Pan
Software Engineer

The most common pattern among engineers who struggle with AI work isn't a lack of technical knowledge. It's that they keep asking the wrong question. They want to know: "Does this work?" What they should be asking is: "At what rate does this fail, and is that rate acceptable for this use case?"

That single shift, from binary correctness to acceptable failure rates, is the core of how experienced AI engineers think differently. It sounds simple. It isn't. Everything downstream of it is different: how you debug, how you test, how you deploy, what you monitor, what you build your confidence on. Engineers who haven't made this shift will keep fighting their tools and losing.
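One way to make the "at what rate does this fail?" question concrete is to estimate the failure rate from a sample of evaluated outputs and compare the upper end of its confidence interval against a threshold. The sketch below is illustrative only: the function names, the 95% Wilson score interval, and the sample numbers are assumptions, not anything prescribed by the post.

```python
import math


def wilson_interval(failures: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a failure rate (z=1.96 is roughly 95% confidence)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = failures / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


def failure_rate_acceptable(failures: int, trials: int, max_rate: float) -> bool:
    """Accept only if the plausible worst case (upper bound) is within budget."""
    _, upper = wilson_interval(failures, trials)
    return upper <= max_rate


# Hypothetical eval run: 3 failures observed in 200 sampled outputs.
lower, upper = wilson_interval(3, 200)
print(f"observed 1.5% failures, plausible range {lower:.1%} to {upper:.1%}")
print("OK for a 5% budget:", failure_rate_acceptable(3, 200, max_rate=0.05))
print("OK for a 2% budget:", failure_rate_acceptable(3, 200, max_rate=0.02))
```

The design choice worth noting is that the check uses the interval's upper bound rather than the observed rate: a small sample that happens to look good does not, by itself, clear a tight failure budget.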

The Shared Prompt Service Problem: Multi-Team LLM Platforms and the Dependency Nightmare

· 10 min read
Tian Pan
Software Engineer

On a Tuesday afternoon, the platform team at a mid-size AI startup merged a "minor improvement" to the shared system prompt. By Thursday, three separate product teams had filed bugs. One team's evaluation suite dropped from 87% to 61% accuracy. Another team's RAG pipeline started producing hallucinated citations. A third team's safety filter stopped catching a category of harmful outputs entirely. Nobody connected the dots for four days.

This is the shared prompt service problem, and it's coming for every organization that has more than one team building on a common LLM platform.

Why '92% Accurate' Is Almost Always a Lie

· 8 min read
Tian Pan
Software Engineer

You launch an AI feature. The model gets 92% accuracy on your holdout set. You present this to the VP of Product, the legal team, and the head of customer success. Everyone nods. The feature ships.

Three months later, a customer segment you didn't specifically test is experiencing a 40% error rate. Legal is asking questions. Customer success is fielding escalations. The VP of Product wants to know why no one flagged this.

The 92% figure was technically correct. It was also nearly useless as a decision-making input — because headline accuracy collapses exactly the information that matters most.
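A small sketch of how a headline number can hide a failing segment. The segment names and counts below are hypothetical, chosen only so that the overall figure lands on 92% while one underrepresented segment sees a 40% error rate.

```python
from collections import defaultdict


def accuracy_by_segment(records):
    """records: iterable of (segment, is_correct) pairs.

    Returns {segment: (accuracy, sample_count)}.
    """
    totals = defaultdict(lambda: [0, 0])  # segment -> [correct, total]
    for segment, is_correct in records:
        totals[segment][0] += int(is_correct)
        totals[segment][1] += 1
    return {seg: (correct / n, n) for seg, (correct, n) in totals.items()}


# Hypothetical holdout set: 900 examples from a well-covered segment,
# 100 from a segment nobody tested specifically.
records = (
    [("enterprise", True)] * 860 + [("enterprise", False)] * 40
    + [("smb", True)] * 60 + [("smb", False)] * 40
)

overall = sum(ok for _, ok in records) / len(records)
by_segment = accuracy_by_segment(records)

print(f"overall accuracy: {overall:.0%}")
for segment, (acc, n) in sorted(by_segment.items()):
    print(f"{segment}: accuracy {acc:.1%} on {n} examples")
```

Reporting the per-segment table alongside the headline figure, with sample counts, is what lets a reader see both the weak segment and how thinly it was tested.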

When Everyone Has an AI Coding Agent: The Team Dynamics Nobody Warned You About

· 10 min read
Tian Pan
Software Engineer

A team of twelve engineers adopts AI coding tools enthusiastically. Six months later, each engineer is merging nearly twice as many pull requests. The engineering manager celebrates. Then the on-call rotation starts paging. Debugging sessions last twice as long. Nobody can explain why a particular module was structured the way it was. The engineer who wrote it replies honestly: "I don't know — the AI generated most of it and it seemed fine."

This scenario is playing out at companies everywhere. The individual productivity story is real: developers finish tasks faster, write more tests, and clear backlogs more efficiently. The team-level story is more complicated, and most organizations aren't ready for it.

Onboarding Engineers into AI-Generated Codebases Without Breaking How They Learn

· 9 min read
Tian Pan
Software Engineer

The new hire ships a feature on day three. Everyone on the team is impressed. Three weeks later, she introduces a bug that a senior engineer explains in five words: "We don't do it that way." She had no idea. Neither did the AI that wrote her code.

AI coding assistants have collapsed the time-to-first-commit for new engineers. But that speed hides a trade-off that most teams aren't tracking: the code-reading that used to slow down junior engineers was also the code-reading that taught them how the system actually works. Strip that away, and you get engineers who can ship features they don't understand into architectures they haven't internalized.

The problem isn't the tools. It's that we haven't updated onboarding to account for what AI now does — and what it no longer requires engineers to do themselves.

AI Feature Decommissioning Forensics: What Dead Features Teach That Successful Ones Cannot

· 11 min read
Tian Pan
Software Engineer

Here's an uncomfortable pattern: the AI feature your team is about to launch next quarter already died at your company two years ago. It shipped under a different name, with a different prompt, solving a vaguely different problem, and it got quietly decommissioned after six months of flat adoption. Nobody wrote it up. Nobody connected the dots. The leading indicators that would have saved this cycle were sitting in dashboards that got archived along with the feature.

Most engineering orgs are elaborate machines for remembering successes. Launches get retrospectives, blog posts, internal celebrations. The features that got killed — the ones with 12% weekly active users despite a polished demo, the ones whose unit economics inverted when token costs compounded across a longer-than-expected tool chain, the ones users learned to trust, lost trust in, and then routed around — generate almost no institutional memory. And the failure patterns embedded in those deaths are exactly the ones your planning process has no way to price in.

The Cognitive Offloading Trap: When Your Team Can't Work Without the AI

· 9 min read
Tian Pan
Software Engineer

Three months after rolling out an AI coding assistant to their entire engineering team, a company noticed something disturbing: sprint velocity was up, but their code review pass rate had dropped 18% and the number of production incidents had climbed. When they asked developers to explain a recent AI-generated module during a post-mortem, nobody in the room could. Not even the person who merged it.

This is the cognitive offloading trap. And it's not a failure of AI tools — it's a failure of how teams integrate them.

Hiring for LLM Engineering: What the Interview Actually Needs to Test

· 10 min read
Tian Pan
Software Engineer

Most engineering teams that hire for LLM roles run roughly the same interview: two rounds of LeetCode, a system design question, maybe a quiz on transformer internals. They're assessing the wrong things, and they know it. The candidates who ace those screens often struggle to ship working AI features, while the ones who stumble on binary search can build an eval suite from scratch and debug a hallucinating pipeline in an afternoon.

The skills that predict success in LLM engineering have almost no overlap with what traditional ML or software interviews test. Hiring managers who haven't updated their process are generating false negatives at a high rate — rejecting engineers who would succeed — while false positives walk in with solid LeetCode scores and no intuition for when a model is confidently wrong.

Staffing AI Engineering Teams: Who Owns What When Every Feature Has an AI Component

· 11 min read
Tian Pan
Software Engineer

Three years ago, "AI team" meant a group of specialists tucked into a corner of the org chart, mostly invisible to product engineers. Today, a senior software engineer at a fintech company ships a fraud-scoring feature using a fine-tuned model on Monday, wires up a RAG pipeline for customer support on Wednesday, and debugs LLM latency on Friday. The specialists didn't go away, but the boundary between "AI work" and "product engineering" dissolved faster than almost anyone planned for.

Most teams responded by bolting new titles onto existing job descriptions and calling it done. That's the wrong answer, and the dysfunction shows up quickly: unclear ownership, duplicated tooling, and an ML platform team that spends half its time explaining why product teams can't just call the OpenAI API directly.

This post is about getting the structure right: not in the abstract, but for the actual stages of AI adoption most engineering organizations go through.