AI Made Coding 30% Faster, But Delivery Only 8% Better. Are We Measuring the Wrong Thing?

I’ve been tracking something troubling across our engineering org, and new research from Thoughtworks just validated what I’ve been seeing.

The headline: AI coding assistants made our developers roughly 30% faster at writing code. That’s huge, right?

The reality: Our net delivery improvement? About 8%.

Let me break down what’s happening. When we rolled out AI coding tools last year, individual velocity metrics skyrocketed. Developers were cranking out features faster than ever. PRs were flying. The team felt productive. Leadership was celebrating the AI investment.

But when we looked at actual feature delivery—time from conception to production—we barely moved the needle.

Why the Gap?

Coding is only about half of our total cycle time. The other half? Testing, code reviews, waiting for environments, managing dependencies between teams, dealing with deployment pipelines.

AI helps developers write code faster, but it doesn’t:

  • Speed up security reviews in our fintech environment
  • Resolve architectural decisions faster
  • Reduce waiting time for QA environments
  • Eliminate cross-team dependencies
  • Make our CI/CD pipeline any faster
  • Reduce the cognitive load of reviewing larger, AI-generated PRs

We optimized one part of the system and celebrated it. But the system didn’t get faster.
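A quick back-of-the-envelope check of the numbers above, in the style of Amdahl's law. This is a sketch under two assumptions that are not stated in the post: "30% faster" is read as 1.3x coding throughput, and the non-coding half of the cycle is left completely unchanged. The function names are illustrative.

```python
def delivery_reduction(coding_share: float, coding_speedup: float) -> float:
    """Fraction of total cycle time saved when only the coding share speeds up.

    coding_share: fraction of cycle time spent coding (0.5 = "about half").
    coding_speedup: throughput multiplier for coding (1.3 = "30% faster").
    """
    new_cycle = (1 - coding_share) + coding_share / coding_speedup
    return 1 - new_cycle


def implied_coding_share(observed_reduction: float, coding_speedup: float) -> float:
    """Coding share that would explain an observed end-to-end reduction."""
    return observed_reduction / (1 - 1 / coding_speedup)


# Best case if coding really is half the cycle and nothing else changes.
print(f"ideal reduction: {delivery_reduction(0.5, 1.3):.1%}")      # 11.5%

# Coding share that a measured 8% improvement would correspond to.
print(f"implied coding share: {implied_coding_share(0.08, 1.3):.1%}")  # 34.7%
```

Under these assumptions the ceiling is roughly 11.5%, so an observed 8% is in the same range but a bit lower. One possible reading is that coding is a smaller slice of the cycle than estimated, or that downstream steps (such as reviewing larger PRs) got slower. The post does not measure either, so treat that as a hypothesis.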

The Leadership Blind Spot

Here’s what worries me: we’re measuring—and rewarding—the wrong thing.

We track “lines of code written” and “PRs submitted” because those numbers look great in board decks. But customers don’t care about code velocity. They care about features in production.

The Thoughtworks research calls this out explicitly: the 8% delivery improvement is what actually matters to business outcomes, forecasting accuracy, and customer impact. Not the 30% coding gain.

The Real Question

Should we shift our focus from developer productivity to delivery system productivity?

What if instead of celebrating faster coding, we invested in:

  • Automating code reviews and testing pipelines
  • Reducing handoffs and wait times
  • Simplifying deployment processes
  • Resolving architectural bottlenecks
  • Improving cross-team coordination

I’m curious what other engineering leaders are seeing. Are you measuring delivery outcomes or just coding throughput? What bottlenecks are you tackling?

And honestly—are we solving the right problem with AI tools, or just making one part of a broken system incrementally better?


Sources: Thoughtworks AI Productivity Research, Software Development Bottlenecks 2026

This resonates deeply with what we’re seeing in our fintech environment, Keisha.

Here’s a concrete example: Last quarter, one of my teams built a new payment reconciliation feature. With AI assistance, the coding took 2 days. The feature then spent another 19 days in review and approval stages, so it took about 3 weeks in total to hit production.

The breakdown:

  • Code complete: 2 days
  • Security review: 5 days (waiting for security team availability)
  • Compliance review: 4 days (legal had questions about audit trails)
  • Integration testing: 3 days (waiting for test environment access)
  • Cross-team dependency resolution: 4 days (another team’s API wasn’t ready)
  • Final deployment approval: 3 days (release window scheduling)

AI made the coding faster, but it didn’t touch any of the gates that actually control our delivery speed.
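To make the breakdown above concrete, here is a small sketch that totals the stages and shows how little of the lead time is coding. It assumes the stages ran strictly one after another with no overlap, which the list suggests but does not state.

```python
# Stage durations in days, copied from the breakdown above.
stages = {
    "Code complete": 2,
    "Security review": 5,
    "Compliance review": 4,
    "Integration testing": 3,
    "Cross-team dependency resolution": 4,
    "Final deployment approval": 3,
}

total_days = sum(stages.values())                     # 21 days end to end
coding_share = stages["Code complete"] / total_days   # about 9.5%

for name, days in stages.items():
    print(f"{name:<34}{days:>3} d  {days / total_days:6.1%}")

print(f"Total lead time: {total_days} days")
print(f"Coding share of lead time: {coding_share:.1%}")

# Upper bound: even if coding took zero days, the gates would still
# add up to this many days (assuming sequential stages).
days_without_coding = total_days - stages["Code complete"]
print(f"Lead time with instant coding: {days_without_coding} days")
```

With these numbers, coding is under 10% of the lead time, so no amount of coding speedup could shorten delivery by more than about two days.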

Your point about measuring the wrong thing really hits home. We celebrate “story points completed” in sprint reviews, but our actual delivery cadence—the thing customers experience—hasn’t changed.

I’ve started pushing my team to track “time to production” instead of “time to code complete.” The data is humbling. It’s forcing us to have honest conversations about what’s actually blocking delivery.
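For anyone who wants to try the same comparison, a minimal sketch of the two metrics is below. The `Feature` record, its field names, and the dates are all hypothetical. The first row is shaped to match the 2-day and 21-day figures from the reconciliation example, and the second is invented purely to show the aggregation. In practice the timestamps would come from your ticketing and deployment tools.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class Feature:
    """Hypothetical record; field names are illustrative, not from any tool."""
    name: str
    started: date
    code_complete: date
    in_production: date

    @property
    def time_to_code_complete(self) -> int:
        return (self.code_complete - self.started).days

    @property
    def time_to_production(self) -> int:
        return (self.in_production - self.started).days


# Illustrative data only.
features = [
    Feature("payment-reconciliation", date(2025, 1, 6), date(2025, 1, 8), date(2025, 1, 27)),
    Feature("hypothetical-feature-b", date(2025, 2, 3), date(2025, 2, 6), date(2025, 2, 20)),
]

median_code = median(f.time_to_code_complete for f in features)
median_prod = median(f.time_to_production for f in features)

print(f"Median time to code complete: {median_code} days")
print(f"Median time to production:    {median_prod} days")
```

Reporting both numbers side by side is the point: the gap between them is the time spent in gates that faster coding does not touch.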

Question for the group: Has anyone successfully automated or streamlined the non-coding parts of the delivery pipeline? What worked?

From a product perspective, this gap between coding velocity and delivery velocity is incredibly frustrating.

I can’t tell you how many times I’ve had this conversation:

Me to engineering: “When will feature X ship?”
Engineering: “The code is done!”
Me: “Great, so it’s in production?”
Engineering: “Well, no… it needs testing, review, staging deployment, then we wait for the release window…”
Me: “So when will it actually ship?”
Engineering: “Uh… maybe next sprint?”

Customers don’t care that the code is “done.” They care about features they can actually use.

The AI productivity story we’re telling ourselves—“developers are 30% faster!”—is solving the wrong problem. It’s like optimizing ingredient prep time in a restaurant while ignoring that there’s only one oven and the health inspector needs to approve every dish before it leaves the kitchen.

Here’s what I want to know: Are AI tools even solving a real bottleneck, or are we just making the easiest part of the process incrementally better?

From where I sit, the actual constraints are:

  • Architectural decisions that require senior input (limited bandwidth)
  • Cross-functional dependencies that require coordination (scheduling overhead)
  • Quality gates that require domain expertise (compliance, security, UX review)
  • Deployment processes that require operational approval (risk management)

None of these are code generation problems.

@vp_eng_keisha I’d love to hear how you’re aligning engineering metrics with product delivery metrics. We need shared KPIs that reflect actual customer value, not just development activity.

Coming from the design side, I want to add another perspective: maybe we shouldn’t even want 30% faster coding if it means skipping important quality gates.

Here’s what I’ve noticed when developers use AI to code faster:

More rework, not less
Developers ship features quickly, then design review catches fundamental UX problems. Now we’re doing the work twice: once fast with AI, once correctly after feedback. On those features, the net effect on velocity is negative.

Less collaborative design
When coding is fast and easy, there’s less incentive to involve design early. Developers prototype directly in code instead of collaborating on design specs. We lose the opportunity to catch problems before implementation.

Accessibility gets skipped
AI-generated code rarely includes proper ARIA labels, keyboard navigation, or screen reader support. Fast coding means more accessibility debt that we have to fix later (or worse, ship to users with disabilities).

My controversial take: The 8% delivery improvement might actually be the right number if we care about quality.

The constraints you’re all describing—design review, user research, security audits, compliance checks—exist for a reason. They catch problems that AI-generated code misses.

If we “optimize” these gates away in pursuit of faster delivery, we’ll ship more features that don’t meet user needs, violate compliance requirements, or create security vulnerabilities.

@eng_director_luis Your payment reconciliation example is perfect. Those 3 weeks of reviews probably saved you from shipping something with audit trail gaps that would’ve cost way more to fix in production.

The real question isn’t “how do we get to 30% delivery gains?” It’s “which quality gates are actually valuable, and which are bureaucratic overhead?”

Some gates are essential. Some are organizational scar tissue from past mistakes. AI coding speed forces us to distinguish between the two.