Why Your 'Improved Velocity' Isn't Reaching Customers

I need to have a hard conversation about something I’m seeing across engineering and product teams. Your engineering organization might be celebrating velocity improvements from AI coding tools, but if you’re a PM looking at actual customer-facing delivery, you might be seeing a very different story.

Let me share what happened at my company.

The Disconnect

Three months ago, our VP of Engineering presented impressive metrics to the board:

  • 85% increase in PRs merged
  • 60% increase in commits per engineer
  • “Dramatically improved development velocity” from AI tool adoption

The board was thrilled. Our technical roadmap was apparently accelerating.

Except… as VP of Product, my metrics told a completely different story:

  • Feature delivery timeline: unchanged (still averaging 6-8 weeks from idea to customer)
  • Customer-facing releases: same monthly cadence
  • Product roadmap completion: actually slightly behind plan
  • Customer satisfaction with pace of innovation: down 8 points

We were generating more code but delivering the same amount of value. Something wasn’t adding up.

Where the Time Goes

I dug into where the supposed velocity gains were disappearing. Here’s what I found:

The Review Bottleneck (we’ve discussed this extensively in other threads):

  • Average time from PR creation to merge: 91% longer
  • Senior engineer capacity consumed by review: massive increase
  • Review iteration cycles: nearly doubled

Increased Bug Fixing:

  • Production incidents: up 23%
  • Time spent on bug fixes vs new features: shifted from 20/80 to 35/65
  • Customer-reported issues: up 31%

Technical Debt Servicing:

  • Refactoring PRs (fixing AI-generated code): up dramatically
  • Code quality issues requiring rework: significant increase
  • Architecture inconsistency fixes: new category of work

Integration and Testing:

  • Features passing code review but failing integration tests
  • More time in staging catching issues
  • Longer QA cycles for AI-assisted features

The Real Cycle Time

When you measure the full cycle - from feature concept to customer value - the AI productivity gains largely evaporate:

Pre-AI Full Cycle (average feature):

  • Design & planning: 1 week
  • Implementation: 2 weeks
  • Code review: 3-4 days
  • QA & testing: 1 week
  • Deployment & monitoring: 2-3 days
  • Total: ~4.5 weeks

Post-AI Full Cycle (average feature):

  • Design & planning: 1 week (unchanged)
  • Implementation: 1 week (faster!)
  • Code review: 7-8 days (slower!)
  • QA & testing: 1.5 weeks (more bugs to catch)
  • Bug fixes & rework: 4-5 days (new category!)
  • Deployment & monitoring: 2-3 days (unchanged)
  • Total: ~4.5 weeks (same!)

We saved a week in implementation but lost it in review, QA, and rework.
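
If you want to run this analysis on your own pipeline, the mechanics are simple. Here’s a minimal sketch of the calculation, assuming you can pull a timestamp for when each stage of a feature finished; the stage names and dates below are made up for illustration, not taken from our tooling.

```python
from datetime import datetime

# Hypothetical stage timestamps for one feature (names, shape, and dates are
# assumptions for illustration, not pulled from any particular tracker).
# Each entry marks when that stage finished.
feature = {
    "idea_approved":   datetime(2024, 3, 1),
    "design_done":     datetime(2024, 3, 8),
    "first_pr_opened": datetime(2024, 3, 15),
    "pr_merged":       datetime(2024, 3, 23),  # review time lives in this gap
    "qa_passed":       datetime(2024, 4, 2),   # includes bug fixes and rework
    "in_production":   datetime(2024, 4, 4),
}

def stage_durations(timestamps: dict) -> dict:
    """Days spent between consecutive stages, in recorded order."""
    ordered = list(timestamps.items())
    return {
        f"{prev_name} -> {curr_name}": (curr_ts - prev_ts).days
        for (prev_name, prev_ts), (curr_name, curr_ts) in zip(ordered, ordered[1:])
    }

durations = stage_durations(feature)
total_days = sum(durations.values())

for stage, days in durations.items():
    print(f"{stage}: {days}d")
print(f"idea-to-customer: {total_days}d (~{total_days / 7:.1f} weeks)")
```

The per-stage breakdown matters more than the total: it shows exactly where the week saved in implementation gets spent again.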

The Customer Interview Data

I ran customer interviews specifically about product velocity perception. The feedback was concerning:

“Features are announced but seem to take forever to actually ship.”

“Quality has gotten worse - more bugs, more edge cases not handled.”

“New features feel rushed, like they weren’t fully thought through.”

“We asked for [feature A] but got [feature A-] that doesn’t quite solve our problem.”

This last point was particularly telling. AI helps engineers implement quickly, but speed isn’t the same as solving the right problem well.

The Business Case Is Falling Apart

We invested significantly in AI coding tools:

  • Copilot subscriptions for 80 engineers: ~K/year
  • Training and enablement: ~K
  • Infrastructure for AI-assisted development: ~K

The promised ROI was based on 30-40% productivity improvements. But when you measure productivity as “customer value delivered per dollar spent,” we’re seeing maybe 5-8% improvement at best.

And that’s before accounting for:

  • Senior engineer satisfaction decline (retention risk)
  • Increased technical debt (future cost)
  • Higher production incident rate (customer impact)
  • Longer feature planning cycles (context-building overhead)

The Hard Conversation with Leadership

Last week I had to present this analysis to our CEO and board. It was not the conversation anyone wanted to have after celebrating our “AI transformation.”

I showed them two charts side by side:

Chart 1: Engineering Metrics (PRs, commits, “velocity”)

  • Trending up dramatically

Chart 2: Product Metrics (features shipped, customer satisfaction, business KPIs)

  • Flat or slightly declining

The question I posed: “Which metrics actually matter for our business?”

Where Engineering and Product Need to Align

The core issue: engineering and product are optimizing for different things.

Engineering is (understandably) excited about AI tools making coding faster and is measuring success by coding metrics.

Product is focused on customer value delivery and measuring success by customer outcomes.

These need to be the same thing.

What We’re Changing

We’re realigning around shared metrics:

Old Metrics (engineering-focused):

  • PRs per week
  • Commits per engineer
  • Lines of code

New Metrics (outcome-focused):

  • Customer-facing features shipped per quarter
  • Time from customer request to delivered solution
  • Production incident rate
  • Customer satisfaction with product evolution
  • Technical quality (measured by refactor/rework rate)

Process Changes:

  • Feature planning now includes “full cycle time” estimates, not just implementation time
  • Engineering estimates include review, QA, and likely rework (a rough sketch of that arithmetic follows this list)
  • We budget senior engineer review capacity as a constraint in planning
  • AI tool usage is encouraged but not mandated - teams choose based on what actually improves outcomes
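
To make the estimating change concrete, here’s the back-of-the-envelope arithmetic we now ask for. The stage defaults are stated assumptions, not measured constants - plug in your own.

```python
def full_cycle_estimate(impl_days: float,
                        review_days: float = 7.0,         # assumed queue + iteration time
                        qa_days: float = 7.5,
                        rework_probability: float = 0.5,  # assumed share of features reworked
                        rework_days: float = 4.0,
                        deploy_days: float = 2.5) -> float:
    """Rough implementation-to-customer estimate in working days.

    Implementation time is what engineers naturally quote; the rest is the part
    of the cycle the old estimates quietly left out.
    """
    expected_rework = rework_probability * rework_days
    return impl_days + review_days + qa_days + expected_rework + deploy_days

# "I can code this in a week" is not "this ships in a week":
print(full_cycle_estimate(impl_days=5.0))  # 24.0 working days, almost five weeks
```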

The Cultural Challenge

The hardest part is managing the narrative. The AI vendor messaging is everywhere: “2x developer productivity,” “ship faster with AI,” “transform your engineering organization.”

Our engineers believe they’re more productive (and in a narrow sense, they are - writing code is faster). But the business isn’t seeing the productivity gains translate to outcomes.

This creates tension. Engineers feel their wins aren’t being recognized. Product feels engineering doesn’t understand the business impact. Leadership is confused about whether AI tools are working.

What Product Leaders Need from Engineering Leaders

If you’re a PM or product leader working with an engineering org using AI tools heavily, here’s what to ask for:

  1. Full cycle metrics: Not just implementation time, but idea-to-customer delivery time
  2. Quality metrics: Bug rates, rework rates, technical debt accumulation
  3. Capacity modeling: Explicit accounting for review bottlenecks in planning
  4. Shared success criteria: What customer outcomes are we trying to improve?
  5. Honest assessment: Where are AI tools actually helping vs. creating new problems?

The Path Forward

I’m not anti-AI tools. The implementation speedup is real and valuable for certain types of work. But I am deeply concerned about the mismatch between engineering metrics and business outcomes.

We need to be honest about the full system impacts - review bottlenecks, quality issues, senior engineer burden - and design our processes and metrics around what actually matters: delivering value to customers sustainably.

The productivity illusion is dangerous. It makes us feel like we’re moving faster while the customer experience suggests otherwise.

Questions for the Community

For other product leaders:

  • Are you seeing this disconnect between engineering velocity metrics and actual feature delivery?
  • How are you measuring AI tool ROI in product terms?
  • What metrics have you aligned on with engineering?

For engineering leaders:

  • How do you balance AI productivity narratives with product delivery realities?
  • What does “productivity” actually mean when the bottleneck shifts from coding to review?

We need to figure this out together because the current state - celebrating code generation while customers wait the same amount of time for features - isn’t sustainable.

David, thank you for this perspective. As a VP of Engineering, I find reading this both uncomfortable and necessary. You’re holding up a mirror that many of us need to look into.

Acknowledging the Misalignment

I’ll be blunt: you’re right. Engineering leaders (myself included) got excited about AI tools and started measuring success by engineering metrics that don’t necessarily translate to business value.

When I present to our board, I show PR velocity and commit frequency because those numbers look good. But if I’m honest, those metrics are proxies for what we hope is happening (faster delivery) rather than measures of what actually matters (customer value).

Your full cycle time analysis is a wake-up call. At my company, I suspect we’d find similar results - implementation faster, but total cycle time unchanged or worse.

The Pressure We’re Under

Part of what’s happening is external pressure. Tech media, AI vendors, investors - everyone is pushing the narrative that AI tools make developers dramatically more productive. As engineering leaders, we feel pressure to adopt these tools and show results.

But “results” got defined narrowly as “more code, faster” rather than “more value, sustainably.” That’s a failure of leadership (mine included) to push back on simplistic metrics.

The Executive Communication Gap

You mentioned the hard conversation with your CEO. I need to have a similar one. Here’s what I think engineering leaders need to communicate better to executives:

  1. Productivity is system-level, not local: Optimizing one part of the system (code generation) can create bottlenecks elsewhere (review, QA, maintenance)

  2. Velocity metrics are misleading: More PRs doesn’t mean more value. It might just mean more review burden and technical debt.

  3. Quality and speed trade-offs are real: AI tools can increase speed, but often at quality costs that show up later in the cycle.

  4. Senior engineer capacity is the actual constraint: Review capacity, architectural judgment, and mentorship are finite resources that don’t scale with AI tools.

Shared Success Metrics - I’m In

Your proposed shift to outcome-focused metrics is exactly right. Here’s what I’m committing to track and share with product:

Joint Engineering-Product Metrics:

  • Customer-facing feature delivery timeline (full cycle)
  • Production quality (incident rate, bug escape rate)
  • Technical health (tech debt trend, system reliability)
  • Team sustainability (engineer satisfaction, senior retention)
  • Customer satisfaction with product evolution

Engineering-Specific Leading Indicators:

  • Review queue depth and cycle time
  • Code ownership clarity (does the author deeply understand their code?)
  • Rework rate (how often do we fix recently-written code?)
  • Architectural drift (is the system becoming harder to reason about?)

These aren’t as sexy as “85% more PRs” but they’re honest about what’s actually happening.
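
One practical note on the rework-rate indicator in that list: you can approximate it straight from version control. Here’s a minimal sketch, under the assumption that “rework” means a file being modified again within a few weeks of its previous change. It’s a blunt proxy, not a definition, and the window is arbitrary.

```python
import subprocess
from datetime import datetime, timedelta

REWORK_WINDOW = timedelta(days=21)  # assumption: a change within 3 weeks counts as rework

def commits_touching_files(repo: str):
    """Yield (commit_date, [changed files]) per commit, oldest first."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--reverse", "--name-only",
         "--pretty=format:--%ad", "--date=iso-strict"],
        capture_output=True, text=True, check=True,
    ).stdout
    date, files = None, []
    for line in out.splitlines():
        if line.startswith("--"):            # a commit header in the format we asked for
            if date is not None:
                yield date, files
            date, files = datetime.fromisoformat(line[2:].strip()), []
        elif line.strip():                   # a file name changed in this commit
            files.append(line.strip())
    if date is not None:
        yield date, files

def rework_rate(repo: str) -> float:
    """Fraction of file changes that touch a file already changed within the window."""
    last_touched, touches, reworked = {}, 0, 0
    for date, files in commits_touching_files(repo):
        for f in files:
            touches += 1
            previous = last_touched.get(f)
            if previous is not None and date - previous <= REWORK_WINDOW:
                reworked += 1
            last_touched[f] = date
    return reworked / touches if touches else 0.0

print(f"rework rate: {rework_rate('.'):.1%}")
```

It won’t distinguish planned follow-up from genuine rework, but the trend line is what we care about, not the absolute number.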

The Cultural Work Ahead

The hard part is changing engineering culture from “ship fast” to “ship value sustainably.” This is especially challenging when:

  • AI tools make shipping fast feel effortless
  • Engineers genuinely feel more productive
  • External messaging celebrates speed above all
  • We’ve hired and rewarded based on output metrics

We need to have honest team conversations about:

  • What productivity actually means (customer value, not code volume)
  • Why quality gates exist (not bureaucracy, but sustainable delivery)
  • How AI tools fit into our workflow (assistive, not replacement)
  • What success looks like (customer outcomes, not PR count)

Proposal: Joint Product-Engineering Working Group

I’d like to propose that product and engineering leaders form an explicit joint working group to:

  1. Define shared success metrics that reflect business outcomes
  2. Redesign planning processes to account for full cycle time realities
  3. Align on AI tool usage - where it helps vs. where it creates problems
  4. Build feedback loops from customer outcomes back to engineering practices
  5. Create honest narratives about productivity that reflect reality

What I’m Asking Product Leaders

Help engineering leaders by:

  • Pushing back on simplistic productivity narratives
  • Sharing customer feedback about quality and delivery
  • Insisting on full cycle metrics in planning
  • Supporting process changes that slow apparent velocity but improve outcomes
  • Being partners in the hard conversation with executives about what productivity actually means

Commitment to Change

Reading your post, I’m committing to:

  • Stop celebrating PR count as a primary metric
  • Start measuring and reporting full cycle delivery time
  • Include product in defining what engineering success means
  • Be honest about AI tool trade-offs, not just benefits
  • Align engineering goals with customer value delivery

This is uncomfortable because it means some of our celebrated “wins” need to be reframed. But it’s necessary if we’re going to actually deliver value rather than just generate code.

David, this post is exactly the conversation product and engineering need to have more often. I’m experiencing the same tension at my financial services company, and I want to share how we’re navigating it.

Similar Data, Similar Disconnect

Our metrics mirror yours almost exactly:

Engineering celebrates:

  • 87% more PRs merged
  • 58% more commits per engineer
  • “Transformed development velocity”

Product reports:

  • Feature delivery timeline: basically unchanged
  • Quarterly roadmap completion: slightly worse
  • Customer escalations about quality: up 25%

The disconnect is real and it’s creating organizational tension.

The Quarterly Planning Disaster

Last quarter’s planning exposed the problem brutally. Product came with a feature roadmap based on “our new AI-enhanced velocity.” Engineering committed based on implementation estimates that didn’t account for the full cycle.

What happened:

  • Engineering delivered implementations on time
  • Review bottlenecks delayed everything
  • QA found more issues than expected
  • Rework consumed capacity
  • We delivered 60% of the planned roadmap

Product felt engineering over-promised. Engineering felt we’d delivered what we said we would (implementations). Both perspectives were true, and that’s the problem.

Where Engineering Metrics Deceive Us

I’ve been complicit in the metrics mismatch. Here’s how engineering metrics deceive:

Metric: PRs per week

  • What it claims to measure: productivity
  • What it actually measures: code generation rate
  • What it misses: review burden, rework, quality, value

Metric: Commits per engineer

  • What it claims to measure: individual productivity
  • What it actually measures: activity
  • What it misses: whether that activity creates value

Metric: Implementation time

  • What it claims to measure: feature delivery speed
  • What it actually measures: time to first PR
  • What it misses: review, QA, bugs, rework, deployment

These metrics feel scientific and objective, but they’re optimizing for the wrong thing.

Success Story: Unified Metrics Dashboard

We built something that’s helping - a shared Product-Engineering metrics dashboard that shows:

Customer-Facing Metrics (product owns):

  • Features delivered per quarter
  • Customer satisfaction scores
  • Feature adoption rates
  • Support ticket trends

Delivery Health Metrics (jointly owned):

  • Idea-to-customer cycle time
  • Production incident rate
  • Time to fix issues
  • Quality trends

Engineering Health Metrics (engineering owns):

  • Technical debt trajectory
  • System reliability
  • Team satisfaction
  • Review queue health

The key: these metrics are reviewed together in bi-weekly product-engineering syncs. We discuss them as a system, not in isolation.
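
For what it’s worth, the dashboard itself isn’t sophisticated. Here’s a stripped-down sketch of the shape of the shared report - the field names, numbers, and thresholds below are placeholders; the real version is wired into our trackers, with thresholds we negotiated jointly.

```python
from dataclasses import dataclass

@dataclass
class QuarterlyReport:
    # Customer-facing metrics (product owns)
    features_delivered: int
    customer_satisfaction: float
    # Delivery health metrics (jointly owned)
    idea_to_customer_days_median: float
    incidents_per_month: float
    # Engineering health metrics (engineering owns)
    review_queue_days_median: float
    rework_rate: float

    def discussion_flags(self) -> list:
        """Items worth raising in the bi-weekly sync, per our (placeholder) thresholds."""
        flags = []
        if self.review_queue_days_median > 5:
            flags.append("review queue is the current bottleneck")
        if self.rework_rate > 0.25:
            flags.append("rework rate above agreed threshold")
        if self.incidents_per_month > 4:
            flags.append("production quality trending the wrong way")
        return flags

report = QuarterlyReport(features_delivered=11, customer_satisfaction=7.8,
                         idea_to_customer_days_median=31, incidents_per_month=5,
                         review_queue_days_median=8, rework_rate=0.3)
print(report.discussion_flags())
```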

What Changed

When product asks “why is this feature taking so long?” we can show the full pipeline:

  • Implementation: done week 1
  • Review: 8 days (queue was deep)
  • QA: found 3 issues (AI-generated edge case bugs)
  • Rework: 3 days
  • Re-review: 2 days
  • Deployment: done week 4

This visibility helps product understand the full cycle and helps engineering see where we’re actually slow (hint: not implementation).

Engineering Capacity as a Planning Input

We now explicitly include engineering capacity constraints in roadmap planning:

Implementation Capacity: How much feature work we can code
Review Capacity: How much code we can review thoroughly
Senior Engineering Capacity: Limited resource for architecture and complex review

Product now sees that senior engineer review capacity is the actual constraint. This has led to tough prioritization conversations: “Do we build 3 medium features or 1 complex feature, given review constraints?”

It’s slower planning, but more honest about what we can actually deliver.
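
To show what “capacity as a planning input” looks like mechanically, here’s a stripped-down sketch of the check we run during roadmap planning. Every number is illustrative - the point is that senior review hours get budgeted explicitly, the same way implementation weeks do.

```python
# Illustrative numbers only - the point is the shape of the check, not the values.
SENIORS = 2
REVIEW_HOURS_PER_SENIOR_PER_WEEK = 5   # assumed protected review time per senior

candidate_features = [
    # (name, implementation weeks, estimated hours of senior review + re-review)
    ("medium feature A", 1, 10),
    ("medium feature B", 1, 12),
    ("medium feature C", 1, 10),
    ("complex feature X", 3, 30),
]

def review_budget(weeks_in_plan: int = 4) -> int:
    """Senior review hours actually available over the planning window."""
    return SENIORS * REVIEW_HOURS_PER_SENIOR_PER_WEEK * weeks_in_plan

needed = sum(hours for _, _, hours in candidate_features)
available = review_budget()
print(f"senior review hours needed: {needed}, available: {available}")
# With these made-up numbers the plan needs 62 hours against 40 available, so
# something drops - exactly the "3 medium features or 1 complex feature" conversation.
```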

The AI Tool Usage Conversation

We’ve also gotten more nuanced about AI tool recommendations:

Good Use Cases:

  • Boilerplate and repetitive code
  • Test generation (with review)
  • Documentation drafting
  • Refactoring with clear scope

Problematic Use Cases:

  • Complex business logic
  • Security-critical code
  • Architectural decisions
  • Novel algorithms

We’re not mandating AI tools anymore. Teams choose based on what helps them deliver value, not based on wanting to show AI adoption.

Alignment on What Matters

The real breakthrough was getting engineering and product leadership aligned on a simple question: “What does success look like for the business?”

Not: “How many PRs can we merge?”
But: “Are we delivering valuable features to customers at a sustainable pace with acceptable quality?”

When you frame it that way, the metrics conversation becomes much clearer.

David, this resonates so hard from a design perspective. We have the exact same pattern - design is “moving faster” with AI tools, but features aren’t reaching customers any faster, and quality concerns are increasing.

The Design Version of This Problem

Our design team adopted AI tools enthusiastically:

  • Midjourney for visual concepts
  • AI-powered Figma plugins for layouts
  • ChatGPT for UX copy
  • Various AI tools for design system work

Our design velocity metrics looked great:

  • Design concepts per sprint: up 70%
  • Figma files created: up 85%
  • Design iterations: way up

But when I look at what actually matters - how quickly we’re delivering well-considered user experiences - the timeline hasn’t changed. If anything, it’s gotten longer.

Where Design AI Creates Hidden Costs

Similar to your engineering analysis, the time savings in creation get consumed in other parts of the process:

Design Review Takes Longer: AI-generated designs look polished but often lack strategic thinking. Reviewing them requires unpacking why each decision was made, which is harder when the honest answer is “the AI suggested it.”

Design-Eng Sync Overhead: Designs that look complete but lack context require more back-and-forth with engineering. The missing context has to be built through conversation.

Revision Cycles: Because AI makes iteration cheap, we iterate more. But more iterations don’t necessarily mean better outcomes - sometimes they just mean we’re searching for “good” instead of deciding what good looks like.

User Testing Reveals Misses: AI-generated designs often pass design review but fail in user testing because they’re missing domain-specific context or user empathy.

End-to-End Feature Delivery Hasn’t Improved

Your cycle time analysis mirrors what I’m seeing in design:

Pre-AI Design Cycle:

  • User research & problem framing: 1 week
  • Concept exploration: 1 week
  • Detailed design: 1 week
  • Design review & refinement: 3 days
  • Eng handoff & support: ongoing
  • Total: ~3.5 weeks to design-approved

Post-AI Design Cycle:

  • User research & problem framing: 1 week (unchanged - can’t AI this)
  • Concept exploration: 3 days (faster with AI!)
  • Detailed design: 4 days (faster!)
  • Design review & refinement: 6 days (slower - more context needed)
  • Revision cycles: 3 days (new cost - AI enables cheap iteration)
  • Eng handoff & support: more time needed (context gaps)
  • User testing reveals issues: 2 days rework
  • Total: ~3.5 weeks (same!)

We saved time in creation, lost it in review, revision, and context-building.

The Product-Design Alignment Challenge

I’ve had conversations with our product team that mirror what you described:

Product: “Design is using AI tools now, right? Can we accelerate the roadmap?”
Design: “We can create designs faster, but that’s not the bottleneck.”
Product: “What is the bottleneck then?”
Design: “Understanding what to design, ensuring it solves the right problem, and building shared context with engineering.”

AI tools don’t help with those things. They help with execution, but execution was never really the constraint in thoughtful design work.

Quality Concerns from Product

Our product team has raised concerns similar to your customer feedback:

  • “Features look polished but feel generic”
  • “Missing the attention to detail we used to have”
  • “Edge cases not considered”
  • “Accessibility issues we didn’t use to ship”

These are symptoms of optimizing for speed over depth of thinking.

The Metrics We Should Track

Your shift to outcome-focused metrics is exactly right. For design, that means:

Old Metrics (output-focused):

  • Design concepts created
  • Figma files shipped
  • Design iterations

New Metrics (outcome-focused):

  • User satisfaction with new features
  • Usability issue rate in production
  • Accessibility compliance
  • Design-eng collaboration quality
  • Time from user need to shipped solution

The Cross-Functional Solution

Reading your post, I’m convinced this isn’t just a product-engineering alignment problem. It’s product-engineering-design alignment.

All three functions have been optimizing locally with AI tools:

  • Engineering: code generation speed
  • Design: concept generation speed
  • Product: feature spec speed

But the system hasn’t gotten faster because the integration points are where the real work happens:

  • Building shared understanding
  • Ensuring quality and completeness
  • Handling edge cases and real-world complexity

Suggestion: Involve Design in the Conversation

When you form your product-engineering working group, please include design. The AI tool dynamics and metric mismatches are hitting us too, and we need end-to-end alignment on what we’re optimizing for.

The goal should be: delivering user value, not generating artifacts (code, designs, specs) faster.

David, reading this from the engineering trenches, I have to admit: you’re right, and it’s a bit of a gut punch.

The Developer Perspective

I’m one of the engineers who felt genuinely more productive with AI tools. Copilot helps me write code faster. I’m shipping more PRs. I feel busy and effective.

But your analysis makes me realize: I’ve been optimizing for the wrong thing.

The Local vs Global Optimization

I’ve been measuring my productivity as “how much code can I write in a day.” With AI tools, that number went way up. I felt great about it.

What I wasn’t seeing:

  • The review burden I’m creating for seniors
  • The bugs that get caught in QA instead of during development
  • The rework cycles when my AI-generated code doesn’t quite fit our architecture
  • The context gaps that require extra meetings and Slack threads

From my local perspective, I’m more productive. From the system perspective, I’m creating bottlenecks downstream.

The “Waiting for Review” Frustration Makes Sense Now

I’ve been frustrated by long review times. “I can implement a feature in hours, why does review take days?”

Your analysis explains it: review is the new bottleneck because everyone is generating code faster. The review queue is constantly backed up because we’re all feeling productive and shipping PRs.

I was attributing the slow review to inefficiency in the review process. But actually, it’s the natural result of massively increasing the volume of code requiring review without increasing review capacity.

Optimizing for the Wrong Metrics

Your point about optimizing locally vs globally hits hard. I’ve been proud of:

  • My PR count
  • My commit frequency
  • My implementation speed

But none of those actually matter if the feature doesn’t reach customers faster or better.

What I’m Changing

Reading this thread, here’s what I’m committing to change:

  1. Self-Review Discipline: Just because AI wrote it fast doesn’t mean I can skip thorough self-review. I should spend the time I “saved” in implementation doing careful self-review.

  2. Understanding Over Speed: If I don’t deeply understand the AI-generated code, I shouldn’t ship it. The question isn’t “does this work?” but “do I understand this well enough to maintain it?”

  3. Respecting Review Capacity: Before creating a PR, ask: is this actually ready for someone’s limited review time? Or should I refine it more?

  4. Full Cycle Thinking: When estimating work, include review time, likely revision cycles, and QA in my mental model. “I can code this in 3 hours” isn’t the same as “this feature will be done in 3 hours.”

  5. Quality Over Quantity: Stop celebrating PR count. Start celebrating features that reach customers without rework.

The Cultural Shift Needed

For this to work at team level, we need cultural change:

Instead of celebrating: “I shipped 12 PRs this week!”
Celebrate: “My PRs made it through review with minimal iterations because I did thorough self-review.”

Instead of: “Look how fast I implemented this!”
Say: “This feature is ready for customers - tested, reviewed, and polished.”

Instead of: “AI tools made me 2x faster!”
Recognize: “AI tools changed where I spend my time - less writing, more reviewing and refining.”

To Product Leaders

If you’re seeing this disconnect, know that at least some of us engineers want to close it. We got excited about the wrong metrics, but we’re willing to refocus on what actually matters.

Help us by:

  • Providing feedback on full cycle time, not just implementation speed
  • Sharing customer perspectives on quality and delivery
  • Celebrating outcomes over output
  • Supporting process changes that improve end-to-end delivery even if they slow local velocity

The Reality Check

This whole thread has been a reality check. AI tools are powerful, but we’ve been measuring their impact wrong. If the code I generate quickly just creates bottlenecks and rework downstream, I’m not actually being more productive - I’m just moving the problems around.

Time to focus on the full cycle and actual customer value.