We're coding faster but shipping slower—is AI productivity theater? 🎭

TL;DR: Everyone on my team is using AI coding tools and swears they’re 2x faster. Our sprint velocity? Basically flat. What gives? :thinking:


I need to vent, and maybe get a reality check from this community.

Three months ago, our engineering team went all-in on AI coding assistants. GitHub Copilot for everyone. Cursor subscriptions. The works. The engineers were thrilled—finally, tooling that keeps up with their brains!

Fast forward to today: In our retros, devs report saving 3-4 hours per week. They’re completing tickets faster. Code review requests are flying. Our Slack is full of “look what AI generated!” screenshots.

But here’s the thing that’s keeping me up at night… our actual delivery velocity hasn’t budged.

We’re shipping roughly the same number of features per sprint as we did before AI. Our design-to-production cycle time? If anything, it’s slightly longer. And our engineering manager is scratching his head because the math just doesn’t add up.

The Numbers Don’t Lie (But They’re Confusing)

I went down a research rabbit hole this week, and wow—we’re not alone:

  • 84% of developers now use AI coding tools (source)
  • Organizations see only 10% productivity gains despite this massive adoption (source)
  • Developers report saving hours, but review time increased 91% while tasks completed rose just 21% (Faros AI research)

That last stat hit me like a truck. 91% increase in review time. That’s our bottleneck right there.

What I’m Seeing From the Design Side :artist_palette:

Here’s my perspective as someone who bridges design and engineering:

Our engineers are cranking out code faster—that part is real. But the PRs are… how do I say this kindly… inconsistent. Some are brilliant. Others feel like they were written by someone (something?) that doesn’t quite understand the product context.

Code review has become this weird quality gate that’s more intense than before. Senior engineers are spending MORE time reviewing, not less, because they have to verify that the AI-generated code actually does what it claims.

Meanwhile, I’m sitting in design reviews wondering: Are we building the right features faster, or just building more features? Because from where I sit, we’re generating a lot of output, but I’m not sure it’s translating to better user experiences.

The Trust Gap That Nobody Talks About

Here’s what really bothers me: I’ve noticed devs using tools they don’t fully trust because they feel like they have to. Like, if you’re not using AI, you’re falling behind.

One of our junior engineers told me privately that Copilot sometimes slows him down, but he’s worried that admitting that makes him look “resistant to innovation.” That broke my heart a little. :broken_heart:

The data backs this up: trust in AI coding tools dropped from 40% to 29% while adoption rose to 84% (source). That’s a 55-point gap between usage and confidence!

Are We Measuring the Wrong Things?

I keep coming back to this question: What does “productivity” even mean?

  • Is it PRs merged per week?
  • Features shipped per sprint?
  • Customer problems solved?
  • Business outcomes delivered?

Maybe individual developer speed just isn’t the right metric when we’re building complex products as a team. Maybe the bottleneck was never “how fast can one person write code”—it was always coordination, communication, and making sure we’re building the right thing.

What I Want to Know :woman_raising_hand:

For the engineering leaders here:

  1. Are you seeing the same disconnect between individual speed and team velocity?
  2. How are you measuring AI impact beyond “hours saved”?
  3. What changed in your processes to actually capture the gains?
  4. How do you create space for people to honestly say “this tool isn’t helping me right now”?

For the product folks:

  1. Are we optimizing for output or outcomes? Does it matter if we ship features twice as fast if they don’t move metrics?

I want to believe AI tools can genuinely transform how we work—I’ve seen glimpses of it. But right now, it feels like we’re coding faster but shipping slower, and I can’t figure out if that’s a tooling problem, a process problem, or a measurement problem.

Or maybe it’s all three? :performing_arts:

Curious if anyone else is experiencing this productivity paradox, or if we’re just doing it wrong.

Maya, this resonates deeply. Your review bottleneck observation is exactly what we’re experiencing in our fintech org.

Here’s our data from Q1:

  • Pull requests created: +42% vs. Q4 2025
  • Deployment frequency: Essentially unchanged (1.2 deploys/day → 1.3 deploys/day)
  • Mean time to merge: +38% (was 2.1 days, now 2.9 days)
  • Post-deployment defects: +15%

So we’re generating WAY more code, but we’re not shipping faster—and when we do ship, we’re finding more bugs in production.

The Hidden Cost: Technical Debt Accumulation

What worries me most isn’t the velocity plateau—it’s what’s happening to our codebase quality. AI-generated code often works but doesn’t necessarily align with our architectural patterns.

Example: Last month, an engineer used Copilot to build a new API endpoint. The code was functional, passed tests, and shipped. Two weeks later, we discovered it bypassed our standard authentication middleware because the AI didn’t understand our security framework. :grimacing:
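
For anyone who hasn’t hit this failure mode yet, here’s a minimal sketch of the shape of that bug. This is hypothetical Flask code with invented names (our real service uses internal middleware), not our actual endpoint:

```python
# Hypothetical sketch of the failure mode; names are invented.
from functools import wraps

from flask import Flask, abort, jsonify, request

app = Flask(__name__)

def require_auth(f):
    """The standard auth middleware every endpoint is supposed to use."""
    @wraps(f)
    def wrapper(*args, **kwargs):
        if request.headers.get("Authorization") != "Bearer expected-token":
            abort(401)
        return f(*args, **kwargs)
    return wrapper

@app.route("/accounts/<account_id>/balance")
@require_auth  # human-written endpoint: auth enforced
def get_balance(account_id):
    return jsonify({"account_id": account_id, "balance": 100.0})

@app.route("/accounts/<account_id>/transactions")
def get_transactions(account_id):
    # AI-generated endpoint: functional, tests passed, shipped...
    # but note the missing @require_auth. Anyone can read this data.
    return jsonify({"account_id": account_id, "transactions": []})
```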

We’re finding that AI tools are incredible at generating syntactically correct code, but they don’t inherently understand:

  • Our team’s architectural decisions
  • Domain-specific compliance requirements (huge in fintech)
  • Performance characteristics at our scale
  • Long-term maintainability tradeoffs

Rethinking Code Review for the AI Era

Your question about process changes is critical. Here’s what we’re experimenting with:

What’s not working:

  • Treating AI-generated code the same as human-written code in reviews
  • Assuming senior engineers can review 2x as many PRs just because junior devs write 2x faster

What’s showing promise:

  • Explicit tagging: Requiring devs to mark which parts were AI-generated
  • Focused reviews: Spending more time on architecture/design, less on syntax
  • Pair programming with AI: Pairing a human reviewer with the AI-using dev in real time
  • Better automated checks: Investing in static analysis that catches architectural violations (toy sketch below)
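
To make that last bullet concrete, here’s a toy version of such a check, building on the auth example from my earlier post. It assumes the Flask conventions and the invented require_auth name from that sketch; a real architectural check would cover far more:

```python
# Toy CI check: fail the build if any route handler is missing the
# (hypothetical) require_auth decorator from the sketch above.
import ast
import sys

def unprotected_routes(source: str) -> list[str]:
    """Return handlers decorated with @...route(...) but not @require_auth."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        names = set()
        for dec in node.decorator_list:
            if isinstance(dec, ast.Name):             # @require_auth
                names.add(dec.id)
            elif isinstance(dec, ast.Call) and isinstance(dec.func, ast.Attribute):
                names.add(dec.func.attr)              # @app.route(...)
        if "route" in names and "require_auth" not in names:
            offenders.append(node.name)
    return offenders

if __name__ == "__main__":
    bad = unprotected_routes(open(sys.argv[1]).read())
    if bad:
        print("Route handlers missing @require_auth:", ", ".join(bad))
        sys.exit(1)
```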

The Real Question: Are We Using AI for the Right Tasks?

I’m starting to think the problem is how we’re deploying these tools. AI coding assistants excel at:

  • Boilerplate generation
  • Test case creation
  • Code translation between languages
  • Documentation from code

They struggle with:

  • Architectural decisions
  • Domain-specific business logic
  • Security-critical code
  • Performance optimization in complex systems

Maybe the issue isn’t “should we use AI” but “which tasks should AI handle vs. which require human judgment?”

Your point about measuring the wrong things hits home. We’ve been tracking individual output, but what we should measure is:

  • System quality: defect rates, security incidents, performance degradation
  • Team effectiveness: cycle time from idea to customer value
  • Architectural health: technical debt accumulation, maintainability scores

Question for the thread: Has anyone successfully implemented review processes that actually work with high-volume AI-generated PRs? We’re drowning over here.

Maya and Luis—you’re both describing symptoms of a deeper strategic misalignment I’m seeing across the industry.

Let me share the uncomfortable truth from the C-suite perspective: Engineering leaders are celebrating individual developer productivity gains while business leaders see flat revenue per engineer metrics. We’re speaking different languages.

Three Levels of “Productivity”

The disconnect happens because we’re measuring at different layers:

1. Individual Level (what engineers feel)

  • “I wrote this function in 5 minutes instead of 30”
  • “I generated unit tests instantly”
  • Personal velocity ↑ = happiness ↑

2. Team Level (what engineering managers measure)

  • Sprint velocity, story points, PR throughput
  • Here’s where the paradox lives—more output, same delivery cadence
  • Why? Coordination costs, review bottlenecks, integration complexity

3. Business Level (what the board cares about)

  • Revenue per engineer
  • Time to market for revenue-generating features
  • Customer acquisition cost, retention, NPS
  • This hasn’t moved for most companies

At my company, we rolled out AI coding tools to our 50-person engineering team last year. Our CFO’s question six months later: “We spent $120K on AI tools. Show me the business impact.”

I couldn’t. Because we were measuring the wrong things.

The Junior Developer Conundrum

Here’s what we’re seeing: AI tools disproportionately help junior developers write code faster. This sounds great until you realize:

  • Senior engineer review time doubled (Luis, your merge-time numbers point the same direction)
  • Architectural decisions still bottlenecked on senior judgment
  • Domain knowledge gaps weren’t filled by AI—they were masked

So we moved the bottleneck from “writing code” to “ensuring code quality and architectural fit.” In complex systems, the latter is actually MORE expensive.

One of our principal engineers told me: “I’m spending less time writing code and more time being a quality gate. This isn’t the productivity gain we promised.”

Business Outcomes vs. Engineering Metrics

Maya, your question “Are we optimizing for output or outcomes?” is THE question.

Example from our Q4: Engineering shipped 40% more features. Sounds amazing, right? Except:

  • Customer engagement with new features: flat
  • Support tickets: up 25%
  • Technical debt: accumulating faster

We shipped more stuff, but not more value.

The features AI helped us build faster weren’t necessarily the high-impact ones customers needed. Because AI accelerates coding, not product judgment. And we didn’t strengthen our product discovery or prioritization—we just made our build phase faster.

It’s like having a faster oven but not improving your recipes.

What Actually Works: Governance Frameworks

After a painful six months, here’s what we’re implementing:

1. Value-stream mapping

  • Measure cycle time from customer problem → validated solution
  • Identify where AI actually accelerates this (hint: less than you think)

2. Quality gates specifically for AI-generated code

  • Automated security scanning
  • Architectural compliance checks
  • Performance regression testing
  • Senior engineer “spot checks” on 30% of AI-generated PRs (see the sampling sketch after this list)

3. Selective adoption, not blanket deployment

  • AI for test generation, documentation, routine refactoring
  • Human-led for new features, architectural changes, security-critical code
  • Measure both where we use AND don’t use AI

4. Different success metrics

  • Not: “hours saved”
  • Instead: “customer value delivered per sprint”
  • Track: feature adoption rates, customer satisfaction, defect density
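
For the spot-check item under point 2, the mechanism can be tiny. Here’s a sketch, assuming PRs carry an “ai-generated” label (which the tagging policies upthread would give you); hashing the PR number keeps the sample deterministic, so re-running CI never flips the decision:

```python
# Hypothetical sketch of the 30% spot-check gate. Assumes PRs are
# labeled "ai-generated" per an explicit-tagging policy.
import hashlib

SPOT_CHECK_RATE = 0.30  # fraction of AI-tagged PRs routed to senior review

def needs_spot_check(pr_number: int, labels: set[str]) -> bool:
    if "ai-generated" not in labels:
        return False
    digest = hashlib.sha256(str(pr_number).encode()).digest()
    # Map the first 4 bytes of the hash to [0, 1) and compare to the rate.
    return int.from_bytes(digest[:4], "big") / 2**32 < SPOT_CHECK_RATE

# Roughly 30% of these come back True, and always the same ones.
for pr in range(100, 110):
    print(pr, needs_spot_check(pr, {"ai-generated"}))
```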

The Uncomfortable Recommendation

To Luis’s point about being overwhelmed with reviews: Don’t try to review everything at the same pace.

Some code deserves deep review (security, architecture, customer-facing features). Some code is low-risk and can move faster (tests, docs, internal tools).

AI might 2x your output. But you can’t 2x your quality assurance capacity. So be strategic about what gets built and how thoroughly it gets reviewed.

The goal isn’t to ship faster. The goal is to deliver customer value more efficiently.

And right now, AI is helping us ship faster, but organizational structures and processes haven’t adapted to convert that speed into value.

This thread is surfacing something that’s been bothering me for months, and I haven’t had the words for it until now. Thank you all for the honesty.

The Psychological Safety Crisis

Maya, your story about the junior engineer who’s afraid to admit AI slows him down? That’s the real productivity killer.

Here’s what I’m seeing across our EdTech org: People are using tools they don’t trust because they’re terrified of being perceived as “resistant to innovation” or “not keeping up.”

The data is stark:

  • 84% of developers use AI tools
  • Only 29% trust their accuracy

That’s a 55-point gap between usage and confidence. Think about what that means: More than half of people using AI coding tools are doing so while doubting the output.

Would we accept this in any other domain? “84% of surgeons use this scalpel, but only 29% trust it won’t slip”? We’d call that malpractice.

The Pressure to Perform

I’ve had three separate 1:1 conversations in the past month where engineers confessed:

Engineer 1 (Senior): “My manager asked why my PR count was lower than my peers. I had to explain I was reviewing everyone else’s AI-generated code. He suggested I use AI more so I could contribute AND review.”

Engineer 2 (Mid-level): “I spent an hour debugging AI-generated code that would have taken me 20 minutes to write correctly from scratch. But I felt like I should make AI work.”

Engineer 3 (Junior): “I’m afraid if I don’t learn to use these tools, I’ll be unemployable in two years. So I force myself to use them even when they’re confusing.”

This isn’t productivity. This is productivity theater driven by FOMO and career anxiety.

What We Changed: Creating Space for Honesty

Michelle’s governance framework resonates, but I want to add the culture layer that makes it work.

Here’s what we implemented:

1. “AI Transparency” in retros

  • Explicitly ask: “When did AI help this sprint? When did it hurt?”
  • Celebrate wins AND admit failures
  • No judgment for choosing not to use AI for certain tasks

2. Explicit permission structure

  • “Use AI for: boilerplate, tests, docs” (greenlight)
  • “Use AI cautiously for: business logic, integrations” (yellow light)
  • “Don’t use AI for: security-critical code, architectural decisions” (red light)
  • Makes it psychologically safe to opt out (see the config sketch after this list)

3. Redefine “AI-savvy”

  • Old definition: “Uses AI for everything”
  • New definition: “Knows when to use AI and when not to”
  • Skill is in judgment, not adoption rate

4. Measure satisfaction alongside adoption

  • Monthly pulse: “Do AI tools make your work better or just different?”
  • Track sentiment, not just usage
  • If satisfaction drops, we pause and investigate
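
On point 2: the permission structure doesn’t have to live only in a wiki page. Here’s a hypothetical sketch of it as config that a PR bot could act on; the task categories and actions are invented for illustration:

```python
# Hypothetical sketch: the red/yellow/green permission structure as
# data a PR bot could act on. Categories and actions are invented.
AI_POLICY = {
    "boilerplate": "green",
    "tests": "green",
    "docs": "green",
    "business_logic": "yellow",
    "integrations": "yellow",
    "security_critical": "red",
    "architecture": "red",
}

def review_action(task_category: str) -> str:
    light = AI_POLICY.get(task_category, "yellow")  # unknown -> cautious
    return {
        "green": "normal review",
        "yellow": "reviewer verifies AI output against product context",
        "red": "AI-generated code not accepted; human authorship required",
    }[light]

print(review_action("tests"))              # normal review
print(review_action("security_critical"))  # human authorship required
```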

Process Changes Need Psychological Safety

Luis, your experiments with tagging and focused reviews are smart. Michelle, your governance framework is exactly what’s needed strategically.

But here’s my addition: None of that works without teams feeling safe to say “this isn’t working.”

If engineers are afraid to push back on AI-generated code in reviews because it might seem like they’re “slowing things down”…

If junior devs won’t ask for help understanding AI suggestions because they think they should just “get it”…

If seniors won’t admit they’re overwhelmed by review volume because leadership is celebrating velocity gains…

…then no framework will save you. The culture eats the process for breakfast.

The Timeline Reality

One more uncomfortable truth: This takes time to get right.

We’re six months into our “AI adoption with intention” approach, and we’re just starting to see it click. Early results:

  • Team satisfaction with AI tools: 42% → 68%
  • PR review backlog: Down 30%
  • Post-deployment defects: Down 15%
  • Engineers reporting “AI helped me this week”: Holding steady at 70% (vs. 85% back when adoption was effectively mandatory)

The key shift? We stopped measuring AI adoption rate as a success metric.

Instead, we measure: “Are engineers building better software more sustainably?” Sometimes AI helps with that. Sometimes it doesn’t. And that’s okay.

My Questions for the Group

  1. How do you create culture where it’s safe to say “AI made this worse”?
  2. What signals tell you when adoption is genuine vs. performative?
  3. Has anyone successfully integrated AI tools without creating anxiety about job security?

Maya, to your original question about whether this is tooling, process, or measurement—I think it’s culture. The tools are just tools. The process only works if people trust it. And we measure what we value.

Right now, I worry we’re valuing speed over sustainability. And that’s a recipe for burnout, not productivity.

Coming at this from the product side, and honestly, this thread is making me question everything about how we’ve been thinking about AI “productivity.”

The Question Nobody Wants to Ask

Here it is: Are we building the right things faster, or the wrong things faster?

Because from where I sit as VP Product, I’m seeing a lot of velocity on features that don’t move our core metrics. And AI might actually be making that problem worse, not better.

A Cautionary Tale from Q4

Last quarter, our engineering team was on fire. They’d just adopted Copilot, spirits were high, and we shipped 60% more features than Q3. I presented this to our board as a huge win.

Then our VP of Customer Success asked: “Which of those features increased activation? Or retention?”

Silence.

Turns out:

  • Customer engagement with new features: Down 8%
  • Support tickets: Up 22%
  • Feature adoption (% of users who use a feature within 30 days): Down from 45% to 31%
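
For anyone who wants to track that last metric, here’s roughly the computation. This is a simplified sketch with an invented event shape, so treat it as an illustration rather than our real analytics pipeline:

```python
# Simplified sketch of the adoption metric: share of active users who
# touch a feature within 30 days of launch. Event shape is invented.
from datetime import date, timedelta

def adoption_rate(events, feature, launch, active_users):
    """events: iterable of (user_id, feature_name, event_date) tuples."""
    window_end = launch + timedelta(days=30)
    adopters = {
        user for user, name, day in events
        if name == feature and launch <= day <= window_end and user in active_users
    }
    return len(adopters) / len(active_users) if active_users else 0.0

# Toy data: one of three active users tried the feature in the window.
events = [("u1", "bulk_export", date(2026, 1, 10)),
          ("u2", "bulk_export", date(2026, 3, 1))]  # outside the window
rate = adoption_rate(events, "bulk_export", date(2026, 1, 5), {"u1", "u2", "u3"})
print(f"{rate:.0%}")  # -> 33%
```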

We built more. Customers wanted less. And we completely missed it because we were celebrating engineering throughput instead of customer outcomes.

The Real Bottleneck Isn’t Code

Maya, Luis, Michelle, Keisha—you’re all describing symptoms of something I see from the product side:

The bottleneck was never “how fast can we write code.” It was always “do we understand what to build?”

AI coding tools make implementation faster. But they do nothing for:

  • Product discovery
  • User research
  • Prioritization
  • Strategic alignment
  • Understanding customer jobs-to-be-done

If anything, faster implementation can be dangerous when we haven’t validated the right thing to build.

It’s like Michelle said: we got a faster oven but didn’t improve our recipes. Except from my perspective, we’re cooking the wrong dishes entirely.

The Product-Engineering Misalignment

Here’s what I’m seeing in product-engineering collaboration:

Before AI:

  • Engineers: “This will take 2 weeks”
  • Product: “Okay, let’s make sure it’s the right thing to build”
  • Natural forcing function for thoughtful prioritization

After AI:

  • Engineers: “I can knock this out in 2 days with AI”
  • Product: “Great, let’s do it!”
  • We skip the hard questions about whether we should

The speed creates a false urgency. We’re optimizing for “ship it fast” when we should optimize for “learn fast, then ship the right thing.”

What We Changed: Product-Engineering Joint Accountability

After our Q4 wake-up call, we restructured how product and engineering collaborate on AI-assisted development:

1. Definition of “Done” includes customer validation

  • Not: “Feature shipped”
  • Now: “Feature shipped AND validates hypothesis”
  • Measure feature adoption, not just deployment

2. Slow down discovery, speed up iteration

  • Spend more time on customer research and problem definition
  • Use AI to rapidly prototype and test multiple solutions
  • Validate before committing to full build

3. Joint success metrics

  • Product + Engineering co-own: customer satisfaction, feature adoption, business impact
  • Decoupled from: PR velocity, story points, lines of code

4. “Should we build this?” checklist

  • Does this solve a validated customer problem?
  • Have we talked to at least 5 customers about this?
  • What metric will this move, and by how much?
  • Required BEFORE engineering starts (AI or not)

The Uncomfortable Question About Productivity

Luis asked: “Are we using AI for the right tasks?”

From the product side, I’d broaden that: Are we using our engineering capacity for the right things, period?

AI can make engineers 2x faster at building features. But if those features:

  • Don’t solve real customer problems
  • Create more complexity in the product
  • Increase support burden
  • Don’t move business metrics

…then 2x faster just means we’re digging the wrong hole more efficiently.

Keisha’s point about psychological safety is crucial here too. I need product managers and engineers to feel safe saying:

“We could build this fast with AI, but should we build it at all?”

“This feature shipped quickly, but customers aren’t using it—what did we miss?”

“We’re measuring velocity, but are we delivering value?”

Proposed Framework: Outcome-Driven AI Adoption

What if we flipped the measurement model?

Instead of:
“AI helped us ship 40% more features”

Measure:
“AI helped us validate and deliver features that increased [customer metric] by X%”

Instead of:
“Engineers save 3 hours/week with AI”

Measure:
“Product-engineering cycle time from customer problem to validated solution improved by Y%”
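
Concretely, that second measure is just timestamp arithmetic once you log the right events. A sketch with invented event shapes and toy numbers:

```python
# Hypothetical sketch: problem-to-validated-solution cycle time.
# Event shapes and the example dates are invented.
from datetime import datetime
from statistics import median

def cycle_times_in_days(items):
    """items: iterable of (problem_raised, solution_validated) datetimes."""
    return [(validated - raised).days for raised, validated in items]

q4 = [(datetime(2025, 10, 1), datetime(2025, 11, 12)),
      (datetime(2025, 10, 20), datetime(2025, 12, 15))]
q1 = [(datetime(2026, 1, 5), datetime(2026, 2, 2)),
      (datetime(2026, 1, 18), datetime(2026, 2, 20))]

before = median(cycle_times_in_days(q4))
after = median(cycle_times_in_days(q1))
print(f"Median cycle time: {before} -> {after} days "
      f"({(before - after) / before:.0%} improvement)")
```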

The goal isn’t more output. The goal is better outcomes.

Questions for This Community

  1. For engineering leaders: How do you resist pressure to “just ship it fast” when AI makes that possible?

  2. For product leaders: How are you adjusting discovery/validation processes in the age of AI-accelerated development?

  3. For everyone: What metrics actually matter for measuring AI productivity at the organizational level?

This thread has been eye-opening. The productivity paradox isn’t just an engineering problem—it’s a whole organization alignment problem.

And honestly? I think we’re all optimizing for the wrong thing. Speed is empty without direction.