The AI Productivity Honeymoon is Over: From 70% to 60% Sentiment in One Year

I’ve been digging into the latest AI developer productivity data, and the numbers tell a story that’s both fascinating and concerning. The share of developers with a favorable view of AI tools has dropped from over 70% in 2023-2024 to roughly 60% in 2025. More striking: only 16.3% of developers say AI makes them “more productive to a great extent.”

That’s not a minor dip—that’s a trend reversal in one of the fastest technology adoptions in recent history.

The Perception vs Reality Gap

Here’s where it gets really interesting. METR (Model Evaluation & Threat Research) ran a randomized controlled trial between February and June 2025 with 16 experienced open-source developers working on real tasks from their own repositories.

The results? Developers using AI tools took 19% longer to complete their work. Yet they believed they had worked 20% faster with AI. Even after finishing the tasks, with the experience fresh, developers still thought AI had sped them up.

Think about that for a second. We have a massive perception-reality gap. Developers feel more productive because AI reduces cognitive load and gives them confidence—but the measured output tells a different story.
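To make the size of that gap concrete, here is a small worked example. It is only a sketch: the 10-hour baseline is made up, and it assumes "20% faster" means the developer believes the task took 20% less time than it would have without AI.

```python
def perception_gap(baseline_hours: float, actual_slowdown: float, perceived_speedup: float) -> dict:
    """Compare measured task time with the time the developer believes it took."""
    # Measured: the task took longer than the no-AI baseline.
    actual = baseline_hours * (1 + actual_slowdown)
    # Believed: assumption that "X% faster" means X% less time than baseline.
    perceived = baseline_hours * (1 - perceived_speedup)
    return {
        "actual_hours": actual,
        "perceived_hours": perceived,
        "gap_hours": actual - perceived,
    }


# Hypothetical 10-hour task, with the 19% slowdown and 20% perceived speedup from the study.
result = perception_gap(baseline_hours=10.0, actual_slowdown=0.19, perceived_speedup=0.20)
print(
    f"actual {result['actual_hours']:.1f}h, "
    f"perceived {result['perceived_hours']:.1f}h, "
    f"gap {result['gap_hours']:.1f}h"
)
# prints: actual 11.9h, perceived 8.0h, gap 3.9h
```

On these assumed numbers, a developer would misjudge a 10-hour task by almost four hours, which is why self-reported speedups are a weak basis for planning.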

The “Almost Right, But Not Quite” Problem

For those of us in product, this resonates deeply. It’s what I call the Uncanny Valley of Code—when AI solutions are close enough to seem helpful but require just enough correction to create friction.

45% of developers cite “AI solutions that are almost right, but not quite” as their #1 frustration. Only 29% of developers trust the accuracy of AI-generated code (down from 40% in prior years), and 46% say they don’t trust AI output.

This isn’t a tooling problem—it’s a workflow problem. Teams are transitioning from a “creator” mindset to a “forensic auditor” mindset, spending more time verifying and correcting AI output than they would have spent writing code from scratch with full understanding.

The Organizational Disconnect

Here’s the part that keeps me up at night as a product leader: Individual developers report productivity gains, but organizations see flat delivery velocity.

  • 84% of developers use AI tools
  • AI now writes 41% of all code
  • Yet organization-level productivity gains have stayed at around 10% since AI tools launched

PwC’s 2026 CEO Survey found that 56% of CEOs saw neither cost decreases nor revenue increases from AI over the prior 12 months. Only 12% reported both kinds of gains.

Where are the productivity gains going? They’re being absorbed by:

  • Increased review time for AI-generated code
  • Technical debt accumulation from “good enough” implementations
  • Context switching between writing and auditing modes
  • Rework cycles when “almost right” solutions fail edge cases

Are We in the Trough of Disillusionment?

Gartner’s Hype Cycle shows Generative AI entering the “Trough of Disillusionment” in 2026. This isn’t a bad thing—it’s a necessary phase where we move from experimental excitement to practical implementation.

The honeymoon phase is ending. Now comes the hard work:

  • Defining where AI adds genuine value vs where it creates overhead
  • Building verification processes that catch AI mistakes without killing velocity
  • Training teams to use AI strategically rather than reflexively
  • Measuring actual delivery outcomes instead of individual task completion

The Real Question

For product and engineering leaders: How are you recalibrating your AI investment strategy?

Are we doubling down because we believe the perception gap will close? Are we pulling back because the ROI isn’t materializing? Or are we getting more surgical—identifying specific use cases where AI delivers measurable value and avoiding the “AI-first” trap?

I’m curious how other teams are navigating this shift. The data suggests we’re past the “AI solves everything” phase and into “AI is appropriate for specific contexts.”

What does that look like in your organization?


Sources: Panto AI Statistics, METR Study, MIT Tech Review, Faros AI Research

This hits so close to home, @product_david. I’ve been wrestling with this exact phenomenon on the design side.

We use AI tools for everything from generating design variations to writing microcopy, and I feel incredibly productive. But when I actually measure output—number of features shipped, design quality scores from user testing, time to final approval—the numbers are… not impressive.

The Creator to Forensic Auditor Shift

You nailed it with the “forensic auditor” framing. This is exactly what’s happening in design workflows too.

I used to sit down and create a component from scratch. I knew every decision, every edge case, every interaction state. Now I generate 5 variations with AI, spend 20 minutes evaluating which is “closest,” then spend another 30 minutes fixing the spacing, adjusting the accessibility, and correcting the design token references that the AI got wrong.

The cognitive experience feels easier—I’m not staring at a blank canvas. But the actual time and mental energy spent is often higher, and the end result has less thoughtful coherence because I’m stitching together AI suggestions rather than designing from first principles.

The “Almost Right” Uncanny Valley

Your “Uncanny Valley of Code” term is perfect. In design, we have the same issue: AI-generated layouts that look professional but violate our design system in subtle ways. Copy that sounds good but doesn’t match our voice guidelines. Icons that are close enough to our style but introduce visual inconsistency.

The problem is that “almost right” is actually worse than “completely wrong” because it’s harder to spot and tempting to ship. Completely wrong gets caught immediately. Almost right sneaks through code review and causes problems six months later when someone tries to extend the component.

Are We Measuring the Wrong Thing?

Here’s what I keep coming back to: maybe we’re optimizing for the wrong productivity metrics.

AI tools definitely reduce the friction of getting started. They lower the activation energy. They make me feel less stressed about a blank page or an empty codebase. That psychological benefit is real—it reduces burnout and makes work feel more manageable.

But if we’re measuring productivity as “features shipped per sprint” or “time to first commit,” we’re missing the quality dimension. We’re not accounting for:

  • Technical/design debt created by “good enough” AI suggestions
  • Review overhead to catch subtle mistakes
  • Rework cycles when edge cases fail
  • Lost learning opportunities when juniors never build from scratch

What if the real benefit of AI isn’t speed, but sustainable pace? What if it’s about reducing the emotional exhaustion of creative work, even if the actual output quantity stays the same?

I don’t have answers, just questions. But I do know that I’m getting a lot more careful about when I reach for AI tools versus when I force myself to think from first principles.

Curious how others are drawing those boundaries.

The trust number is what jumps out at me: only 29% of developers trust the accuracy of AI-generated code, down from 40%.

In financial services, that’s not a productivity problem—that’s a risk management problem.

The Compliance and Quality Challenge

We operate in an environment where code mistakes can mean regulatory violations, financial losses, or security breaches. “Almost right” isn’t an option when you’re handling customer financial data or implementing compliance controls.

Here’s what we’re seeing in practice:

Scenario 1: AI writes a database query

  • AI version: Works for the happy path, misses edge cases around null values
  • Human review catches it… sometimes
  • When it gets through? Production incident 3 months later when a customer record hits that edge case
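A minimal sketch of how this kind of null-handling bug can look. The table, column names, and data are hypothetical and chosen only to show the pattern: in SQL, comparing NULL with `!=` yields NULL rather than true, so rows with a missing value silently drop out of the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO accounts (id, status) VALUES (?, ?)",
    [(1, "active"), (2, "closed"), (3, None)],  # account 3 has no status recorded
)

# Happy-path query: reads as "every account that is not closed",
# but NULL != 'closed' evaluates to NULL, so account 3 is excluded.
naive = conn.execute(
    "SELECT id FROM accounts WHERE status != 'closed' ORDER BY id"
).fetchall()

# NULL-aware query: states explicitly that a missing status counts as "not closed".
null_aware = conn.execute(
    "SELECT id FROM accounts WHERE status IS NULL OR status != 'closed' ORDER BY id"
).fetchall()

naive_ids = [row[0] for row in naive]
null_aware_ids = [row[0] for row in null_aware]
print(naive_ids)       # [1]
print(null_aware_ids)  # [1, 3]
```

Both queries pass a test suite that only contains fully populated rows, which is how the first version can survive review and fail months later.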

Scenario 2: AI generates validation logic

  • AI version: Looks correct, passes initial testing
  • Subtle bug: Accepts inputs that should be rejected per regulatory requirements
  • Compliance audit finds the gap 6 months later
  • Now we have a regulatory issue and a trust problem with the auditors

The 19% slowdown that the METR study found? In our context, that’s actually optimistic. When you factor in the additional review cycles, the testing overhead, and the rework when “almost right” code causes production issues, the actual cost is much higher.

Changing Our Review Processes

We’ve had to adapt our engineering practices:

  1. AI-generated code gets flagged in PRs - We use a convention where devs mark which parts were AI-assisted so reviewers know to scrutinize more carefully

  2. Senior engineers spend more time in code review - The perception gap means juniors think AI code is correct when it’s not. We need experienced eyes to catch the subtle mistakes.

  3. Testing requirements are stricter for AI-assisted code - Edge case coverage, security scanning, compliance checks all get extra attention

  4. Some critical paths are AI-restricted - Authentication, authorization, financial calculations, audit logging—we’ve told the team to write these by hand
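The flagging convention in item 1 could be checked automatically. The sketch below assumes a hypothetical convention in which the PR description carries a line such as `AI-assisted: path/one.py, path/two.py`; the marker text and the function name are illustrative, not an existing tool or the convention described above.

```python
import re

# Hypothetical marker: a line starting with "AI-assisted:" followed by comma-separated paths.
AI_MARKER = re.compile(r"^AI-assisted:[ \t]*(.+)$", re.IGNORECASE | re.MULTILINE)


def ai_assisted_files(pr_description: str) -> list[str]:
    """Return the file paths the author declared as AI-assisted, in order, without duplicates."""
    files: list[str] = []
    for match in AI_MARKER.finditer(pr_description):
        for path in match.group(1).split(","):
            path = path.strip()
            if path and path not in files:
                files.append(path)
    return files


description = """Add retry logic to payment client

AI-assisted: payments/retry.py, tests/test_retry.py
"""
print(ai_assisted_files(description))
# ['payments/retry.py', 'tests/test_retry.py']
```

A review bot could use the returned list to request a senior reviewer or to require extra test coverage for those files. It only works if authors declare AI use honestly, so it supports the convention rather than enforcing it.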

This isn’t anti-AI sentiment. It’s pragmatic risk management. The tools are useful for scaffolding, boilerplate, and exploratory work. But for production code in regulated environments, the verification overhead often exceeds the generation time savings.

The Real Question: Trust vs Velocity

@product_david, you asked how we’re recalibrating AI investment. For us, it’s about defining clear boundaries:

  • Use AI for: Test data generation, documentation, refactoring suggestions, learning new APIs
  • Avoid AI for: Security-critical code, compliance logic, financial calculations, anything that touches PII
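One way to make these boundaries explicit is a small lookup table. This is a sketch only: the category keys are hypothetical labels for the two lists above, and routing unknown categories to a senior engineer is an assumption about how unclassified work would be handled.

```python
from enum import Enum


class Policy(Enum):
    ALLOW = "allow"            # AI use is fine
    AVOID = "avoid"            # write by hand
    ASK_SENIOR = "ask_senior"  # not classified yet, needs a judgment call


# Hypothetical category labels mirroring the "use AI for" and "avoid AI for" lists.
AI_POLICY = {
    "test_data_generation": Policy.ALLOW,
    "documentation": Policy.ALLOW,
    "refactoring_suggestions": Policy.ALLOW,
    "learning_new_apis": Policy.ALLOW,
    "security_critical_code": Policy.AVOID,
    "compliance_logic": Policy.AVOID,
    "financial_calculations": Policy.AVOID,
    "pii_handling": Policy.AVOID,
}


def policy_for(category: str) -> Policy:
    """Look up the policy for a task category; unknown categories go to a senior engineer."""
    return AI_POLICY.get(category, Policy.ASK_SENIOR)


print(policy_for("documentation").value)     # allow
print(policy_for("compliance_logic").value)  # avoid
print(policy_for("data_migration").value)    # ask_senior
```

Defaulting to `ASK_SENIOR` rather than `ALLOW` keeps unclassified work from slipping through, at the cost of more questions landing on senior engineers.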

The hard part is that this requires judgment and experience to know which category a task falls into. Junior engineers don’t always have that context. So we’re back to needing senior engineers to guide AI usage, which creates a bottleneck.

@maya_builds, your point about sustainable pace resonates. Maybe the real value is reducing the cognitive load of routine work so teams can focus energy on the high-stakes decisions. But that requires discipline to not over-rely on AI for the critical stuff.

How are other teams handling the trust gap in production environments?

This conversation captures the strategic inflection point we’re all navigating. The data is clear: individual productivity perception is diverging from organizational delivery reality.

The Executive Perspective: Where’s the ROI?

PwC’s finding that 56% of CEOs saw no cost reduction or revenue increase from AI is the conversation happening in every boardroom right now. Leadership invested in AI tools expecting measurable business impact. What they’re seeing instead is:

  • Engineering teams using AI extensively
  • Developers reporting higher productivity and satisfaction
  • Delivery velocity staying flat or declining
  • Quality issues emerging 3-6 months post-deployment
  • Tech debt accumulation requiring rework sprints

From a CTO lens, this creates a difficult position. The tools clearly provide some value—developers want to use them, and blocking access would hurt morale and recruitment. But the business case that justified the investment isn’t materializing.

AI-Induced Technical Debt

@maya_builds and @eng_director_luis both touched on this, but it deserves emphasis: we’re creating a new category of technical debt.

Traditional tech debt comes from conscious trade-offs: “We’ll ship the MVP with this architectural limitation and refactor later.” There’s awareness and intentionality.

AI-induced tech debt is different. It’s accumulating invisibly because:

  1. Subtle incorrectness - Code that works in testing but fails edge cases in production
  2. Pattern inconsistency - AI generates solutions that don’t align with existing architecture
  3. Lost context - Developers don’t fully understand the code they’re shipping, making future modifications harder
  4. Review fatigue - Teams can’t sustain the scrutiny needed to catch all AI mistakes

The compounding effect is that we’re accumulating tech debt faster than we realize, and the cost will hit us later when we need to modify or extend these systems.

From “AI-First” to “AI-Appropriate”

@product_david asked how we’re recalibrating. Here’s our strategic shift:

Phase 1 (2024-2025): AI Everywhere

  • “Use AI for everything” cultural push
  • Focus on adoption metrics
  • Expectation of 20-30% productivity gains

Phase 2 (2026): AI With Discipline

  • Define clear use cases where AI adds value
  • Implement quality gates and review processes
  • Measure actual delivery outcomes, not just task completion
  • Accept that some work is better done by humans

We’re moving from “maximize AI usage” to “optimize AI placement.”

Specific strategic questions we’re asking:

  1. Where does AI reduce bottlenecks vs create new ones?

    • Good: Boilerplate, test scaffolding, documentation
    • Bad: Complex business logic, security-critical paths, novel architecture
  2. What’s the true cost accounting?

    • Generation time saved
    • MINUS: Review overhead
    • MINUS: Rework cycles when AI code fails
    • MINUS: Lost learning for junior developers
    • MINUS: Future modification difficulty
  3. How do we measure real productivity?

    • Not: Lines of code written, PRs merged, tasks completed
    • Yes: Features delivered to production, customer value shipped, system reliability maintained
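The cost accounting in question 2 can be written as a simple net calculation. This is a sketch under stated assumptions: every term is expressed in hours so the terms can be subtracted directly, and the example figures are invented for illustration, not measured.

```python
from dataclasses import dataclass


@dataclass
class AiTaskCost:
    """Hours attributed to one AI-assisted piece of work (all values are estimates)."""
    generation_hours_saved: float
    review_overhead_hours: float
    rework_hours: float
    lost_learning_hours: float
    future_modification_hours: float

    def net_hours_saved(self) -> float:
        """Generation time saved minus every downstream cost; negative means a net loss."""
        return self.generation_hours_saved - (
            self.review_overhead_hours
            + self.rework_hours
            + self.lost_learning_hours
            + self.future_modification_hours
        )


# Invented figures: 4 hours saved up front, 5 hours of downstream cost.
example = AiTaskCost(
    generation_hours_saved=4.0,
    review_overhead_hours=1.5,
    rework_hours=2.0,
    lost_learning_hours=0.5,
    future_modification_hours=1.0,
)
print(example.net_hours_saved())  # -1.0
```

The hard part in practice is estimating the last two terms, which only show up months later. Even rough estimates make the trade-off visible in a way that "generation time saved" alone does not.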

The Leadership Challenge

The hardest part is the perception gap that the METR study exposed. Developers genuinely believe AI makes them faster, even when measured data shows they’re slower.

This creates a leadership challenge:

  • Do we trust the subjective experience (“AI makes me feel more productive”)?
  • Or do we trust the objective measurement (“delivery velocity is flat”)?

My view: both are real, and the disconnect is the problem we need to solve.

AI reduces the psychological burden of starting tasks and provides confidence. That’s valuable for morale and retention. But if it’s not translating to business outcomes, we need to understand why and adjust our approach.

Maybe, as @maya_builds suggested, the real benefit is sustainable pace rather than increased speed. If AI prevents burnout and keeps teams engaged without increasing output, that might be worth the investment.

But we need to be honest about what we’re paying for and set realistic expectations with the business.

Moving Forward

I’m landing on a framework: AI as an enabler, not an accelerator.

It enables teams to work at a sustainable pace on a broader range of tasks. It doesn’t necessarily accelerate delivery, and expecting it to creates disappointment when the velocity gains don’t appear.

The organizations that will succeed are those that:

  1. Define clear guidelines for when AI is appropriate
  2. Invest in quality processes that catch AI mistakes
  3. Measure real delivery outcomes, not activity metrics
  4. Set realistic expectations about ROI timelines

We’re in the Trough of Disillusionment now. The question is whether we emerge with sustainable AI practices or abandon the tools when they don’t deliver on the hype.

How are others framing AI value to leadership when the productivity gains aren’t showing up in the data?

This thread is hitting on something that keeps me up at night from a people and culture perspective: the perception gap isn’t just a measurement problem—it’s eroding developers’ ability to self-assess.

The Skill Development Crisis

The data @product_david cited about only 16.3% feeling “more productive to a great extent” is one thing. But there’s another stat that’s even more concerning from a talent development perspective:

Junior developers using AI tools are scoring 17% lower on skill mastery assessments.

When developers believe they’re 20% faster but are actually 19% slower, and juniors are developing weaker foundational skills, we have a compound problem:

  1. Self-assessment accuracy is declining - If you can’t accurately judge your own productivity, how do you know what to improve?

  2. Skill gaps are hidden - Junior devs think they’re learning because they’re shipping code, but they’re not building the deep understanding needed for complex problems

  3. Career progression becomes harder - When you advance to senior roles, you need to solve novel problems AI can’t handle. But if you learned to code with AI, rather than learning the fundamentals first and then adding AI, you lack the foundation.

Team Dynamics in the Disillusionment Phase

Here’s what I’m seeing play out on my teams:

The Believers - Developers who love AI tools and use them for everything. They report high satisfaction but their code quality is inconsistent. They get defensive when we flag AI-generated code in reviews.

The Skeptics - Developers who refuse to use AI or use it minimally. They’re frustrated that AI users get credit for “shipping fast” even when the code needs more review cycles. They feel penalized for taking time to write well-considered code.

The Pragmatists - Developers who’ve figured out the boundaries @eng_director_luis described. They use AI strategically and honestly assess when it helps vs hurts. These folks are rare, and they’re usually senior.

The challenge is that these three groups create tension. The Believers think the Skeptics are behind the times. The Skeptics think the Believers are cutting corners. And the Pragmatists are stuck mediating.

The Leadership Challenge

@cto_michelle, your framework of “AI as enabler, not accelerator” resonates. But here’s the people challenge that goes with it:

We hired developers during the AI hype cycle. Many of them:

  • Joined expecting AI to make their jobs easier
  • Built their workflows around AI tools
  • Developed habits of reaching for AI first
  • Never built the “think from first principles” muscle

Now we’re entering the phase where we need discipline and boundaries around AI usage. But we have a cohort of developers who fundamentally learned to code in an AI-assisted environment.

How do we re-skill them without making them feel like we’re pulling the rug out from under them?

The Burnout Paradox

@maya_builds, you mentioned sustainable pace, and I think this is critical. Here’s the paradox I’m navigating:

AI tools reduce cognitive load and make developers feel less stressed. That’s real value for burnout prevention. But if the code they’re shipping requires more review, creates tech debt, and fails in production, we’re just shifting the stress from development to operations and debugging.

So we haven’t eliminated burnout—we’ve redistributed it.

The on-call engineers dealing with production issues from “almost right” AI code are burning out. The senior engineers spending more time in code review are burning out. The tech leads managing the tech debt backlog are burning out.

Meanwhile, the developers using AI feel great because they’re shipping fast and getting positive dopamine hits from task completion.

What We’re Doing About It

Here’s our approach to navigating this cultural shift:

  1. Explicit AI guidelines tied to experience level

    • Juniors: Required to write critical paths by hand, AI for tests/docs only
    • Mid-level: AI for boilerplate, human review required for logic
    • Senior: Trusted to make strategic decisions about AI usage
  2. Skill development programs

    • Monthly “fundamentals” sessions where we solve problems without AI
    • Pair programming with AI-off time
    • Code review focused on understanding, not just correctness
  3. Psychological safety for honest self-assessment

    • Normalize saying “I used AI and I’m not sure if it’s right”
    • Celebrate catching your own AI mistakes
    • Make it safe to ask for help understanding AI-generated code
  4. Measuring what actually matters

    • Not: PRs merged, tasks completed
    • Yes: Production stability, customer-impacting bugs, peer review quality scores
  5. Honest conversations with the team

    • “AI is a tool, not magic”
    • “Velocity without quality isn’t velocity”
    • “Your career growth depends on deep understanding, not just output”

The Question I’m Wrestling With

Here’s what I don’t have an answer for yet:

If AI is making developers feel more productive and reducing their stress, even though it’s not actually making them faster or the org more effective, is that still valuable?

Retention is hard. Burnout is real. Developer happiness matters for recruiting and culture. If AI tools improve morale without improving output, is that worth the investment and the cultural complexity it creates?

Or are we just delaying a reckoning where the tech debt and skill gaps catch up with us?

I honestly don’t know. What I do know is that we can’t keep pretending the perception gap doesn’t exist. We need to help our teams develop accurate self-assessment, set realistic expectations about AI capabilities, and build cultures where quality and understanding matter as much as velocity.

How are other engineering leaders handling the morale vs reality trade-off?