Experienced Developers Took 19% Longer With AI Tools — But Believed They Were Faster Even After Seeing the Data

A controlled study is challenging one of the industry’s core assumptions about AI coding productivity, and the results are uncomfortable.

Experienced developers — people with years of professional software engineering experience — completed code maintenance tasks 19% slower when using AI assistance compared to working without it. That alone would be noteworthy. But here’s the finding that should genuinely concern engineering leaders: even after being shown the objective timing data, most developers still believed the AI had made them faster.

The Study Design

The methodology was rigorous. Developers were given realistic maintenance tasks — bug fixes, refactoring, feature additions — on a real codebase they were already familiar with. This is critical: these weren’t toy problems or unfamiliar repositories. These were the kinds of tasks that make up the majority of actual software engineering work.

Participants were split into two groups. Half used their preferred AI coding tools (Copilot, Cursor, ChatGPT — whatever they normally use). Half worked without AI assistance. Tasks were timed objectively. The experiment controlled for task difficulty, developer experience level, and codebase familiarity.

The result: the AI-assisted group was consistently slower on maintenance tasks. Not marginally — 19% slower on average, with some tasks showing even larger gaps.

Importantly, this finding is specific to maintenance work on familiar codebases. For greenfield coding, boilerplate generation, or working in unfamiliar languages, AI tools do provide measurable speedups. But maintenance — fixing bugs, refactoring, extending existing features — is where most professional software engineering time is spent.

Why AI Slows Down Experienced Developers on Maintenance Tasks

Four mechanisms explain the slowdown:

1. Context Switching Overhead. Developers spend significant time formulating prompts, evaluating AI suggestions, and course-correcting when the AI misunderstands the codebase context. Each interaction with the AI tool is a context switch away from the mental model the developer has built of the code. For an experienced developer who already understands the codebase, these interruptions add friction rather than removing it.

2. Over-Reliance Replacing Systematic Debugging. Instead of reading the code, tracing execution paths, and systematically narrowing down the bug, developers paste error messages into AI and iterate on AI-suggested fixes. For experienced developers who have efficient debugging strategies, this “AI-first” approach is actually slower than their established methods. The AI doesn’t know the codebase’s history, design decisions, or invariants — the developer does.

3. Scope Creep from AI Suggestions. AI tools often suggest “improvements” beyond the scope of the current task. A developer fixing a bug gets an AI suggestion that also refactors the surrounding code, updates naming conventions, and adds error handling. These suggestions aren’t wrong — they might even be good ideas — but they pull developers into rabbit holes that extend task completion time without contributing to the original objective.

4. Verification Overhead. AI-generated changes require careful review because the AI doesn’t understand the codebase’s invariants, edge cases, and implicit contracts. An experienced developer reading and modifying code themselves has high confidence in their changes. When reviewing AI-generated changes, they need to verify every line against their mental model of the system, which can take longer than just writing the code themselves.

The Perception Gap: The Most Dangerous Finding

The 19% slowdown is concerning, but the perception gap is genuinely alarming. Developers feel faster with AI tools even when objective measurements show they’re slower. This isn’t delusion — there are real psychological reasons for it:

  • AI tools reduce the feeling of being “stuck.” Even when iterating on wrong solutions, the developer feels productive because things are happening.
  • The cognitive effort feels lower, which the brain interprets as efficiency.
  • AI handles the tedious parts (boilerplate, syntax), creating a subjective experience of speed even when total task time increases.

CIO Magazine has called this “the AI productivity trap” — a situation where subjective productivity assessments diverge from objective measurements, leading organizations to make decisions based on how things feel rather than what the data shows.

Implications for Engineering Leadership

If developers feel faster but aren’t, teams will make systematically bad decisions:

  • Tooling investment based on satisfaction surveys rather than output metrics
  • Sprint velocity expectations inflated by perceived (not actual) productivity gains
  • Process changes that optimize for developer feelings rather than delivery outcomes
  • Hiring and capacity planning that assumes AI-augmented output levels that don’t materialize

You cannot rely on developer surveys to measure AI tool ROI. You need objective metrics: time to resolution, defect rates post-deployment, lines of code that survive to production versus churn, and actual sprint completion rates.
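
None of these require specialized tooling; most can be computed from exports you already have. As a minimal sketch, assuming a hypothetical issue-tracker CSV with made-up column names, time to resolution and defect escape rate come down to a few lines:

```python
import csv
from datetime import datetime

# Sketch only: hypothetical issue-tracker export with made-up column names
# ("opened_at", "resolved_at", "type", "found_in"). Adapt to whatever your
# tracker actually exports.
with open("issues_export.csv", newline="") as f:
    rows = list(csv.DictReader(f))

resolved = [r for r in rows if r["resolved_at"]]
hours_to_resolution = [
    (datetime.fromisoformat(r["resolved_at"]) - datetime.fromisoformat(r["opened_at"])).total_seconds() / 3600
    for r in resolved
]
avg_ttr = sum(hours_to_resolution) / max(len(hours_to_resolution), 1)

bugs = [r for r in rows if r["type"] == "bug"]
escaped = [r for r in bugs if r["found_in"] == "production"]
defect_escape_rate = len(escaped) / max(len(bugs), 1)

print(f"avg time to resolution: {avg_ttr:.1f} h")
print(f"defect escape rate: {defect_escape_rate:.1%}")
```

The specific script doesn’t matter; the point is that objective numbers are cheap to get once you decide to collect them.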

What I’m Doing About It

I’m running a 3-month internal study comparing teams with and without AI tools on matched projects. Matched on complexity, team experience, and codebase maturity. Measuring actual outcomes:

  • Task completion time (objective, not self-reported)
  • Defect escape rate (bugs that reach production)
  • Code churn (percentage of AI-generated code that gets rewritten within 30 days; a rough git-based sketch of one way to measure this follows the list)
  • Sprint completion rate (stories completed vs. committed)
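
The churn number is the fiddliest of the four, so here’s a rough sketch of the idea rather than our actual tooling: for each commit flagged as AI-assisted, count how many of the lines it added are still attributed to it by git blame roughly 30 days later. The commit, the later revision, and the flagging convention below are all placeholders and assumptions on my part.

```python
import subprocess
from collections import Counter

# Sketch of a 30-day survival check for one AI-assisted commit.
# COMMIT and LATER_REV are placeholders: the flagged commit, and a revision
# roughly 30 days after it (however you choose to pick that).
COMMIT = "abc1234"
LATER_REV = "HEAD"

def git(*args: str) -> str:
    return subprocess.run(["git", *args], capture_output=True, text=True, check=True).stdout

# Lines added by the commit, per file (numstat format: "added<TAB>deleted<TAB>path").
added = Counter()
for line in git("diff", "--numstat", f"{COMMIT}^", COMMIT).splitlines():
    add, _deleted, path = line.split("\t")
    if add != "-":                      # "-" means binary file; skip it
        added[path] = int(add)

# Lines in those files still attributed to the commit at the later revision.
surviving = 0
for path in added:
    try:
        blame = git("blame", "--line-porcelain", LATER_REV, "--", path)
    except subprocess.CalledProcessError:
        continue                        # file deleted or renamed later: counts as churned
    surviving += sum(1 for l in blame.splitlines() if l.startswith(COMMIT))

total = sum(added.values())
churn = 1 - surviving / total if total else 0.0
print(f"{surviving}/{total} added lines survived; churn = {churn:.1%}")
```

This overcounts churn slightly (pure moves and reformatting count as rewrites), which is fine for a first pass.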

I’ll share results when we have them. But I’m already skeptical of the “10x developer” narrative that’s dominating conference stages.

Has anyone else measured AI tool impact with objective data rather than surveys? I’d love to compare methodologies and findings. The industry needs more rigorous measurement and less anecdote-driven hype.

This aligns with what I’ve observed anecdotally but was honestly afraid to say out loud. Thank you, Michelle, for putting data behind the uncomfortable truth.

Our team adopted Cursor about eight months ago with enormous enthusiasm. Every developer on the team reports loving it. Satisfaction scores for developer tooling went through the roof. In our quarterly surveys, 90% of engineers said AI tools made them “significantly more productive.” If I’d stopped at survey data, I’d be on stage at the next engineering leadership conference talking about our incredible AI transformation.

But I didn’t stop at surveys. I looked at our DORA metrics before and after AI tool adoption, and the story they tell is very different (a minimal calculation sketch follows the list):

  • Deployment frequency: Flat. No change. We’re shipping at the same cadence as before AI tools.
  • Lead time for changes: Increased slightly — about 8% longer from commit to production. Not catastrophic, but directionally wrong.
  • Change failure rate: Up 15%. This is the most concerning metric. We’re seeing more bugs escape to production since AI adoption. My hypothesis: developers are reviewing AI-generated code less carefully than code they wrote themselves, because there’s an implicit trust in “the AI suggested it.”
  • Mean time to recovery: Roughly flat, maybe slightly worse.
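
If anyone wants to run the same before/after comparison, most of these numbers fall out of deploy logs with almost no code. Here’s a minimal sketch with made-up records and field names, not pulled from any particular CI/CD tool; the point is only that the arithmetic is trivial once the events are logged:

```python
from datetime import datetime, timedelta

# Made-up deploy records and lead-time samples; field names are illustrative.
deploys = [
    {"at": datetime(2025, 6, 2), "caused_failure": False},
    {"at": datetime(2025, 6, 4), "caused_failure": True},
    {"at": datetime(2025, 6, 9), "caused_failure": False},
]
lead_times = [  # (commit time, time the change reached production)
    (datetime(2025, 6, 1, 9, 0), datetime(2025, 6, 2, 15, 0)),
    (datetime(2025, 6, 3, 11, 0), datetime(2025, 6, 4, 10, 0)),
]

window_days = 7
deploy_frequency = len(deploys) / window_days                                   # deploys per day
change_failure_rate = sum(d["caused_failure"] for d in deploys) / len(deploys)
avg_lead_time = sum((prod - commit for commit, prod in lead_times), timedelta()) / len(lead_times)

print(f"deploy frequency: {deploy_frequency:.2f}/day")
print(f"change failure rate: {change_failure_rate:.0%}")
print(f"avg lead time: {avg_lead_time}")
```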

The only metric that “improved” was lines of code per pull request — which went up significantly. But more code per PR is not a good thing. It means PRs are larger, harder to review, and more likely to contain subtle issues. The AI makes it easy to generate large changes, but that doesn’t mean those changes are better.

I haven’t shared this data broadly within the organization because — and I’ll be candid — questioning AI tool productivity feels like career suicide right now. Every CTO and VP of Engineering is on stage saying AI makes their teams 10x faster. VCs are funding companies based on “AI-augmented engineering” narratives. If I stand up and say “actually, our objective metrics got worse,” I look like a Luddite who doesn’t understand the future.

But the data is the data. And Michelle’s study results give me confidence that what we’re seeing isn’t an anomaly specific to our team.

I think the issue is particularly acute for mature codebases. Our main product is 7 years old, with hundreds of thousands of lines of code, complex business logic, and deep institutional knowledge embedded in the architecture. AI tools don’t have that context. They generate plausible-looking code that often violates assumptions that aren’t documented anywhere — they’re in the heads of senior engineers who’ve worked on the system for years.

For teams working on greenfield projects or younger codebases, the metrics might tell a different story. But most enterprise engineering is maintenance, not greenfield. We need to stop pretending otherwise.

I’d love to participate in comparing methodologies, Michelle. DM me if you want to set up a cross-company working group on objective AI productivity measurement.

The methodology matters enormously here, and I want to push back slightly on the “AI doesn’t help” narrative before it takes on a life of its own. The nuance is critical.

The study focused on maintenance tasks on familiar codebases — exactly the scenario where experienced developers have deep context that AI fundamentally lacks. When you already know where the bug is likely hiding, when you understand the system’s invariants intuitively, when you can trace execution paths in your head — adding an AI intermediary creates friction rather than removing it. The AI is essentially a less-informed colleague suggesting solutions to a problem you already understand better than it does.

But software engineering isn’t just maintenance on familiar code. Let me break down where the research actually shows AI does provide genuine, measurable speedups:

  • Greenfield coding and prototyping: When building something new, AI dramatically reduces time-to-first-working-version. Developers report 30-50% speedups on greenfield tasks, and objective measurements tend to confirm this.
  • Boilerplate and scaffolding: Generating CRUD endpoints, test templates, configuration files, and repetitive patterns. AI excels here because the code is formulaic and context-independent.
  • Unfamiliar languages and frameworks: When a Python developer needs to write some Go, or a backend engineer needs frontend CSS, AI bridges the knowledge gap genuinely faster than reading documentation.
  • Documentation and test generation: Writing docstrings, generating unit test scaffolding, creating API documentation. These tasks are tedious for humans and well-suited to AI.

The nuance that’s completely missing from the “AI makes developers 10x faster” conference narrative is this: software engineering is mostly maintenance, not greenfield.

Studies consistently show that 60-80% of engineering time is spent on existing code — debugging, refactoring, extending features, handling edge cases, fixing integration issues. If AI helps with 20% of the work (new code, boilerplate) but slows down the other 80% (maintaining existing code), the net effect could be negative even though developers feel productive in the moments where AI is genuinely helpful.
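
To put rough numbers on that, here’s the back-of-the-envelope version. The 19% maintenance slowdown is the study’s figure; the 20/80 split and a 40% greenfield speedup are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope net effect. The 19% slowdown is the study's figure;
# the 20/80 split and the 40% greenfield speedup are illustrative assumptions.
greenfield_share, greenfield_speedup = 0.20, 0.40
maintenance_share, maintenance_slowdown = 0.80, 0.19

relative_time = (greenfield_share * (1 - greenfield_speedup)
                 + maintenance_share * (1 + maintenance_slowdown))
print(f"total time with AI: {relative_time:.3f}x baseline")   # 1.072x, i.e. ~7% slower overall
```

Even with a generous greenfield speedup, the blended number comes out slightly negative, which is exactly why the aggregate can mask what’s really happening.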

We need to disaggregate productivity measurements along several dimensions (a sketch of a per-task measurement record follows the list):

  1. Task type: Greenfield vs. maintenance vs. debugging vs. code review
  2. Codebase familiarity: New codebase vs. 6+ months of experience
  3. Developer experience level: Junior vs. mid vs. senior (there’s evidence AI helps juniors more and hurts seniors more)
  4. Code complexity: Simple CRUD vs. complex business logic vs. distributed systems
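
Concretely, that just means attaching these dimensions to every timed task before you aggregate. A sketch of what such a record could look like; the field names and category values are mine, not the study’s:

```python
from dataclasses import dataclass
from collections import defaultdict

# Sketch of a per-task measurement record carrying the four dimensions,
# so results can be aggregated per segment instead of as one blended number.
@dataclass
class TaskRecord:
    task_type: str       # "greenfield" | "maintenance" | "debugging" | "review"
    familiarity: str     # "new" | "6mo_plus"
    seniority: str       # "junior" | "mid" | "senior"
    complexity: str      # "crud" | "business_logic" | "distributed"
    ai_assisted: bool
    minutes: float       # objective completion time, not self-reported

def avg_minutes_by_segment(records: list[TaskRecord]) -> dict[tuple, float]:
    totals: dict[tuple, list[float]] = defaultdict(lambda: [0.0, 0.0])
    for r in records:
        key = (r.task_type, r.familiarity, r.seniority, r.complexity, r.ai_assisted)
        totals[key][0] += r.minutes
        totals[key][1] += 1
    return {k: t / n for k, (t, n) in totals.items()}
```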

The 19% slowdown is real and important, but it’s a finding about a specific type of work. The mistake would be either dismissing AI tools entirely based on this data, or ignoring this data because AI feels productive on other types of work.

Michelle, for your internal study, I’d strongly recommend tracking these dimensions separately rather than looking at aggregate productivity. The aggregate number might look flat or slightly negative while masking both genuine wins and genuine losses in different task categories. The actionable insight isn’t “use AI” or “don’t use AI” — it’s “use AI for X but not for Y,” and right now we don’t have good enough data to fill in X and Y with confidence.

The business implications of this study hit close to home, and I need to be honest about what I’m seeing on the product side.

We’ve been planning sprints assuming AI-assisted velocity for the past two quarters. When we adopted AI coding tools, leadership (including me) assumed we’d see productivity gains, so we shortened timelines, increased feature scope per sprint, and committed to more ambitious roadmap targets. The logic seemed sound: if developers are 20-30% faster with AI — as everyone in the industry was claiming — we should be able to deliver 20-30% more.

I just pulled our last quarter’s data after reading Michelle’s post, and the numbers are damning:

  • Sprint completion rate dropped from 78% to 65% after AI tool adoption, even though team satisfaction surveys said productivity improved.
  • Feature delivery timelines are averaging 12% longer than pre-AI estimates.
  • Bug reports in the first week post-release increased by roughly 20%.

That 13-point drop in sprint completion, from 78% before AI adoption to 65% after, might be explained exactly by this study. We felt faster, so we committed to more. But we weren’t actually faster on the maintenance-heavy work that dominates our sprints, so we systematically over-committed.

The compounding effect is what really concerns me. When sprint completion drops, you carry over incomplete work. Carry-over accumulates across sprints. Teams feel behind, so they rely even more heavily on AI tools to try to catch up, which (if the study is right) actually slows them down further on maintenance tasks. It’s a negative feedback loop driven by a perception gap.

Here’s what I’m changing immediately:

I’m not going to reduce AI tool access. The developer satisfaction benefits are real and measurable. Engineers are happier, they feel more supported, and the tools genuinely help with certain types of work (Rachel’s breakdown of task types is spot-on). Taking away tools that developers love would be destructive to morale and retention.

But I am going to stop factoring “AI speedup” into sprint planning calculations. We’re going back to pre-AI velocity baselines for capacity planning. If AI tools provide genuine speedups on certain tasks, great — that’ll show up as teams completing sprints early, which is a much better outcome than consistently over-committing and carrying work forward.

I’m also adding “AI-appropriateness” as a factor in task breakdown. During sprint planning, we’ll tag tasks as “AI-friendly” (greenfield, boilerplate, scaffolding) or “AI-caution” (debugging existing code, complex refactoring, integration work). This will help set realistic expectations and guide developers on when to lean on AI versus when to trust their own expertise.
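
The tagging can start as a crude heuristic and get sharpened as our own data comes in. Something like the sketch below, where both the categories and the 12-month threshold are initial guesses rather than validated cutoffs:

```python
# Naive first pass at tagging during sprint planning. The categories and the
# 12-month threshold are initial guesses, to be revised against our own data.
AI_FRIENDLY_TYPES = {"greenfield", "boilerplate", "scaffolding", "docs", "test_scaffolding"}

def ai_appropriateness(task_type: str, component_age_months: int) -> str:
    if task_type in AI_FRIENDLY_TYPES and component_age_months < 12:
        return "AI-friendly"
    return "AI-caution"   # debugging, refactoring, integration work on mature code

print(ai_appropriateness("boilerplate", 3))    # AI-friendly
print(ai_appropriateness("debugging", 84))     # AI-caution
```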

The broader takeaway for product leaders: be very careful about adjusting commitments based on tool adoption. Measure actual delivery before assuming productivity gains. The gap between perception and reality in AI-assisted development is real, and if you’re making roadmap promises based on perceived velocity, you’re setting your team up for a systematic delivery shortfall.

Michelle, I’d love to see the results of your 3-month study. I suspect the task-type segmentation Rachel described will be the key insight — and it might give us a framework for smarter AI tool deployment rather than the current “use AI for everything” approach.