A controlled study is challenging one of the industry's core assumptions about AI coding productivity: that AI assistance makes experienced developers faster. The results are uncomfortable.
Experienced developers — people with years of professional software engineering experience — completed code maintenance tasks 19% slower when using AI assistance compared to working without it. That alone would be noteworthy. But here’s the finding that should genuinely concern engineering leaders: even after being shown the objective timing data, most developers still believed the AI had made them faster.
The Study Design
The methodology was rigorous. Developers were given realistic maintenance tasks — bug fixes, refactoring, feature additions — on a real codebase they were already familiar with. This is critical: these weren’t toy problems or unfamiliar repositories. These were the kinds of tasks that make up the majority of actual software engineering work.
Participants were split into two groups. Half used their preferred AI coding tools (Copilot, Cursor, ChatGPT — whatever they normally use). Half worked without AI assistance. Tasks were timed objectively. The experiment controlled for task difficulty, developer experience level, and codebase familiarity.
The result: the AI-assisted group was consistently slower on maintenance tasks. Not marginally: tasks took 19% longer on average with AI assistance, and some tasks showed even larger gaps.
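To be precise about what a number like that means: the slowdown is relative to the unassisted baseline. Here is a minimal sketch of the arithmetic, using made-up per-task timings rather than the study's data:

```python
# Illustrative only: hypothetical per-task completion times in minutes.
# "19% longer" means the AI-assisted mean is ~1.19x the unassisted mean.
from statistics import mean

no_ai_minutes = [42, 55, 38, 61, 47]    # same tasks, no AI assistance (made up)
with_ai_minutes = [51, 63, 47, 70, 58]  # AI-assisted attempts (made up)

baseline = mean(no_ai_minutes)
assisted = mean(with_ai_minutes)

# Relative slowdown: extra time with AI, as a fraction of the unassisted baseline.
slowdown = (assisted - baseline) / baseline
print(f"Mean without AI: {baseline:.1f} min, with AI: {assisted:.1f} min")
print(f"Relative slowdown: {slowdown:.0%}")
```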
Importantly, this finding is specific to maintenance work on familiar codebases. For greenfield coding, boilerplate generation, or working in unfamiliar languages, AI tools do provide measurable speedups. But maintenance — fixing bugs, refactoring, extending existing features — is where most professional software engineering time is spent.
Why AI Slows Down Experienced Developers on Maintenance Tasks
Four mechanisms explain the slowdown:
1. Context Switching Overhead. Developers spend significant time formulating prompts, evaluating AI suggestions, and course-correcting when the AI misunderstands the codebase context. Each interaction with the AI tool is a context switch away from the mental model the developer has built of the code. For an experienced developer who already understands the codebase, these interruptions add friction rather than removing it.
2. Over-Reliance Replacing Systematic Debugging. Instead of reading the code, tracing execution paths, and systematically narrowing down the bug, developers paste error messages into AI and iterate on AI-suggested fixes. For experienced developers who have efficient debugging strategies, this “AI-first” approach is actually slower than their established methods. The AI doesn’t know the codebase’s history, design decisions, or invariants — the developer does.
3. Scope Creep from AI Suggestions. AI tools often suggest “improvements” beyond the scope of the current task. A developer fixing a bug gets an AI suggestion that also refactors the surrounding code, updates naming conventions, and adds error handling. These suggestions aren’t wrong — they might even be good ideas — but they pull developers into rabbit holes that extend task completion time without contributing to the original objective.
4. Verification Overhead. AI-generated changes require careful review because the AI doesn’t understand the codebase’s invariants, edge cases, and implicit contracts. An experienced developer reading and modifying code themselves has high confidence in their changes. When reviewing AI-generated changes, they need to verify every line against their mental model of the system, which can take longer than just writing the code themselves.
The Perception Gap: The Most Dangerous Finding
The 19% slowdown is concerning, but the perception gap is genuinely alarming. Developers feel faster with AI tools even when objective measurements show they’re slower. This isn’t delusion — there are real psychological reasons for it:
- AI tools reduce the feeling of being “stuck.” Even when iterating on wrong solutions, the developer feels productive because things are happening.
- The cognitive effort feels lower, which the brain interprets as efficiency.
- AI handles the tedious parts (boilerplate, syntax), creating a subjective experience of speed even when total task time increases.
CIO Magazine has called this “the AI productivity trap” — a situation where subjective productivity assessments diverge from objective measurements, leading organizations to make decisions based on how things feel rather than what the data shows.
Implications for Engineering Leadership
If developers feel faster but aren’t, teams will make systematically bad decisions:
- Tooling investment based on satisfaction surveys rather than output metrics
- Sprint velocity expectations inflated by perceived (not actual) productivity gains
- Process changes that optimize for developer feelings rather than delivery outcomes
- Hiring and capacity planning that assumes AI-augmented output levels that don’t materialize
You cannot rely on developer surveys to measure AI tool ROI. You need objective metrics: time to resolution, defect rates post-deployment, lines of code that survive to production versus churn, and actual sprint completion rates.
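None of these metrics require exotic tooling to start collecting. As one example, a rough churn signal can be pulled straight from git history; the sketch below sums added and deleted lines per month using `git log --numstat`. Churn definitions vary (many teams count only lines rewritten shortly after they land), so treat this as a starting point rather than a finished metric.

```python
# Rough churn proxy: lines added vs. deleted per month, from git history.
# Assumes it runs inside a git repository; the six-month window is arbitrary.
import subprocess
from collections import defaultdict

log = subprocess.run(
    ["git", "log", "--since=6 months ago", "--numstat",
     "--format=%H %ad", "--date=format:%Y-%m"],
    capture_output=True, text=True, check=True,
).stdout

added = defaultdict(int)
deleted = defaultdict(int)
month = None
for line in log.splitlines():
    if "\t" in line:                               # numstat row: "<added>\t<deleted>\t<path>"
        a, d, _path = line.split("\t", 2)
        if month and a.isdigit() and d.isdigit():  # "-" marks binary files; skip them
            added[month] += int(a)
            deleted[month] += int(d)
    elif line.strip():                             # commit header: "<sha> <YYYY-MM>"
        month = line.split()[1]

for m in sorted(added):
    ratio = deleted[m] / added[m] if added[m] else 0.0
    print(f"{m}: +{added[m]} / -{deleted[m]} lines (delete/add ratio {ratio:.2f})")
```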
What I’m Doing About It
I’m running a 3-month internal study comparing teams with and without AI tools on projects matched for complexity, team experience, and codebase maturity. We’re measuring actual outcomes (there’s a roll-up sketch after the list):
- Task completion time (objective, not self-reported)
- Defect escape rate (bugs that reach production)
- Code churn (percentage of AI-generated code that gets rewritten within 30 days)
- Sprint completion rate (stories completed vs. committed)
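For transparency about how these will be rolled up, here is a simplified sketch. It assumes a per-task CSV export (`study_tasks.csv`) with column names I've invented for illustration; the real analysis will also need matched-pair comparisons and significance testing, which this skips.

```python
# Simplified roll-up of the study metrics from a per-task CSV export.
# Columns (my own convention, not a standard): group ("ai" or "no_ai"), hours,
# escaped_defects, points_committed, points_completed, loc_added, loc_rewritten_30d.
import csv
from collections import defaultdict
from statistics import mean

tasks_by_group = defaultdict(list)
with open("study_tasks.csv", newline="") as f:
    for row in csv.DictReader(f):
        tasks_by_group[row["group"]].append(row)

for group, tasks in tasks_by_group.items():
    hours = [float(t["hours"]) for t in tasks]
    defects = sum(int(t["escaped_defects"]) for t in tasks)
    committed = sum(float(t["points_committed"]) for t in tasks)
    completed = sum(float(t["points_completed"]) for t in tasks)
    loc_added = sum(int(t["loc_added"]) for t in tasks)
    rewritten = sum(int(t["loc_rewritten_30d"]) for t in tasks)

    churn = rewritten / loc_added if loc_added else float("nan")
    completion = completed / committed if committed else float("nan")

    print(f"--- {group} ({len(tasks)} tasks) ---")
    print(f"  mean completion time: {mean(hours):.1f} h")
    print(f"  defect escape rate:   {defects / len(tasks):.2f} per task")
    print(f"  30-day code churn:    {churn:.0%}")
    print(f"  sprint completion:    {completion:.0%}")
```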
I’ll share results when we have them. But I’m already skeptical of the “10x developer” narrative that’s dominating conference stages.
Has anyone else measured AI tool impact with objective data rather than surveys? I’d love to compare methodologies and findings. The industry needs more rigorous measurement and less anecdote-driven hype.