76% of Developers Generate Code They Don’t Fully Understand—Anthropic Study Shows 17% Lower Skill Mastery With AI Assistance. Are We Trading Velocity for Understanding?
I just read Addy Osmani’s latest piece on “comprehension debt” and it’s been weighing on me all week. The core finding hit hard: developers using AI coding assistants scored 17% lower on comprehension tests when learning new libraries, according to a recent Anthropic study with 52 software engineers.
The Velocity-Understanding Trade-Off We Don’t Talk About
Here’s what makes this particularly uncomfortable for those of us managing engineering teams: the study showed no statistically significant productivity gains on average, but clear comprehension losses. We’re not even winning on velocity—we’re just generating code we understand less.
The biggest drops were in debugging ability, followed by code reading and conceptual understanding. Think about what that means: the very skills you need to validate AI-generated code are the ones degrading fastest.
The 5-7x Gap Between Generation and Absorption
Osmani introduces a metric that crystallizes the problem: AI generates code 5-7x faster than developers can absorb it. Pull request volume is climbing while review capacity stays flat. The organizational assumption that “reviewed code is understood code” no longer holds.
What’s emerging in my team’s code reviews is concerning: engineers are approving code they don’t fully understand, and that approval still carries implicit endorsement. The system now contains more code than any human on the team genuinely understands.
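To make the 5-7x figure concrete, here is a back-of-the-envelope sketch. It is my own simplification, not something from Osmani's piece or the Anthropic study: it assumes generation and absorption both run at steady rates, and that anything generated beyond the absorption rate simply goes unabsorbed. The function name `unabsorbed_share` is made up for illustration.

```python
def unabsorbed_share(generation_to_absorption_ratio: float) -> float:
    """Fraction of generated code that is never absorbed.

    Assumes steady rates: if code is generated `ratio` times faster than
    it can be absorbed, only 1/ratio of it gets absorbed.
    """
    if generation_to_absorption_ratio <= 1:
        # Absorption keeps up with generation, so nothing is left over.
        return 0.0
    return 1.0 - 1.0 / generation_to_absorption_ratio


# The two ends of the 5-7x range quoted above.
for ratio in (5, 7):
    share = unabsorbed_share(ratio)
    print(f"{ratio}x faster generation: about {share:.0%} goes unabsorbed")
```

Under those assumptions, roughly 80% to 86% of what gets generated is never really absorbed by anyone. Real teams are messier than this, but it shows why flat review capacity can't close the gap on its own.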
It’s Not the Tool—It’s How We Use It
The Anthropic study revealed something critical about usage patterns:
Low performers (scored below 40%):
- Delegated complete code generation to AI
- Progressively relied on AI for all work
- Used AI iteratively to debug rather than understand
High performers (scored 65%+):
- Asked follow-up questions after generating code
- Combined code generation with explanations
- Used AI only for conceptual questions while coding independently
The difference isn’t AI vs no-AI. It’s cognitive engagement vs delegation.
The Questions I’m Wrestling With
As someone leading a 40+ person engineering team through AI adoption:
- How do we measure comprehension debt? Technical debt announces itself through mounting friction. Comprehension debt breeds false confidence until production breaks.
- What does onboarding look like when senior engineers don’t fully understand the codebase? The engineer who truly understands the system becomes more valuable, not less. But what happens when that person doesn’t exist?
- Should we design deliberate friction into AI tooling? Anthropic recommends “intentional design choices that support learning.” What does that look like practically?
- How do we avoid creating a two-tier system where some engineers build comprehension while others just ship code? That’s not just a skill gap; it’s a career trajectory gap.
The Uncomfortable Reality
I pushed AI coding assistants heavily in Q1. Copilot for everyone. Cursor licenses. “Move fast” was the mantra. Now I’m looking at our incident rate (up 15%) and our mean time to resolution (up 22%) and wondering if we optimized for the wrong metric.
The research suggests we’re not alone. 67% of developers spend more time debugging AI-generated code than they saved generating it.
Are we trading long-term engineering capability for short-term output theater?
I don’t have answers yet. But I’m increasingly convinced that how we adopt AI coding tools in the next 12 months will determine whether we build engineering teams that compound in capability or erode in understanding.
What are you seeing in your teams? Are you measuring comprehension alongside velocity? And if so, how?