Generation-Then-Comprehension Scores 65%+, AI Delegation Scores <40%—Your Team’s AI Usage Pattern Determines Skill Formation, Not Just Productivity
I’ve been thinking hard about how my team uses AI coding assistants, and I just came across research that changed my entire perspective on the problem.
The Problem: AI Makes Us Faster, But Are We Getting Dumber?
We’ve all seen the productivity promises. GitHub Copilot, Cursor, Claude Code—the tools are everywhere, and developers love them. My team’s velocity metrics look great. We’re shipping features faster than ever. PR counts are up. Code coverage is green.
But here’s what’s keeping me up at night: a new Anthropic study found that developers using AI assistance scored 17 percentage points lower on comprehension tests when learning a new coding library, with no statistically significant productivity gain on average.
The study (How AI Impacts Skill Formation) was a randomized controlled trial with 52 software engineers learning Trio, an async programming library none of them had used before. Half used AI assistance, half didn’t. The results were striking:
- AI users scored 50% on comprehension quizzes vs 67% for the control group
- Largest declines in debugging ability, with smaller drops in conceptual understanding and code reading
- Productivity gains were not statistically significant (they finished in roughly the same time)
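The gap between the two groups can be stated two ways, and they’re easy to mix up. Here’s a quick sketch of the arithmetic, using only the two quiz averages quoted above (the variable names are mine, not the study’s):

```python
# Quiz averages quoted above (percent correct).
control_score = 67.0
ai_score = 50.0

# Absolute gap, in percentage points.
point_gap = control_score - ai_score

# Relative gap: how much lower the AI group scored, as a share of the control score.
relative_gap = point_gap / control_score * 100

print(f"{point_gap:.0f} percentage points lower")
print(f"{relative_gap:.1f}% lower relative to the control group")
```

So the 17 is a percentage-point gap. Measured relative to the control group’s score, the AI group came in roughly a quarter lower.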
This isn’t just about learning new libraries. It’s about what happens to our teams’ fundamental capabilities when AI becomes the default way we write code.
The Critical Distinction: How You Use AI Matters More Than If You Use It
Here’s where it gets interesting. Not everyone who used AI scored poorly. The study identified distinct usage patterns with vastly different outcomes:
High-Scoring Patterns (65%+ on comprehension):
- Generation-Then-Comprehension: Generate code first, then ask follow-up questions to improve understanding. Not particularly fast, but strong comprehension.
- Hybrid Code-Explanation: Ask for code generation along with explanations of the generated code in the same query.
- Conceptual Inquiry: Only ask conceptual questions, rely on improved understanding to complete the task. Encountered many errors but resolved them independently. Fastest among high-scoring patterns.
Low-Scoring Patterns (<40% on comprehension):
- AI Delegation: Wholly relied on AI to write code and complete the task. Completed fastest with few errors, but scored poorly on the quiz.
- Progressive AI Reliance: Started with questions but eventually delegated all code writing to AI. Less independent thinking, more cognitive offloading.
- Iterative AI Debugging: Relied on AI to debug or verify code. Asked questions but relied on the assistant to solve problems rather than clarifying their own understanding.
The gap is enormous. Developers in the high-scoring patterns, such as conceptual inquiry, landed 25 or more percentage points above those in the low-scoring patterns, such as delegation, on the comprehension quiz.
The Hidden Cost: Comprehension Debt
Addy Osmani coined the term “comprehension debt”—the growing gap between how much code exists in your system and how much any human genuinely understands.
This is different from technical debt. Technical debt announces itself through slow builds, tangled dependencies, the creeping dread when you touch that one module. Comprehension debt breeds false confidence. Everything looks fine:
- Velocity metrics look immaculate
- DORA metrics hold steady
- PR counts are up
- Code coverage is green
But none of these capture comprehension deficits. You don’t see the problem until:
- A critical bug appears and no one understands the codebase well enough to fix it quickly
- You need to make an architectural change and realize nobody grasps the system design
- Your most productive developers leave and you discover they were the only ones who understood key systems
- Your engineers find that AI-generated code takes 5-7x longer to understand than it took to generate (Cognitive Debt study)
The Productivity Paradox
Here’s the cruel twist: 67% of developers spend more time debugging AI-generated code despite initial velocity gains. Additional data from 2026 research:
- 68% spend more time resolving security vulnerabilities in AI code
- 59% report more deployment problems
- The speed advantage evaporates in the review, debug, and fix cycles
We’re optimizing for the wrong metric. Fast code generation doesn’t matter if we’ve created a codebase no human can maintain.
The Real Question: How Do We Train Teams in High-Scoring Patterns?
This is where I need the community’s help. The research is clear: AI usage patterns determine skill formation. But how do we operationalize this on actual teams?
Here’s what I’m struggling with:
- How do you enforce “generation-then-comprehension” workflows? Do you require engineers to document their understanding? Add comprehension checks to PR reviews?
- How do you prevent the slide into AI delegation? It’s the fastest pattern. Developers will naturally drift toward it under deadline pressure.
- How do you measure comprehension debt? We have metrics for code quality, test coverage, deployment frequency. What’s the metric for “does anyone actually understand this?”
- Is this even realistic for junior developers? If they’ve never done manual coding, how do they develop the baseline to know when AI is wrong?
- What about the 41% of new code that’s already AI-generated? Are we past the point of no return?
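On the measurement question, I don’t have a validated metric, but here’s the kind of rough proxy I’ve been sketching: the share of AI-assisted lines that nobody has explained back to a teammate. Everything in it is hypothetical. The `Change` record, the `ai_assisted` and `explained_by` fields, and the premise that unexplained AI-assisted lines approximate comprehension debt are my assumptions, not something taken from the research.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    """One merged change. All fields are hypothetical, for illustration only."""
    lines_added: int
    ai_assisted: bool
    # Engineers who have explained this change back to a teammate.
    explained_by: set[str] = field(default_factory=set)


def comprehension_debt_ratio(changes: list[Change]) -> float:
    """Share of AI-assisted lines that nobody has explained back (0.0 to 1.0)."""
    ai_lines = sum(c.lines_added for c in changes if c.ai_assisted)
    if ai_lines == 0:
        return 0.0
    unexplained = sum(
        c.lines_added for c in changes if c.ai_assisted and not c.explained_by
    )
    return unexplained / ai_lines


changes = [
    Change(lines_added=120, ai_assisted=True, explained_by={"dana"}),
    Change(lines_added=300, ai_assisted=True),
    Change(lines_added=80, ai_assisted=False),
]
ratio = comprehension_debt_ratio(changes)
print(f"{ratio:.0%} of AI-assisted lines have no explain-back on record")
```

A number like this only says whether an explanation was recorded, not whether it was any good, so at best it’s a trend line to watch rather than a measure of understanding.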
I’m considering a few approaches:
- Mandatory “explain-back” sessions where engineers must explain AI-generated code to the team
- “AI-free Fridays” to maintain manual coding skills
- Comprehension tests as part of performance reviews
- Pair programming requirements for AI-generated code
But I don’t know if these are the right answers. What are you seeing on your teams? How are you balancing AI productivity with skill formation?
Because if we’re not careful, we’re going to build a generation of engineers who can prompt AI but can’t understand code.
Sources: