Generation-Then-Comprehension Scores 65%+, AI Delegation Scores <40%—Your Team's AI Usage Pattern Determines Skill Formation, Not Just Productivity

I’ve been thinking hard about how my team uses AI coding assistants, and I just came across research that changed my entire perspective on the problem.

The Problem: AI Makes Us Faster, But Are We Getting Dumber?

We’ve all seen the productivity promises. GitHub Copilot, Cursor, Claude Code—the tools are everywhere, and developers love them. My team’s velocity metrics look great. We’re shipping features faster than ever. PR counts are up. Code coverage is green.

But here’s what’s keeping me up at night: a new Anthropic study found that developers using AI assistance scored 17 percentage points lower on comprehension tests when learning a new coding library, with no statistically significant productivity gain on average.

The study (How AI Impacts Skill Formation) was a randomized controlled trial with 52 software engineers learning Trio, an async programming library none of them had used before. Half used AI assistance, half didn’t. The results were striking:

  • AI users scored 50% on comprehension quizzes vs 67% for the control group
  • Largest declines in debugging ability, with smaller drops in conceptual understanding and code reading
  • Productivity gains were not statistically significant (they finished in roughly the same time)
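
A note on reading those numbers: the gap between the two quiz averages is 17 percentage points, which works out to a larger drop in relative terms. A quick sketch of the arithmetic, using only the two averages quoted above (the variable names are mine):

```python
# Quiz averages quoted above, in percent correct.
control_score = 67.0
ai_score = 50.0

# Absolute gap, in percentage points.
point_gap = control_score - ai_score

# Relative drop, as a share of the control group's score.
relative_drop = point_gap / control_score * 100

print(f"Gap: {point_gap:.0f} percentage points")
print(f"Relative drop: {relative_drop:.1f}%")
```

So "17 lower" here means points on the quiz, and the AI group's average was roughly a quarter below the control group's.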

This isn’t just about learning new libraries. It’s about what happens to our teams’ fundamental capabilities when AI becomes the default way we write code.

The Critical Distinction: How You Use AI Matters More Than If You Use It

Here’s where it gets interesting. Not everyone who used AI scored poorly. The study identified distinct usage patterns with vastly different outcomes:

High-Scoring Patterns (65%+ on comprehension):

  1. Generation-Then-Comprehension: Generate code first, then ask follow-up questions to improve understanding. Not particularly fast, but strong comprehension.

  2. Hybrid Code-Explanation: Ask for code generation along with explanations of the generated code in the same query.

  3. Conceptual Inquiry: Only ask conceptual questions, rely on improved understanding to complete the task. Encountered many errors but resolved them independently. Fastest among high-scoring patterns.

Low-Scoring Patterns (<40% on comprehension):

  1. AI Delegation: Wholly relied on AI to write code and complete the task. Completed fastest with few errors, but scored poorly on the quiz.

  2. Progressive AI Reliance: Started with questions but eventually delegated all code writing to AI. Less independent thinking, more cognitive offloading.

  3. Iterative AI Debugging: Relied on AI to debug or verify code. Asked questions but relied on the assistant to solve problems rather than clarifying their own understanding.

The gap is enormous. The high-scoring patterns averaged 65% or more on the quiz and the low-scoring ones under 40%, a difference of at least 25 percentage points in measured comprehension.

The Hidden Cost: Comprehension Debt

Addy Osmani coined the term “comprehension debt”—the growing gap between how much code exists in your system and how much any human genuinely understands.

This is different from technical debt. Technical debt announces itself through slow builds, tangled dependencies, the creeping dread when you touch that one module. Comprehension debt breeds false confidence. Everything looks fine:

  • ✅ Velocity metrics look immaculate
  • ✅ DORA metrics hold steady
  • ✅ PR counts are up
  • ✅ Code coverage is green

But none of these capture comprehension deficits. You don’t see the problem until:

  • A critical bug appears and no one understands the codebase well enough to fix it quickly
  • You need to make an architectural change and realize nobody grasps the system design
  • Your most productive developers leave and you discover they were the only ones who understood key systems
  • The AI-generated code requires 5-7x longer to understand than to generate (Cognitive Debt study)

The Productivity Paradox

Here’s the cruel twist: 67% of developers spend more time debugging AI-generated code despite initial velocity gains. Additional data from 2026 research:

  • 68% spend more time resolving security vulnerabilities in AI code
  • 59% report more deployment problems
  • The speed advantage evaporates in the review, debug, and fix cycles

We’re optimizing for the wrong metric. Fast code generation doesn’t matter if we’ve created a codebase no human can maintain.

The Real Question: How Do We Train Teams in High-Scoring Patterns?

This is where I need the community’s help. The research is clear: AI usage patterns determine skill formation. But how do we operationalize this on actual teams?

Here’s what I’m struggling with:

  1. How do you enforce “generation-then-comprehension” workflows? Do you require engineers to document their understanding? Add comprehension checks to PR reviews?

  2. How do you prevent the slide into AI delegation? It’s the fastest pattern. Developers will naturally drift toward it under deadline pressure.

  3. How do you measure comprehension debt? We have metrics for code quality, test coverage, deployment frequency. What’s the metric for “does anyone actually understand this?”

  4. Is this even realistic for junior developers? If they’ve never done manual coding, how do they develop the baseline to know when AI is wrong?

  5. What about the 41% of new code that’s already AI-generated? Are we past the point of no return?

I’m considering a few approaches:

  • Mandatory “explain-back” sessions where engineers must explain AI-generated code to the team
  • “AI-free Fridays” to maintain manual coding skills
  • Comprehension tests as part of performance reviews
  • Pair programming requirements for AI-generated code

But I don’t know if these are the right answers. What are you seeing on your teams? How are you balancing AI productivity with skill formation?

Because if we’re not careful, we’re going to build a generation of engineers who can prompt AI but can’t understand code.


This hits hard. I’ve been seeing exactly this pattern on my team, and I didn’t have language for it until now.

We’re Measuring Velocity When We Should Measure Durability

Your point about comprehension debt being invisible in our standard metrics is dead-on. Last quarter, our deployment frequency was up 23%, and I presented it as a win. But here’s what I didn’t show leadership:

  • MTTR increased 15% despite deploying more frequently
  • P1 incidents requiring senior engineer escalation up 31%
  • Time-to-onboard new engineers went from 4 weeks to 7 weeks

We got faster at shipping code. We got slower at understanding, debugging, and teaching the system. The AI productivity gains evaporated when measured against “can anyone who didn’t write this code fix it?”
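
If you want to track the same thing, here is a minimal sketch of how MTTR and its quarter-over-quarter change could be computed from incident records. The `Incident` shape, the function names, and the sample timestamps are made up for illustration; they are not our real tooling or data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    opened: datetime
    resolved: datetime


def mean_time_to_restore(incidents):
    """Average of (resolved - opened) across a list of incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((i.resolved - i.opened for i in incidents), timedelta())
    return total / len(incidents)


def percent_change(before, after):
    """Relative change between two durations, in percent."""
    return (after - before) / before * 100


# Hypothetical incidents, two per quarter.
last_quarter = [
    Incident(datetime(2025, 7, 3, 9, 0), datetime(2025, 7, 3, 10, 0)),
    Incident(datetime(2025, 8, 14, 14, 0), datetime(2025, 8, 14, 15, 40)),
]
this_quarter = [
    Incident(datetime(2025, 10, 2, 11, 0), datetime(2025, 10, 2, 12, 24)),
    Incident(datetime(2025, 11, 20, 16, 0), datetime(2025, 11, 20, 17, 40)),
]

before = mean_time_to_restore(last_quarter)
after = mean_time_to_restore(this_quarter)
print(f"MTTR went from {before} to {after} ({percent_change(before, after):+.0f}%)")
```

Putting this number on the same slide as deployment frequency is the point: one going up while the other gets worse is the pattern I missed.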

The Junior Engineer Problem Is Real

Your question about juniors hits a nerve. I have two developers who joined in the past 6 months—talented folks, both use Copilot extensively. They ship features quickly. But when I asked them to debug a production issue last week, they struggled because they’d never actually learned to read unfamiliar code without AI assistance.

One literally said: “Can I just ask Claude to explain what this function does?”

That’s when I realized: we’ve trained them to be AI prompters, not engineers. They can describe what they want in English, but they can’t trace execution flow or reason about edge cases without an AI copilot.

What I’m Trying (With Mixed Results)

I’ve implemented a few things, though I’m not confident any of them are the right answer:

1. PR Template: “Explain Your Understanding”

Added a section to our PR template:

  • “What does this code do?” (in your own words, not AI-generated)
  • “What edge cases did you consider?”
  • “If this breaks in production, where would you start debugging?”

Result: Initially, people just asked AI to write the explanations. After I called this out in reviews, they started actually thinking through it. But it adds 10-15 minutes per PR, and I’m getting pushback.
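
One thing that might cut the review overhead is checking mechanically that the section was filled in, so reviewers only have to judge the content. A rough sketch, assuming the PR description is available as plain text and each prompt is followed by its answer. The function name and the 20-character minimum are arbitrary choices of mine, and it can only tell whether something was written, not whether a human wrote it.

```python
# The three prompts from the PR template above.
REQUIRED_PROMPTS = [
    "What does this code do?",
    "What edge cases did you consider?",
    "If this breaks in production, where would you start debugging?",
]


def missing_answers(pr_body, min_chars=20):
    """Return the prompts whose answer is absent or shorter than min_chars."""
    missing = []
    for prompt in REQUIRED_PROMPTS:
        start = pr_body.find(prompt)
        if start == -1:
            missing.append(prompt)
            continue
        answer_start = start + len(prompt)
        # The answer runs until the next prompt found later in the body, or the end.
        answer_end = len(pr_body)
        for other in REQUIRED_PROMPTS:
            pos = pr_body.find(other, answer_start)
            if pos != -1:
                answer_end = min(answer_end, pos)
        answer = pr_body[answer_start:answer_end].strip()
        if len(answer) < min_chars:
            missing.append(prompt)
    return missing


# Hypothetical PR description with the last question left blank.
example_body = """
What does this code do?
Adds retry with backoff to the payment webhook handler.

What edge cases did you consider?
Duplicate deliveries and timeouts from the upstream provider.

If this breaks in production, where would you start debugging?
"""
print(missing_answers(example_body))
```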

2. Weekly “Code Reading” Sessions

Every Friday, 30 minutes. We pick a complex module someone didn’t write, turn off AI tools, and collectively read through it. No writing code—just understanding existing code.

Result: Engineers hate it. “This feels like a waste of time when we could be shipping features.” But the engineers who participate debug faster and ask better questions.

3. “AI Usage Patterns” in Performance Reviews

I track—loosely—whether engineers are using generation-then-comprehension vs delegation patterns. I look at:

  • Do they modify AI-generated code or accept it wholesale?
  • Do they ask follow-up questions in PRs or just ship?
  • When debugging, do they understand the problem or just iterate with AI until it works?

Result: This is the most controversial. Engineers feel surveilled. I’ve had to be very careful about framing this as “skill development” not “performance punishment.”

The Deadline Pressure Problem

Your point about delegation being the fastest pattern is the killer. Under deadline pressure, high-scoring patterns lose. When the VP Product is asking “why is this feature taking so long?”, no engineer is going to say “because I’m asking AI follow-up questions to improve my understanding instead of just accepting the generated code.”

The economic incentive is backwards. Skill formation is a long-term investment with short-term productivity costs. But we’re optimized for sprint velocity.

My Biggest Fear

We’re creating a generation of engineers who are incredibly productive until the AI model has an outage, the company switches tools, or the problem domain isn’t well-represented in training data.

Then we discover we don’t actually have engineers. We have sophisticated AI prompt crafters with no fallback skills.

What happens in 5 years when these folks are supposed to be senior engineers, but they never learned to read code, debug without AI, or reason about system architecture?

I don’t have answers. But I’m glad someone finally named the problem.

This research terrifies me, and I’ll tell you why: we’re solving for the wrong organizational problem.

We’re Optimizing Individual Velocity When We Need Team Resilience

The Anthropic study measures individual comprehension. But here’s what I’m seeing at the organizational level:

AI doesn’t just reduce individual skill formation—it creates knowledge silos that make entire teams fragile.

Last month, one of my senior engineers went on parental leave. They’d been using AI heavily for a microservice rewrite—deployed fast, great velocity metrics, everyone happy. Then they left for 3 months, and the team discovered:

  • No one else understood the service architecture (it was AI-generated based on prompts only that engineer saw)
  • The documentation was AI-generated (looked comprehensive, but didn’t actually explain why decisions were made)
  • The test coverage was AI-generated (high percentage, but didn’t cover the actual edge cases we encounter in production)

We had a P0 incident 2 weeks into their leave. It took 4 engineers and 8 hours to understand a system one person built in 3 weeks. The comprehension debt came due all at once.

The Invisible Skill Tax

What’s keeping me up at night is this: AI makes skill gaps invisible until they’re catastrophic.

With manual coding, you see skill gaps early:

  • Junior engineer struggles with a task → mentor steps in → skill transfer happens
  • Code review reveals misunderstanding → discussion happens → team learns
  • Bug in production → engineer debugs → understanding deepens

With AI coding, skill gaps are masked:

  • Junior engineer uses AI delegation → task completes quickly → no one notices they didn’t learn
  • Code review shows working code → gets approved → no deep discussion happens
  • Bug in production → engineer uses AI to debug → works, but still no understanding

We’ve replaced the natural feedback loops that create expertise with AI shortcuts that create dependency.

What Scares Me: The Promotion Pipeline Problem

Here’s the organizational catastrophe I see coming:

  1. Junior engineers use AI delegation → ship fast, look productive, hit metrics
  2. We promote them to senior roles based on velocity and output
  3. They become technical leads and architects without ever developing deep understanding
  4. They make architectural decisions using AI without the expertise to know when AI is wrong
  5. The entire technical foundation becomes fragile because no one actually understands the system

We’re promoting people based on AI-assisted output, not actual expertise. In 3-5 years, we’ll have “senior engineers” who’ve never learned to debug, “architects” who’ve never designed a system without AI, and “tech leads” who can’t mentor because they never built the skills themselves.

The Equity Dimension No One’s Talking About

There’s also an equity issue here that I haven’t seen discussed:

Access to AI coding tools is creating a new digital divide in skill formation.

Engineers at well-funded companies have Claude Code, Cursor, Copilot, and all the latest tools. They’re getting AI-assisted productivity gains. But if those tools are creating comprehension debt, they’re also:

  • Reducing skill formation for engineers who have access
  • Creating skill gaps between those who rely on AI and those who don’t (or can’t afford to)
  • Making it harder to evaluate actual engineering capability in interviews and performance reviews

How do you hire for senior roles when half the candidates used AI for everything and the other half didn’t? Are we selecting for prompt engineering skills or actual engineering expertise?

What I’m Trying: Team-Level Skill Metrics

I’m experimenting with measuring comprehension at the team level, not just individual level:

1. “Bus Factor” Tracking

For critical systems, I track: “If [person] is unavailable, how long would it take the team to fix a P0 issue?”

We run quarterly tabletop exercises where I randomly assign “on-call” for different systems and see if people can actually debug them. If only one person can fix it quickly, that’s a comprehension debt red flag.
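
A minimal sketch of how the results of those exercises could be tallied. The system names, engineer names, and the 60-minute threshold are placeholders I picked for the example, not our real data or a recommended cutoff.

```python
# (system, engineer, minutes to diagnose in the tabletop exercise).
# None means the engineer could not diagnose it at all. All values are made up.
results = [
    ("billing", "alice", 25),
    ("billing", "bob", None),
    ("search", "alice", 40),
    ("search", "carol", 35),
    ("search", "bob", 90),
]


def bus_factor(results, max_minutes=60):
    """Count, per system, the engineers who diagnosed the issue within max_minutes."""
    capable = {}
    for system, engineer, minutes in results:
        # setdefault keeps systems in the output even if nobody passed.
        people = capable.setdefault(system, set())
        if minutes is not None and minutes <= max_minutes:
            people.add(engineer)
    return {system: len(people) for system, people in capable.items()}


def red_flags(factors):
    """Systems that at most one person can fix quickly."""
    return sorted(system for system, count in factors.items() if count <= 1)


factors = bus_factor(results)
print(factors)
print(red_flags(factors))
```

In this made-up data, `billing` is the red flag: only one person got through the exercise in time.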

2. “Explain to a New Hire” Tests

Every month, I pick a random engineer and ask them to explain a system they didn’t build to a hypothetical new hire. Can they do it without referencing the code? Do they understand the why, not just the what?

Engineers who used generation-then-comprehension can explain. Engineers who used delegation can’t.

3. Rotation Through “Legacy” Code

I’m requiring all engineers, regardless of seniority, to spend 20% of their time on “legacy” systems (code written before AI tools were common). The forcing function: you can’t use AI to understand these systems because they’re domain-specific and pre-date most training data.

This is deeply unpopular. People hate it. But it’s the only way I’ve found to force actual skill building.

The Real Question: Are We Building for 2026 or 2031?

Michelle, your question about whether we’re past the point of no return hits at something deeper:

Are we optimizing for the next sprint or the next 5 years?

If AI coding tools plateau—if Claude, Copilot, and Cursor don’t get significantly better—then the engineers using delegation patterns are in trouble. They’ll be productive for 2-3 years, then hit a ceiling because they never built foundational skills.

But if AI tools get dramatically better—if they can actually architect systems, debug production issues, and maintain legacy code—then maybe delegation is the right strategy, and we’re just in an awkward transition period.

I don’t know which future we’re in. But I know this: teams with deep understanding are resilient. Teams dependent on AI are fragile.

And in a world where every company is one model outage away from grinding to a halt, fragility is an existential risk.

Okay, so I’m not a “real” engineer (I write CSS and dabble in React), but this whole conversation is giving me déjà vu from what happened in design a few years ago.

Design Went Through This Exact Pattern

Remember when design tools got super automated? Figma AI, Midjourney, generative UI? The design community had this same panic:

  • “Junior designers are just prompting AI instead of learning fundamentals”
  • “No one understands design systems anymore, they just generate components”
  • “The AI makes everything look good superficially, but it’s not actually solving the user problem”

Sound familiar?

Here’s What We Learned (The Hard Way)

I watched a junior designer get hired at my company last year. Their portfolio was stunning—beautiful AI-generated mockups, slick prototypes, impressive variety. They got the job.

Two weeks in, they couldn’t:

  • Explain why they chose a specific color palette (AI generated it)
  • Modify a design system component (they’d only ever used AI to generate new ones)
  • Critique their own work (they didn’t understand what made a design “good” beyond “looks nice”)

They could prompt AI to make pretty things. They couldn’t actually design.

We ended up pairing them with a senior designer for 6 months just to teach fundamentals. It was basically an extended bootcamp for someone we thought was mid-level based on their portfolio.

The “Generation-Then-Comprehension” Thing is KEY

The pattern Michelle described—generate first, then ask questions to understand—is exactly what the good designers do with AI.

Bad designers:

  • Prompt → accept → ship
  • “Make it look like Airbnb”
  • “Create a dashboard for analytics”
  • No understanding of why the AI made those choices

Good designers:

  • Prompt → question → iterate → understand
  • “Why did you choose this layout?”
  • “What accessibility considerations did you make?”
  • “How does this scale to mobile?”

The difference is active learning vs passive consumption.

My Controversial Take: Maybe This is Just the New Normal?

Here’s where I might lose people:

Maybe we’re not supposed to “understand” AI-generated code the same way we understood hand-written code.

Hear me out.

When cars became common, we stopped teaching everyone how to fix engines. Most people can’t diagnose a transmission problem or rebuild a carburetor. Does that make them “dependent” on mechanics? Or did we just specialize and abstract?

When compilers became sophisticated, we stopped worrying about assembly language and register allocation. Most developers can’t hand-optimize machine code. Is that comprehension debt, or progress?

Maybe the real skill is knowing when to trust AI and when to dig deeper.

The Skills That Actually Matter

In design, the skills that survived the AI transition weren’t “can you draw a perfect icon by hand” or “can you manually kern typography.” The skills that matter are:

  1. Taste and judgment (is this solving the right problem?)
  2. Critical evaluation (is the AI output actually good, or just visually appealing?)
  3. Systems thinking (how does this fit into the larger product?)
  4. User empathy (does this meet real user needs?)

For engineers, maybe it’s the same:

  1. Architectural judgment (is this the right system design?)
  2. Code review skills (is AI-generated code actually good?)
  3. Debugging ability (can you figure out why something broke?)
  4. Product sense (are we building the right thing?)

None of those require writing every line of code by hand.

Where I Think Michelle Is Right (and It’s Scary)

The delegation pattern is dangerous because it removes all the learning opportunities.

If you:

  • Let AI write the code
  • Don’t ask follow-up questions
  • Don’t review critically
  • Don’t try to understand

Then you’re not building any skills—not coding skills, not judgment, not systems thinking, nothing.

You’re outsourcing your entire intellectual development to a tool.

And that’s the nightmare scenario: engineers who can describe what they want, but can’t evaluate what they get, can’t modify it, can’t debug it, can’t extend it.

My Wildly Impractical Suggestion

What if we stopped measuring productivity and started measuring learning velocity?

Instead of “how many features did you ship?”, what if we asked:

  • “How much did you learn this sprint?”
  • “What new capability did you develop?”
  • “What system do you understand better than you did last month?”

If you’re using AI and not learning, you’re doing it wrong.

The engineers who are using generation-then-comprehension—they’re learning faster than manual coders because they can explore more, experiment more, ask more questions. They’re using AI as a learning accelerator, not a skill replacement.

The Really Uncomfortable Question

What if the engineers who score high on comprehension tests without AI are actually learning slower than engineers who use AI for generation-then-comprehension?

The Anthropic study showed AI users scored 17 percentage points lower. But it was a short-term test on a single library. What about 6 months later? What about different domains?

What if the right AI usage pattern actually produces better engineers long-term, but looks worse on quarterly metrics?

I don’t know. But I think we should be asking the question.


(Also, Michelle, your “AI-free Fridays” idea is brilliant. In design, we do “manual Mondays” where no AI tools are allowed: just Figma, sketching, and fundamentals. It’s wildly unpopular, but the people who do it are noticeably better designers.)

Coming at this from the product side, and I think we’re all missing the forest for the trees.

The Business Model Has Already Decided This

Here’s the uncomfortable truth: the companies winning in 2026 are the ones shipping fastest, not the ones with the “best” engineers.

Let me show you the math I’m seeing:

Company A (high comprehension, low AI usage):

  • 3 features shipped per quarter
  • Strong technical foundation
  • Engineers deeply understand the codebase
  • Low technical debt
  • Great long-term sustainability

Company B (low comprehension, high AI delegation):

  • 12 features shipped per quarter
  • Fragile technical foundation
  • Engineers don’t understand the codebase
  • High technical debt
  • Questionable long-term sustainability

Who raises the Series B? Company B. Every single time.

Investors don’t ask “do your engineers understand the code?” They ask “what’s your monthly growth rate?” and “how fast can you ship the roadmap?”

We’re Optimizing for the Wrong Time Horizon

Michelle, Keisha, Luis—you’re all talking about 3-5 year risks. Startups don’t survive on 3-5 year time horizons. They survive on 3-6 month fundraising cycles.

The market has already made the choice:

  • Short-term velocity wins
  • Comprehension debt is a future problem
  • By the time the debt comes due, you’re either dead or big enough to hire your way out of it

This is the VC-backed startup playbook. It’s always been “move fast and break things.” AI just makes it faster and more broken.

The Talent Market Is Telling Us Something

Here’s another data point: the engineers who are most productive with AI are getting hired faster and paid more.

I just went through a hiring cycle. We interviewed 40 candidates. The ones who could:

  • Demonstrate shipping velocity with AI tools
  • Show polished portfolios of AI-assisted projects
  • Talk fluently about prompt engineering and AI workflows

They got offers. The ones who wanted to “deeply understand every line of code” and “avoid AI crutches”? They didn’t.

The market is rewarding AI delegation, not comprehension.

The Product-Market Fit Question

Here’s the question I ask when evaluating technical decisions: does this get us closer to product-market fit, or further away?

For pre-PMF companies (most startups), the answer is:

  • Velocity matters more than quality
  • Shipping experiments matters more than understanding the codebase
  • Learning what customers want matters more than learning how the code works

The Anthropic study shows engineers who use AI score 17 percentage points lower on comprehension. But what if that 17-point comprehension loss buys you 50% more experiments, 50% more customer conversations, 50% faster PMF discovery?

Would you trade it?

I would. Every time.

Where I Think You’re Right (and It’s a Problem)

The place I worry is post-PMF companies trying to scale.

Once you have PMF and you’re scaling, the dynamics flip:

  • Reliability matters more than experimentation
  • Understanding the system matters more than shipping new features
  • Long-term sustainability matters more than short-term velocity

But here’s the trap: the engineers who got you to PMF with AI delegation can’t scale the company.

They’re optimized for speed, not understanding. They can’t:

  • Architect systems for scale (they’ve never built understanding of architecture)
  • Debug complex production issues (they’ve never learned to read unfamiliar code)
  • Mentor junior engineers (they’ve never built skills to teach)

You need a different team post-PMF. Or you need to retrain your existing team.

Most companies choose “hire a different team.” Hence the “hire senior engineers to clean up the mess” playbook.

The Organizational Lifecycle Problem

I think the real answer is: different AI usage patterns for different stages.

Pre-PMF (0-2 years):

  • AI delegation is fine, maybe even optimal
  • Optimize for learning about customers, not learning about code
  • Comprehension debt doesn’t matter if you’re pivoting every 6 weeks
  • Hire for velocity and customer obsession

Scaling (2-5 years):

  • Shift to generation-then-comprehension
  • Optimize for understanding as you build the long-term foundation
  • Comprehension debt starts to hurt
  • Hire for depth and systems thinking

Mature (5+ years):

  • Require deep understanding, use AI as an accelerator
  • Optimize for maintainability and long-term sustainability
  • Comprehension debt is catastrophic
  • Hire for expertise and architectural judgment

The problem is most companies try to use the same playbook across all stages.
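
To make the stage idea concrete, here is a rough sketch of the three stages as a lookup. Using company age as the only input is a simplification, the names `StagePolicy` and `policy_for` are hypothetical, and since the ranges above overlap at 2 and 5 years I have assumed a company exactly on a boundary belongs to the later stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StagePolicy:
    stage: str
    ai_pattern: str
    hire_for: str


def policy_for(company_age_years):
    """Map company age to the stage guidance listed above.

    Assumption: a company exactly at 2 or 5 years falls into the later stage.
    """
    if company_age_years < 0:
        raise ValueError("company age cannot be negative")
    if company_age_years < 2:
        return StagePolicy(
            "pre-PMF", "AI delegation", "velocity and customer obsession"
        )
    if company_age_years < 5:
        return StagePolicy(
            "scaling", "generation-then-comprehension", "depth and systems thinking"
        )
    return StagePolicy(
        "mature",
        "deep understanding, AI as an accelerator",
        "expertise and architectural judgment",
    )


print(policy_for(1).ai_pattern)
print(policy_for(3).ai_pattern)
```

In practice the trigger is probably whether you have PMF, not how old the company is, so treat the age cutoffs as a stand-in.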

The Uncomfortable Suggestion

What if we stopped pretending there’s “one right way” to use AI, and instead:

  1. Be honest about what stage your company is in
  2. Optimize AI usage for that stage
  3. Hire people who match that optimization
  4. Accept that you’ll need different people (or retrain) as you scale

The engineers using delegation to ship fast pre-PMF? They’re doing the right thing for that context.

The engineers insisting on deep understanding at a mature company? They’re doing the right thing for that context.

The mismatch is when you’re at one stage but using the other playbook.

The Question I’m Asking

Here’s what I want to know from the engineering leaders:

Can you actually retrain AI delegation engineers into generation-then-comprehension engineers?

Or is this like trying to turn sprinters into marathon runners—fundamentally different muscle memory, fundamentally different mindset?

Because if you can’t retrain, then every company has a hard transition point where they need to swap out the team. And that’s brutal for everyone involved—the engineers who got you to PMF but can’t scale, and the company that has to let them go.

I don’t have an answer. But I think we need to be way more honest about the trade-offs we’re making.