76% of Developers Generate Code They Don't Fully Understand—Anthropic Study Shows 17% Lower Skill Mastery With AI Assistance. Are We Trading Velocity for Understanding?

I just read Addy Osmani’s latest piece on “comprehension debt” and it’s been weighing on me all week. The core finding hit hard: developers using AI coding assistants scored 17% lower on comprehension tests when learning new libraries, according to a recent Anthropic study with 52 software engineers.

The Velocity-Understanding Trade-Off We Don’t Talk About

Here’s what makes this particularly uncomfortable for those of us managing engineering teams: the study showed no statistically significant productivity gains on average, but clear comprehension losses. We’re not even winning on velocity—we’re just generating code we understand less.

The biggest drops were in debugging ability, followed by code reading and conceptual understanding. Think about what that means: the very skills you need to validate AI-generated code are the ones degrading fastest.

The 5-7x Gap Between Generation and Absorption

Osmani introduces a metric that crystallizes the problem: AI generates code 5-7x faster than developers can absorb it. Pull request volume is climbing while review capacity stays flat. The organizational assumption that “reviewed code is understood code” no longer holds.
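Here's a back-of-the-envelope sketch of what that gap does over time, assuming review capacity stays flat as described. The 2,000 lines/week absorption rate and the 6x ratio (the midpoint of the 5-7x range) are hypothetical inputs, not figures from Osmani's piece.

```python
def comprehension_backlog(weeks: int, absorbed_per_week: int, generation_ratio: float) -> list[float]:
    """Cumulative lines generated but not yet absorbed, week by week.

    Toy model with hypothetical inputs: the team absorbs a fixed number of
    lines per week (flat review capacity) while generation runs at
    generation_ratio times that rate.
    """
    generated_per_week = absorbed_per_week * generation_ratio
    backlog = 0.0
    history = []
    for _ in range(weeks):
        # The backlog can't go negative: spare capacity doesn't bank for later.
        backlog = max(0.0, backlog + generated_per_week - absorbed_per_week)
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Hypothetical team: absorbs 2,000 lines/week, generates 6x that.
    for week, lines in enumerate(comprehension_backlog(4, 2_000, 6.0), start=1):
        print(f"week {week}: {lines:,.0f} lines nobody has absorbed")
```

With those inputs the backlog grows by 10,000 lines every week. The model ignores plenty (deleted code, code nobody ever needs to read), but it shows why flat review capacity can't catch up: the gap is structural, not a one-off spike.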

What’s emerging in my team’s code reviews is concerning: engineers are approving code they don’t fully understand, and that approval now carries an implicit endorsement. Our system contains more code than any human on the team genuinely understands.

It’s Not the Tool—It’s How We Use It

The Anthropic study revealed something critical about usage patterns:

Low performers (scored below 40%):

  • Delegated complete code generation to AI
  • Progressively relied on AI for all work
  • Used AI iteratively to debug rather than understand

High performers (scored 65%+):

  • Asked follow-up questions after generating code
  • Combined code generation with explanations
  • Used AI only for conceptual questions while coding independently

The difference isn’t AI vs no-AI. It’s cognitive engagement vs delegation.

The Questions I’m Wrestling With

As someone leading a 40+ person engineering team through AI adoption:

  1. How do we measure comprehension debt? Technical debt announces itself through mounting friction. Comprehension debt breeds false confidence until production breaks.

  2. What does onboarding look like when senior engineers don’t fully understand the codebase? The engineer who truly understands the system becomes more valuable, not less—but what happens when that person doesn’t exist?

  3. Should we design deliberate friction into AI tooling? Anthropic recommends “intentional design choices that support learning.” What does that look like practically?

  4. How do we avoid creating a two-tier system where some engineers build comprehension while others just ship code? That’s not just a skill gap—it’s a career trajectory gap.

The Uncomfortable Reality

I pushed AI coding assistants heavily in Q1. Copilot for everyone. Cursor licenses. “Move fast” was the mantra. Now I’m looking at our incident rate—up 15%—and our mean time to resolution—up 22%—and wondering if we optimized for the wrong metric.

The research suggests we’re not alone. 67% of developers spend more time debugging AI-generated code than they saved generating it.

Are we trading long-term engineering capability for short-term output theater?

I don’t have answers yet. But I’m increasingly convinced that how we adopt AI coding tools in the next 12 months will determine whether we build engineering teams that compound in capability or erode in understanding.

What are you seeing in your teams? Are you measuring comprehension alongside velocity? And if so, how?


This hits different from a design systems perspective. We’re seeing the exact same pattern with AI-generated design tokens and component code.

Last month I caught a junior designer shipping a Figma-to-code AI output that “worked” but violated our entire color contrast system. When I asked why, they said “the AI generated it and the tests passed.”

The tests passed because the tests only checked for valid CSS syntax, not accessibility compliance or design system adherence.
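For what it's worth, here's a sketch of a test that checks contrast instead of syntax, using the WCAG 2.x relative-luminance and contrast-ratio formulas with the AA thresholds (4.5:1 for normal text, 3:1 for large text). It assumes six-digit hex colors with no alpha, and the two grays are illustrative values, not our actual tokens.

```python
def _channel(value: int) -> float:
    """Linearize one 8-bit sRGB channel per the WCAG 2.x relative-luminance definition."""
    c = value / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a six-digit hex color such as '#767676'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _channel(r) + 0.7152 * _channel(g) + 0.0722 * _channel(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """(L_lighter + 0.05) / (L_darker + 0.05), ranging from 1 to 21."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    """WCAG 2.x level AA: 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


if __name__ == "__main__":
    # A syntax-only test accepts both colors; only one meets AA on white.
    for fg in ("#767676", "#777777"):
        print(fg, round(contrast_ratio(fg, "#ffffff"), 2), passes_aa(fg, "#ffffff"))
```

Both grays are valid CSS, so a syntax check passes both. The contrast check passes #767676 on white (about 4.54:1) and fails #777777 (about 4.48:1).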

The Design-Code Comprehension Gap

What scares me is the growing gap between what the code does and why it exists. Our design system has 8 years of hard-learned decisions baked into it:

  • Why we use rem instead of px for typography
  • Why certain color combinations are forbidden (they fail WCAG AA on some screens)
  • Why we have max-width constraints on certain components

AI tools generate syntactically valid alternatives that violate all of this institutional knowledge. And when juniors accept those suggestions without understanding the constraints, we’re essentially compiling our design system knowledge into oblivion.

“Ship Fast” Culture Meets “Understand Nothing” Tooling

Your 15% incident rate increase resonates. We had a 3-day outage last quarter because someone used AI to “optimize” our CSS bundle splitting. It worked in dev and passed CI, then broke in production for users on older browsers.

The engineer who shipped it couldn’t debug it because they didn’t understand:

  • How the bundler worked
  • Why we had the previous splitting strategy
  • What the code was actually doing

We had to roll back and bring in the engineer who wrote the original system (who left 2 years ago) as a consultant. $15K to fix a “free” AI optimization.

The “Just Ship It” Generation

I mentor bootcamp grads and the pattern is clear: they’re learning to prompt before they learn to think. One student literally asked me “what’s the prompt to make buttons accessible?” instead of “how does accessibility work?”

That’s not a tool problem—it’s a learning philosophy problem. They’re being taught that understanding is optional if the output works.

My Controversial Take

Maybe we need to ban AI coding tools for the first 6 months in junior roles. Not forever—just long enough to build the mental models that let you validate AI outputs.

You can’t review AI-generated code if you don’t understand what good code looks like. And you can’t learn what good code looks like by only reading AI-generated code.

It’s like learning to cook by only eating at restaurants and reading recipes. You need to burn a few dishes first.

What would your team say if you proposed “AI-free onboarding for 90 days”? :thinking:

The 17% comprehension drop is alarming, but I want to challenge the framing slightly: we’re measuring the wrong baseline.

The Anthropic study compared AI-assisted developers to developers learning without AI. But in 2026, that’s not the real choice. The real choice is between:

  1. Developers who use AI with cognitive engagement
  2. Developers who delegate comprehension to AI
  3. Developers who don’t use AI at all (and fall behind on velocity)

High Performers Used AI Differently, Not Less

The study showed high scorers (65%+) used AI for conceptual inquiry—asking questions, requesting explanations, combining generation with understanding. They treated AI as a teaching assistant, not a code monkey.

Low scorers (below 40%) used complete delegation—“write this function”—and iterative debugging without understanding. They treated AI as a replacement for thinking.

That’s a training problem, not a tool problem.

We’re Solving This Wrong in Most Orgs

The response I’m seeing across the industry:

  • :cross_mark: Ban AI tools (doesn’t scale, creates shadow usage)
  • :cross_mark: Unlimited AI access with no guidance (Maya’s horror stories)
  • :cross_mark: Measure velocity only (Luis’s incident rate spike)

What actually works (we’re 6 months into this experiment):

  • :white_check_mark: Structured AI pairing sessions where seniors model high-engagement patterns
  • :white_check_mark: Code review rubric that checks for comprehension, not just correctness (“explain this PR in your own words”)
  • :white_check_mark: Incident post-mortems that track AI-generated code and comprehension gaps
  • :white_check_mark: Promotion criteria that explicitly value debugging and system understanding

The Data From Our Experiment

We split our engineering org into three cohorts:

  • Cohort A (AI-assisted with training): 12% productivity increase, 3% incident rate increase
  • Cohort B (AI-assisted without training): 8% productivity increase, 18% incident rate increase
  • Cohort C (no AI): Baseline productivity, baseline incident rate

Cohort A’s results came from teaching high-engagement AI usage patterns, not from restricting access. We taught them to:

  1. Generate code but require self-explanation before committing
  2. Use AI to ask “why” questions, not just “how” questions
  3. Validate AI outputs against first principles, not just test suites

Maya’s 90-Day AI-Free Onboarding

I love this idea but would modify it: 90 days of AI-assisted learning with mandatory comprehension checks.

Example: Junior uses Copilot to generate a React hook. Before they can commit:

  1. Explain what the hook does in plain English
  2. Identify what happens if you remove each line
  3. Describe an alternative implementation approach

This builds the validation muscle while maintaining velocity. You’re not banning the tool—you’re teaching critical engagement.
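One way to make those three checks mechanical is a git commit-msg hook: Git runs it with the path of the proposed commit message file as its only argument and aborts the commit on a non-zero exit. The section labels and the 10-word minimum are hypothetical choices for this sketch. It only enforces that an explanation exists; a reviewer still has to judge whether it's right.

```python
#!/usr/bin/env python3
"""Hypothetical commit-msg hook: require a plain-language explanation."""
import sys

# Hypothetical labels mirroring the three comprehension checks.
REQUIRED_SECTIONS = ("What it does:", "If removed:", "Alternative:")
MIN_WORDS = 10  # arbitrary floor to discourage one-word answers


def parse_sections(message: str) -> dict[str, str]:
    """Map each label to the text that follows it, up to the next label."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in message.splitlines():
        if line.startswith("#"):  # skip git comment lines if any remain
            continue
        label = next((lbl for lbl in REQUIRED_SECTIONS if line.startswith(lbl)), None)
        if label is not None:
            current = label
            sections[current] = [line[len(label):]]
        elif current is not None:
            sections[current].append(line)
    return {lbl: " ".join(parts).strip() for lbl, parts in sections.items()}


def missing_sections(message: str) -> list[str]:
    """Labels that are absent or answered in fewer than MIN_WORDS words."""
    found = parse_sections(message)
    return [lbl for lbl in REQUIRED_SECTIONS if len(found.get(lbl, "").split()) < MIN_WORDS]


def main(path: str) -> int:
    with open(path, encoding="utf-8") as f:
        problems = missing_sections(f.read())
    for lbl in problems:
        print(f"missing or too short (min {MIN_WORDS} words): {lbl}", file=sys.stderr)
    return 1 if problems else 0


if __name__ == "__main__" and len(sys.argv) == 2:
    # Git passes the path of the proposed commit message as the only argument.
    sys.exit(main(sys.argv[1]))
```

Saved as .git/hooks/commit-msg and made executable, it would block any commit whose message lacks one of the three sections.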

The Real Question Luis Raised

“How do we avoid creating a two-tier system where some engineers build comprehension while others just ship code?”

This is the existential question. If we don’t intentionally design for comprehension, market forces will create exactly this divide:

  • Tier 1: Engineers who understand systems, command premium comp, do architecture/debugging
  • Tier 2: Engineers who ship AI-generated code, commodity labor, first to be automated

We’re seeing this emerge in our hiring: candidates with strong debugging skills and system comprehension command 30-40% higher offers than candidates with equivalent shipping velocity but weaker fundamentals.

The market is already pricing in the comprehension premium.

The question is whether we intentionally cultivate that capability in our teams, or let it emerge as a stratification vector.

Coming at this from the product side: the comprehension debt problem isn’t limited to code. We’re seeing it in product decisions too.

The Product Manager Comprehension Gap

Last quarter, a PM on my team used ChatGPT to generate a competitive analysis. The output was detailed, well-formatted, and completely wrong about two competitors’ pricing models.

When I asked how they validated it, they said “I spot-checked a few things and they seemed right.”

That’s the product equivalent of Luis’s engineers approving code they don’t understand. We’re shipping strategic decisions we didn’t actually think through.

Why This Matters for Engineering-Product Alignment

The comprehension debt on the engineering side is compounding with strategic comprehension debt on the product side:

  • PM generates user research summary with AI → doesn’t deeply understand user pain
  • PM writes PRD using AI suggestions → doesn’t fully own the “why”
  • Engineer generates code from PRD → doesn’t understand the user context
  • Code ships, user problem isn’t actually solved

We’ve created a comprehension gap across the entire product development cycle.

Nobody fully understands:

  • Why the user has this problem
  • Why this solution addresses it
  • How the implementation works
  • Whether the code does what the spec intended

The “Feature Factory” Gets Worse

Keisha’s two-tier system is emerging on the product side too:

  • Tier 1 PMs: Deep user understanding, strategic thinkers, can evaluate AI outputs against first principles
  • Tier 2 PMs: AI-assisted feature writers, execute roadmaps without deep comprehension, increasingly commodity

I’m already seeing this in hiring. Candidates who can articulate why a feature exists command significantly higher offers than candidates who can just ship features fast.

The Metric We Should Be Tracking

Luis asked “how do we measure comprehension debt?” Here’s what I’m experimenting with on the product side:

Feature Validation Rate = (Features that solve the intended user problem) / (Features shipped)

Traditional product metrics:

  • :white_check_mark: Shipped 47 features in Q1 (up 40% YoY)
  • :white_check_mark: Sprint velocity increased 25%
  • :white_check_mark: Engineering capacity utilization at 94%

Feature validation rate:

  • :cross_mark: Only 18 of 47 features (38%) showed measurable improvement in target user metric
  • :cross_mark: 12 features (26%) had to be significantly reworked post-launch
  • :cross_mark: 8 features (17%) were deprecated within 90 days
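Here are the Q1 numbers computed from the raw counts. The class and field names are hypothetical; the counts are the ones in this post. The three outcome buckets can overlap (a reworked feature might also have hit its metric), so they aren't expected to sum to 47.

```python
from dataclasses import dataclass


@dataclass
class QuarterOutcomes:
    shipped: int
    validated: int   # measurable improvement in the target user metric
    reworked: int    # significantly reworked post-launch
    deprecated: int  # deprecated within 90 days

    def share(self, count: int) -> float:
        """Fraction of shipped features; 0.0 if nothing shipped."""
        return count / self.shipped if self.shipped else 0.0

    def report(self) -> dict[str, str]:
        return {
            "feature_validation_rate": f"{self.share(self.validated):.0%}",
            "rework_rate": f"{self.share(self.reworked):.0%}",
            "deprecation_rate": f"{self.share(self.deprecated):.0%}",
        }


if __name__ == "__main__":
    q1 = QuarterOutcomes(shipped=47, validated=18, reworked=12, deprecated=8)
    print(q1.report())
    # → {'feature_validation_rate': '38%', 'rework_rate': '26%', 'deprecation_rate': '17%'}
```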

We optimized for output, not outcomes.

AI tools let us generate and ship faster. But if we’re shipping the wrong things faster, that’s not progress—it’s waste at higher velocity.

What I’m Changing

Starting next quarter:

  1. Feature proposals require PM self-explanation, not just AI-generated specs (“explain the user problem and solution in your own words, without AI”)
  2. Post-launch validation reviews track whether shipped features achieved intended outcomes (not just “did we ship it”)
  3. Promotion criteria emphasize problem understanding, not shipping velocity

The Question I Keep Coming Back To

If AI lets us generate code/specs/analysis 5-7x faster, but we only understand 17% less…

…on paper, that math looks like a net positive (5-7x output for a 17% comprehension loss).

Except the 17% we’re losing is in debugging, validation, and system thinking—the exact capabilities we need to evaluate whether the AI output is correct.

So we’re losing the ability to validate the very outputs we’re generating faster.

That’s not a 17% comprehension tax. That’s a compounding error rate.
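To make the “compounding” point concrete, here's a toy model with hypothetical parameters: constant output, a fixed defect rate per unit of work, and a review catch rate that shrinks by a fixed fraction each quarter. The 17% per-quarter decay borrows the study’s headline number purely for illustration; the study measured a one-off score gap, not a rate of decline.

```python
def cumulative_escaped_defects(
    quarters: int,
    output: float,
    defect_rate: float,
    catch_rate: float,
    catch_decay: float,
) -> list[float]:
    """Running total of defects that slip past review, quarter by quarter.

    Toy model, hypothetical parameters: each quarter produces `output` units
    of work with `defect_rate` defects per unit; review catches `catch_rate`
    of them; then catch_rate shrinks by the fraction `catch_decay`.
    """
    total = 0.0
    history = []
    for _ in range(quarters):
        total += output * defect_rate * (1 - catch_rate)
        history.append(total)
        catch_rate *= 1 - catch_decay
    return history


if __name__ == "__main__":
    # Same 6x output in both runs; only the erosion of review differs.
    flat = cumulative_escaped_defects(8, output=6.0, defect_rate=0.1, catch_rate=0.9, catch_decay=0.0)
    eroding = cumulative_escaped_defects(8, output=6.0, defect_rate=0.1, catch_rate=0.9, catch_decay=0.17)
    print(f"after 8 quarters: flat review {flat[-1]:.2f}, eroding review {eroding[-1]:.2f}")
```

In the flat case the escaped-defect total grows in a straight line. In the eroding case each quarter lets more through than the last, so the total bends upward, which is the difference between a fixed tax and a compounding one. If the erosion stopped, growth would go back to linear.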

How many quarters can you ship at 5-7x velocity with declining validation capability before the accumulated comprehension debt causes a systemic failure?

I don’t know the answer. But I’m not optimistic it’s “indefinitely.”

This entire thread confirms what I’ve been worried about for months. We’re at an inflection point where the decisions we make about AI adoption in the next 6-12 months will determine the technical capability of our organizations for the next decade.

The CTO Dilemma: Short-Term Pressure vs Long-Term Capability

Here’s the tension I’m navigating:

Board/CEO pressure:

  • “Competitors are shipping faster with AI”
  • “Why isn’t our velocity increasing 3-5x like we read about?”
  • “We need to do more with the same headcount”

What I’m actually seeing:

  • 15-22% incident rate increases (Luis’s numbers mirror mine)
  • Mean time to resolution up 18% because debugging requires understanding
  • Senior engineers spending more time fixing AI-generated code than AI saved in generation
  • Growing knowledge gaps where nobody on the team fully understands critical systems

The existential question: Do I optimize for this quarter’s output or next decade’s capability?

The Anthropic Study Validates My Concerns

The 17% comprehension drop without productivity gains is almost a best-case scenario—that was in a controlled study with motivated participants learning a new library.

In production systems with:

  • Legacy code nobody fully understands
  • Tight deadlines and pressure to ship
  • Junior engineers defaulting to AI for everything
  • No deliberate training on high-engagement AI usage

…I’d bet the comprehension drop is closer to 30-40%.

We’re Creating “Orphaned Systems”

Product_david’s “compounding error rate” is the systemic failure I’m worried about. We’re creating what I call “orphaned systems”—codebases where:

  1. Original architects have left (natural attrition)
  2. Remaining engineers used AI to modify/extend without full understanding
  3. Documentation doesn’t reflect reality (AI-generated docs that look right but aren’t)
  4. Nobody can explain core architectural decisions (institutional knowledge lost)

When these systems fail—and they will—you have three options:

  1. Rewrite from scratch (expensive, risky, months of lost productivity)
  2. Bring back original engineers as consultants (Maya’s $15K fix for a “free” optimization)
  3. Limp along with degraded reliability (accumulating tech debt faster than you can pay it down)

All three are vastly more expensive than the velocity gains AI provided.

Keisha’s Structured Approach Is the Only Sustainable Path

The data from Keisha’s cohort experiment is exactly what I needed to see:

  • Cohort A (AI + training): +12% productivity, +3% incident rate ← This is the only sustainable model
  • Cohort B (AI without training): +8% productivity, +18% incident rate ← This is what most orgs are doing
  • Cohort C (no AI): Baseline ← This isn’t viable competitively

The key insight: AI with deliberate training outperforms AI without it.

What I’m Implementing

Starting next month across our 120-person engineering org:

1. AI Comprehension Framework

  • Every AI-generated PR requires self-explanation (can’t merge without it)
  • Code review template explicitly asks “Do you understand how this works?”
  • Incident post-mortems track whether comprehension debt contributed
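Here's a minimal sketch of what “track whether comprehension debt contributed” could look like as post-mortem data. The Incident fields are hypothetical, and whether responders struggled to explain the code is still a judgment the post-mortem itself has to make.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Incident:
    incident_id: str
    hours_to_resolve: float
    ai_generated_code: bool   # AI-generated code on the failure path?
    comprehension_gap: bool   # did responders struggle to explain the code?


def comprehension_report(incidents: list[Incident]) -> dict[str, float]:
    """Share of incidents per tag, plus MTTR split by the comprehension-gap tag."""
    if not incidents:
        return {"share_ai_generated": 0.0, "share_comprehension_gap": 0.0,
                "mttr_gap_hours": 0.0, "mttr_other_hours": 0.0}
    gap = [i for i in incidents if i.comprehension_gap]
    other = [i for i in incidents if not i.comprehension_gap]
    return {
        "share_ai_generated": sum(i.ai_generated_code for i in incidents) / len(incidents),
        "share_comprehension_gap": len(gap) / len(incidents),
        "mttr_gap_hours": mean(i.hours_to_resolve for i in gap) if gap else 0.0,
        "mttr_other_hours": mean(i.hours_to_resolve for i in other) if other else 0.0,
    }
```

Comparing mttr_gap_hours with mttr_other_hours ties the post-mortem tags back to MTTR, which is already being treated as a comprehension proxy.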

2. Tiered AI Access Based on Demonstrated Comprehension

  • Level 1 (0-90 days): AI-assisted with mandatory explanations (Maya’s onboarding idea, with Keisha’s modification)
  • Level 2 (90+ days): Full AI access with comprehension spot-checks
  • Level 3 (senior/staff): Full access + responsibility to model high-engagement patterns
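These tiers can be expressed as a small access-policy function. It makes two assumptions beyond what’s written. Because the heading says access is based on demonstrated comprehension, this sketch requires both 90+ days and a passed comprehension check for Level 2. It also lets senior/staff engineers go straight to Level 3, which the post doesn’t specify either way.

```python
from enum import IntEnum


class AiAccessLevel(IntEnum):
    ASSISTED_WITH_EXPLANATIONS = 1  # Level 1: AI-assisted, mandatory explanations
    FULL_WITH_SPOT_CHECKS = 2       # Level 2: full access, comprehension spot-checks
    FULL_AND_MODELING = 3           # Level 3: full access + model high-engagement patterns


ONBOARDING_DAYS = 90


def ai_access_level(days_on_team: int, passed_comprehension_check: bool,
                    is_senior_or_staff: bool) -> AiAccessLevel:
    # Assumption: senior/staff engineers skip the tenure gate.
    if is_senior_or_staff:
        return AiAccessLevel.FULL_AND_MODELING
    # Assumption: Level 2 needs both tenure and a passed comprehension check.
    if days_on_team >= ONBOARDING_DAYS and passed_comprehension_check:
        return AiAccessLevel.FULL_WITH_SPOT_CHECKS
    return AiAccessLevel.ASSISTED_WITH_EXPLANATIONS
```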

3. Engineering Scorecard Rebalance

Old metrics (50% weight):

  • Story points completed
  • Deployment frequency
  • Lead time for changes

New metrics (50% weight):

  • Mean time to resolution (comprehension proxy)
  • Post-deployment defect rate (validation capability)
  • System understanding assessments (quarterly comprehension checks)
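Here's the 50/50 rebalance as arithmetic. This sketch assumes each metric has already been normalized to a 0-1 score where higher is better, so MTTR and defect rate have to be inverted first. How to normalize them is the hard part and isn’t shown.

```python
from statistics import mean

OLD_WEIGHT = 0.5  # story points, deployment frequency, lead time for changes
NEW_WEIGHT = 0.5  # MTTR, post-deployment defect rate, system understanding


def scorecard(old_scores: dict[str, float], new_scores: dict[str, float]) -> float:
    """Blend the two metric groups with equal weight.

    Assumes every score is pre-normalized to [0, 1] with higher = better.
    """
    for name, score in [*old_scores.items(), *new_scores.items()]:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1], got {score}")
    return OLD_WEIGHT * mean(old_scores.values()) + NEW_WEIGHT * mean(new_scores.values())
```

Under equal weights, a team that maxes out every velocity metric but scores zero on the comprehension side tops out at 0.5.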

4. Architectural Decision Records (ADRs) Required for AI-Modified Systems

If AI suggests changes to core systems, engineer must document:

  • What changed
  • Why (in their own understanding, not AI’s explanation)
  • What could break
  • How to debug if it fails
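Here's a sketch of the four required fields as a record that refuses to render until every one is filled in. The class name and markdown layout are hypothetical; any ADR template with these four sections would do the same job.

```python
from dataclasses import dataclass, fields


@dataclass
class AiChangeADR:
    title: str
    what_changed: str
    why: str  # the engineer's own understanding, not the AI's explanation
    what_could_break: str
    how_to_debug: str

    def missing_fields(self) -> list[str]:
        """Names of fields left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def to_markdown(self) -> str:
        """Render the ADR, refusing if any required field is empty."""
        missing = self.missing_fields()
        if missing:
            raise ValueError(f"ADR incomplete, fill in: {', '.join(missing)}")
        return "\n".join([
            f"# ADR: {self.title}",
            "## What changed", self.what_changed,
            "## Why", self.why,
            "## What could break", self.what_could_break,
            "## How to debug if it fails", self.how_to_debug,
        ])
```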

The ROI Argument I’m Making to the Board

“AI coding tools offer 5-7x generation velocity. But if comprehension drops 17%, and comprehension is required for debugging, maintenance, and evolution—then we’re borrowing from future capability to fund current output.”

The question isn’t “should we use AI?” It’s “how do we use AI in a way that compounds capability rather than eroding it?”

The Anthropic study showed this is possible: high-engagement users maintained comprehension while using AI.

That’s the only sustainable path forward.

The Industry Will Split

David’s two-tier prediction and Keisha’s market pricing data suggest we’re heading toward a bifurcation:

Tier 1 organizations:

  • Invest in comprehension-aware AI adoption
  • Train engineers in high-engagement patterns
  • Maintain system understanding as core capability
  • Command premium talent and customer trust

Tier 2 organizations:

  • Chase velocity metrics without comprehension investment
  • Accumulate orphaned systems and technical debt
  • Face degrading reliability and incident rates
  • Compete on cost with declining differentiation

I know which tier I want to be in.

The question is whether I can convince the board to invest in training and process when competitors are claiming “10x productivity with AI” without mentioning the 18% incident rate increase.

This thread gives me the data I need to make that case. Thank you.