Six months ago, I greenlit GitHub Copilot and similar AI coding tools for my entire engineering organization. The productivity numbers looked incredible on paper: 55% faster task completion, 3.6 hours saved per developer per week. My team was shipping features at a pace we’d never seen before.
Last week, I ran a different analysis. I tracked how much time my engineers spend reviewing, testing, and correcting AI-generated code. The number? 15 hours per week per developer.
We’re shipping 31% faster, but we’re spending roughly four times the hours we save second-guessing every line.
The Trust Gap Is Real
Here’s what I’m seeing across my teams:
84% adoption, but only 29% trust. Every single one of my engineers uses AI tools daily. But in our retrospectives, when I ask “do you trust the code it generates?” — the room goes quiet. They use it because it’s fast. They trust it… conditionally. With verification. After review.
The verification burden is crushing. One of my senior engineers told me: “Reviewing AI code takes more cognitive effort than reviewing code from our junior developers. At least I understand how a junior thinks. AI surprises me in ways I can’t predict.”
That hit hard. 38% of developers report that reviewing AI-generated code requires more effort than reviewing human-written code. We’re not just talking about a quick glance — this is deep, careful review work.
The Productivity Paradox
Here’s the math that keeps me up at night:
- Time saved writing code: 3.6 hours/week
- Time spent verifying code: 15 hours/week
- Net productivity change: -11.4 hours/week
And yet, we’re shipping faster. How does that make sense?
The only explanation: we’re skipping verification. The research backs this up — only 48% of developers always check their AI-assisted code before committing it. That means more than half of our developers are, at least sometimes, letting AI-generated code into the codebase with incomplete review.
That scares me. We’re in EdTech. Our code serves millions of students. The idea that we’re shipping faster because we’re verifying less — that’s not a productivity win. That’s technical debt with a timer.
The Perception Gap
What really bothers me is the perception-reality gap. A study by METR found that developers using AI tools were actually 19% slower than those coding without assistance, despite believing they were 20% faster.
Think about that. We feel productive. The dopamine hit of watching code appear on screen is real. But if you measure end-to-end delivery time, including review, bug fixes, and rework — we’re slower.
Are we lying to ourselves about the gains?
The Team Morale Impact
This trust gap is affecting team dynamics in ways I didn’t anticipate:
Junior engineers feel lost. They’re using AI to write code they don’t fully understand. When bugs appear, they don’t have the foundational knowledge to debug. One junior dev told me: “I feel like I’m becoming a code reviewer, not a code writer.”
Senior engineers feel overwhelmed. They’re stuck being the “AI validators.” Every PR now comes with an unspoken question: “Did a human write this, or do I need to check every edge case?”
Code review culture is changing. We used to review for design decisions. Now we review for correctness. That’s a regression.
What I’m Trying
I don’t have answers yet, but here’s what we’re experimenting with:
- Tiered trust model: Not all AI suggestions are equal. Boilerplate? Ship it. Complex business logic? Mandatory human review.
- Verification time as a metric: I’m tracking review time alongside shipping velocity. If verification time grows faster than code output, we’re regressing.
- AI literacy training: Teaching engineers how to effectively prompt, review, and validate AI outputs. It’s a skill, not a toggle.
- Honest retrospectives: Creating space for engineers to admit when AI hurt their productivity, not just when it helped.
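The "verification time as a metric" idea is concrete enough to sketch. Here’s a minimal illustration of the check I have in mind — the field names, the features-shipped proxy, and the week-over-week comparison are all illustrative assumptions, not our actual tooling:

```python
# Sketch: flag when verification cost per unit of output is growing.
# All names and thresholds here are hypothetical, for illustration only.
from dataclasses import dataclass

@dataclass
class WeeklySnapshot:
    features_shipped: int   # crude proxy for output velocity
    review_hours: float     # time spent verifying AI-assisted code

def verification_ratio(snap: WeeklySnapshot) -> float:
    """Hours of verification per feature shipped; rising means regressing."""
    return snap.review_hours / max(snap.features_shipped, 1)

def is_regressing(prev: WeeklySnapshot, curr: WeeklySnapshot) -> bool:
    # Output can grow and still be a net loss if verification grows faster.
    return verification_ratio(curr) > verification_ratio(prev)

# Example: we shipped more features, but verification grew even faster.
before = WeeklySnapshot(features_shipped=10, review_hours=40)
after = WeeklySnapshot(features_shipped=12, review_hours=60)
print(is_regressing(before, after))  # True (4.0 -> 5.0 hours per feature)
```

The point of the ratio is that raw shipping velocity can rise while the team is still losing ground — which is exactly the paradox described above.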
My Question to This Community
How are you measuring net productivity when AI tools are in the mix?
Are you tracking verification time? Review cycles? Bug rates on AI-assisted code?
Or are we all just looking at the “time saved writing code” metric and calling it a win?
I want to believe the 31% productivity gain is real. But right now, it feels like we’re trading speed for trust — and I’m not sure that’s a trade worth making.
What am I missing?