Six months ago, I rolled out GitHub Copilot, Cursor, and ChatGPT across my 40-person engineering team at a major financial services company. The promise was clear: AI would handle the boilerplate, freeing engineers to focus on architecture and problem-solving. Our developers would be more productive, happier, and we’d ship features faster.
The reality? My team is now working 12-hour days, burnout is at an all-time high, and velocity has barely improved despite everyone “saving time.”
The Data Doesn’t Lie—But It’s Confusing
According to Chainguard’s 2026 Engineering Reality Report, 83% of engineers say AI increased their workload, and 62% of associate-level engineers are experiencing burnout. Yet the same report shows 89% of organizations claim engineers save at least 3 hours per week thanks to AI tools.
So where did those 3 hours go? Because they certainly didn’t translate into shorter workdays or less stressed teams.
What We Expected vs. What Actually Happened
What we expected:
- Faster code generation → more features shipped
- Less time on boilerplate → more time for architecture
- AI handles the boring stuff → engineers focus on creative work
What actually happened:
- Faster code generation → stakeholders expect 2x output
- AI-generated code requires intensive review → new bottleneck created
- “Boring stuff” automated → but now we’re debugging AI instead
The Supervision Paradox
Here’s what I didn’t anticipate: reviewing AI-generated code is harder than writing it yourself.
When you write code, you carry the context of every decision in your head. You know why you chose that data structure, how you’re handling edge cases, what trade-offs you made. When AI writes code, you inherit the output without the reasoning. You see the implementation, but you don’t see the decisions—and you don’t know what assumptions were baked in or what edge cases were ignored.
As Ivan Turkovic articulated perfectly: “AI made writing code easier. It made engineering harder.”
The production bottleneck didn’t disappear—it moved from writing to understanding. And understanding is much harder to speed up.
The Expectation Creep Problem
The worst part? Leadership now assumes we have infinite capacity.
Product managers who used to ask for 5 features per sprint are now asking for 12. Executives see “AI productivity gains” in headlines and wonder why we can’t just “add it to the sprint.” The faster we code, the more we’re expected to deliver—and the expectations are outpacing our actual sustained capacity.
One of my senior engineers told me last week: “I’m coding faster than ever, but I’ve never felt more behind.”
The Hidden Costs Nobody Talks About
Tool sprawl: My team now uses Copilot for autocomplete, Cursor for refactoring, ChatGPT for architecture questions, and Claude for code review. 88% of engineers report that switching between tools negatively affects productivity. We saved time on coding but lost it to context switching.
Cognitive load: Junior engineers are learning to prompt AI instead of learning to design systems. They can generate a working function in seconds, but they struggle to explain why it works or debug when it doesn’t.
Review capacity crisis: We’re generating 3x more code, but our review capacity hasn’t scaled. Pull requests are larger, reviews take longer, and subtle bugs slip through because reviewers are overwhelmed.
So What Do We Do About It?
I don’t have all the answers, but here’s what I’m wrestling with:
- Are we measuring the wrong things? Individual speed vs. team outcomes? Lines of code vs. features shipped? Utilization vs. sustainable pace?
- Should we intentionally slow down? If AI makes generation fast but review slow, maybe the answer is deliberate friction: require engineers to write specs before prompting AI, limit AI-generated code per PR, mandate explanation comments.
- How do we manage stakeholder expectations? When leadership reads “AI boosts productivity 30%,” how do we explain that our sprint capacity didn’t actually increase?
- What skills matter in an AI-first world? If juniors learn to orchestrate AI instead of writing code, what happens when AI makes a subtle mistake they can’t recognize?
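For what it’s worth, the “limit AI-generated code per PR” idea is easy to prototype as a CI gate on diff size. Here’s a minimal sketch that parses the output of `git diff --shortstat` and fails a check when a PR exceeds a line budget; the 400-line threshold and the function names are my own placeholders, not anything our team has settled on:

```python
def parse_shortstat(shortstat: str) -> int:
    """Sum insertions and deletions from `git diff --shortstat` output,
    e.g. ' 3 files changed, 120 insertions(+), 40 deletions(-)'."""
    total = 0
    for part in shortstat.split(","):
        part = part.strip()
        # Skip the 'N files changed' segment; count only line changes.
        if "insertion" in part or "deletion" in part:
            total += int(part.split()[0])
    return total

def pr_within_budget(shortstat: str, max_lines: int = 400) -> bool:
    """Return True if the total changed lines fit the per-PR budget."""
    return parse_shortstat(shortstat) <= max_lines
```

In a real pipeline you’d feed this the output of `git diff --shortstat origin/main...HEAD` and fail the build when `pr_within_budget` returns False. The point isn’t the exact number; it’s forcing generation to arrive in chunks that reviewers can actually hold in their heads.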
The Question I Can’t Stop Asking
Harvard Business Review and UC Berkeley research both found the same pattern: AI doesn’t reduce work—it intensifies it. When work becomes easier to push forward, people simply push more work through the system. They work faster, take on broader tasks, and extend work into more hours of the day.
So here’s what keeps me up at night: Are productivity tools making us work harder, not smarter? And if 83% of engineers say AI increased their workload, are we optimizing for the wrong thing?
How are other engineering leaders handling this? What metrics are you actually tracking? And how do you push back when “AI productivity gains” become an excuse to inflate expectations?
I’d love to hear how other teams are navigating this paradox.