Six months ago, my engineering team of 40+ at a Fortune 500 financial services company went all-in on GitHub Copilot. The individual feedback was incredible—developers told me they felt “way more productive,” “faster than ever,” and “like they had a senior engineer pair programming with them 24/7.”
But here’s what’s keeping me up at night: our sprint velocity is flat. Features are taking the same amount of time to ship, sometimes longer. And when I dug into the data, I found something unsettling.
The Numbers Don’t Add Up
We’re shipping 30% more lines of code than we were six months ago. Commit frequency is up 25%. Individual task completion is genuinely faster—developers close tickets quicker than before.
Yet our cycle time from “code complete” to “deployed to production” has actually increased by 18%.
The bottleneck? Code reviews. PRs are now routinely 2x larger. What used to be a 200-line change is now 400+ lines. Our review queue has become a parking lot, and reviewers are overwhelmed trying to validate AI-generated code they didn’t write and sometimes don’t fully understand.
AI Amplifies What You Already Have
I recently read the DORA Report 2025, and one line hit me hard: “AI magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones.”
That’s us. AI didn’t create our code review bottleneck—it exposed it. Our review process was already fragile: informal SLAs, inconsistent practices, and no real capacity planning. As long as PR volume stayed roughly where it had always been, we could muddle through. But AI turned up the volume, and the system broke.
Research from IT Revolution shows developers using Copilot complete 26% more tasks individually. But there’s a gap between individual productivity and organizational throughput. We’re experiencing that gap firsthand.
The Uncomfortable Truth
As an engineering leader, I focused on AI adoption—securing licenses, running training sessions, celebrating the velocity charts going up. But I ignored the process improvements that should have come first.
I’m realizing now that AI tools are force multipliers. And if you multiply a broken process, you get more brokenness, faster.
The Faros AI research on the AI productivity paradox found that developers believe they’re working 24% faster with AI, but controlled studies show they’re actually 19% slower when you measure end-to-end delivery. That’s a 43-point perception gap. My team’s perception is that we’re crushing it. The reality is we’re shipping the same features at the same pace, just with more code and longer review cycles.
What I’m Changing
We’re hitting pause on aggressive AI adoption and focusing on organizational readiness:
- Establishing code review SLAs - No PR should wait more than 4 hours for initial review
- Review capacity planning - Dedicating 30% of senior engineer time explicitly to reviews
- AI-specific review guidelines - Training reviewers to spot common AI-generated anti-patterns
- Smaller PR culture - Setting guidelines that discourage AI from generating massive changes (a rough sketch of one possible CI size gate follows this list)
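
To make the smaller-PR guideline more than a suggestion, one option we’re considering is a lightweight CI gate that fails a pull request when its diff blows past a line budget. This is a minimal sketch, not our production check: it assumes a GitHub Actions–style environment where GITHUB_TOKEN and GITHUB_REPOSITORY are set and the workflow passes the PR number in as PR_NUMBER, and the 400-line budget is purely illustrative.

```python
#!/usr/bin/env python3
"""Sketch of a CI gate that fails when a PR exceeds a line-count budget.

Assumptions (illustrative, not our actual setup): the check runs in CI with
GITHUB_TOKEN and GITHUB_REPOSITORY in the environment, the workflow exports
the pull request number as PR_NUMBER, and 400 changed lines is the budget.
"""
import os
import sys

import requests

MAX_CHANGED_LINES = 400  # illustrative budget; tune per team


def changed_lines(repo: str, pr_number: str, token: str) -> int:
    # The GitHub REST API reports additions/deletions on the PR object.
    resp = requests.get(
        f"https://api.github.com/repos/{repo}/pulls/{pr_number}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        timeout=10,
    )
    resp.raise_for_status()
    pr = resp.json()
    return pr["additions"] + pr["deletions"]


def main() -> int:
    repo = os.environ["GITHUB_REPOSITORY"]  # e.g. "org/payments-service"
    pr_number = os.environ["PR_NUMBER"]     # exported by the workflow (assumed)
    token = os.environ["GITHUB_TOKEN"]

    total = changed_lines(repo, pr_number, token)
    if total > MAX_CHANGED_LINES:
        print(
            f"PR touches {total} lines (budget {MAX_CHANGED_LINES}); "
            "please split it before requesting review."
        )
        return 1
    print(f"PR size OK: {total} changed lines.")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

A hard gate like this is blunt, and the threshold matters less than the conversation it forces: if a change can’t fit the budget, the author has to explain why before a reviewer ever opens the diff.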
But I’m still wrestling with the bigger questions:
Has anyone else seen AI tools make existing bottlenecks more visible? Did that force you to confront process issues you’d been ignoring?
For teams that successfully scaled with AI - What did you fix organizationally before or during adoption?
And the hardest question - How do you balance the individual developer experience (they love Copilot) with organizational effectiveness (we’re not actually shipping faster)?
I don’t want to be the director who killed AI tools because I couldn’t fix our processes. But I also can’t keep showing executives velocity charts that don’t translate to customer value.
Would love to hear if others have navigated this paradox successfully.