Over the past six months, I’ve been leading my engineering teams at a Fortune 500 financial services company through AI adoption. The feedback from developers has been overwhelmingly positive—they consistently report feeling more productive, spending less time on boilerplate, and shipping features faster. But when I look at our team velocity metrics and delivery timelines, the story isn’t quite as clear. We’re seeing modest improvements, maybe 10-15%, but nothing close to the 20-50% productivity gains the research promises.
This disconnect led me down a research rabbit hole, and what I discovered challenges how we think about developer experience in the AI era.
The Three Pillars of Developer Experience
Recent research from DX, published in ACM Queue, identifies three core dimensions that determine how developers experience their work: feedback loops, cognitive load, and flow state. These aren’t abstract concepts; they’re practical areas we can observe, measure, and improve. Each one-point improvement in a developer experience score reportedly correlates with 13 minutes of saved developer time per week.
But here’s what caught my attention: AI tools impact all three dimensions, and not always in the ways we expect.
The Cognitive Load Trade-off
We adopted AI coding assistants expecting them to reduce cognitive load. And in some ways, they do: they handle boilerplate code, generate test scaffolding, and answer “what does this error mean?” questions. That matters, because developers who report high understanding of their code feel 42% more productive than those who don’t.
But AI introduces a new pattern that researchers at JetBrains and UC Irvine call “stealth friction.” When developers use AI assistants, they engage in what I now recognize as a constant cycle:
- Write a prompt (context switch from code to instruction)
- Wait for generation (mental model interrupted)
- Review the output (switch from creating to evaluating)
- Debug and integrate (switch back to hands-on coding)
The fascinating—and concerning—part? In their study, 74% of developers didn’t notice they were context switching more frequently. The switching doesn’t feel like switching, but the cognitive cost accumulates.
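One way to make that switching visible is to count mode changes in an activity log. Here is a minimal Python sketch; the mode names and the two sample sessions are hypothetical, invented purely to illustrate the cycle above, and a real log would need far more careful instrumentation.

```python
def count_mode_switches(modes):
    """Count how many times consecutive entries differ in a sequence of work modes."""
    return sum(1 for prev, cur in zip(modes, modes[1:]) if prev != cur)


# Hypothetical session without an assistant: mostly sustained coding.
manual_session = ["coding", "coding", "coding", "testing", "coding"]

# Hypothetical session with an assistant: two passes through the
# prompt -> wait -> review -> integrate cycle.
assisted_session = [
    "coding", "prompting", "waiting", "reviewing", "integrating",
    "coding", "prompting", "waiting", "reviewing", "integrating",
]

manual_switches = count_mode_switches(manual_session)
assisted_switches = count_mode_switches(assisted_session)

print(f"manual:   {manual_switches} switches")
print(f"assisted: {assisted_switches} switches")
```

Each individual switch looks trivial, which is probably why it goes unnoticed. Counting them per session is a crude proxy, but it turns an invisible cost into a number a team can discuss.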
What We’re Seeing on the Ground
In our financial services engineering teams, I’ve observed this play out in ways that surprised me. Our senior engineers, the ones who adopt new tools most enthusiastically, started reporting something unexpected: the minutes they saved generating boilerplate were often wiped out by the time spent reviewing, debugging, or completely rewriting AI-generated code.
One of my tech leads put it perfectly: “I’m no longer in the code—I’m managing the assistant.” That shift from creator to manager represents a fundamental change in cognitive load distribution.
The Productivity Paradox
This matches what the broader industry is experiencing. Over 75% of developers now use AI coding assistants, yet organizational productivity gains haven’t kept pace. Companies report seeing individual task completion speed up, but delivery velocity and business outcomes remain relatively flat.
The research suggests why: AI amplifies whatever organizational state already exists. In teams with strong processes, clear architecture, and fast feedback loops, AI acts as a force multiplier. In teams struggling with slow code reviews, unclear requirements, or brittle test suites, AI highlights and magnifies those existing problems.
The Feedback Loop Question
This brings me to what I think is the critical question: Should we optimize our feedback loops before layering AI on top of them?
When builds take hours instead of minutes, feedback loops are already broken. Adding AI that generates more code faster doesn’t fix the slow build—it just means developers write more code that they’ll wait longer to test. We’re optimizing the wrong part of the system.
Same with code review. If PR turnaround time is your bottleneck (research suggests same-day reviews are ideal), generating code faster with AI just creates a bigger backlog for reviewers.
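To make that concrete, here is a back-of-the-envelope sketch in Python of an Amdahl’s-law-style calculation. The stage names and durations are invented for illustration, and the model assumes stages run strictly one after another, which real pipelines don’t.

```python
def cycle_time_hours(stages):
    """Total end-to-end time for one change, assuming stages run sequentially."""
    return sum(stages.values())


def speed_up(stages, stage, factor):
    """Return a copy of the pipeline with one stage made `factor` times faster."""
    updated = dict(stages)
    updated[stage] = stages[stage] / factor
    return updated


def improvement(baseline, changed):
    """Fractional reduction in end-to-end cycle time."""
    return 1 - cycle_time_hours(changed) / cycle_time_hours(baseline)


# Hypothetical hours spent per change in each stage (illustrative only).
baseline = {
    "coding": 4.0,
    "build_and_test": 3.0,
    "review_wait": 8.0,
    "deploy": 1.0,
}

faster_coding = speed_up(baseline, "coding", 2.0)       # AI doubles coding speed
faster_review = speed_up(baseline, "review_wait", 2.0)  # review wait is halved

coding_gain = improvement(baseline, faster_coding)
review_gain = improvement(baseline, faster_review)

print(f"2x faster coding: {coding_gain:.1%} shorter cycle")
print(f"2x faster review: {review_gain:.1%} shorter cycle")
```

With these made-up numbers, doubling coding speed trims the end-to-end cycle by 12.5%, while halving review wait trims it by 25%. The specific figures mean nothing; the point is that the gain is capped by the share of time the accelerated stage occupies.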
What Should We Measure?
I’m increasingly convinced that we’re measuring the wrong things. Velocity—lines of code, PRs merged, commits per developer—tells us about activity, not value. But cognitive load reduction and flow state preservation? Those are harder to quantify but ultimately more important.
The DX research framework suggests pairing quantitative telemetry (build times, deployment frequency, PR cycle time) with qualitative just-in-time surveys triggered by workflow events. Ask developers right after they submit a PR: “How clear were the requirements? How much context switching did you experience? Did you feel you had time for deep work?”
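Here is a minimal sketch of what an event-triggered survey could look like in Python. Everything in it is an assumption for illustration: the `PrSubmitted` event, its telemetry fields, the `SurveyTrigger` class, and the seven-day cooldown are hypothetical names and values, not part of DX’s framework or any real tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Questions taken from the paragraph above.
QUESTIONS = [
    "How clear were the requirements?",
    "How much context switching did you experience?",
    "Did you feel you had time for deep work?",
]


@dataclass
class PrSubmitted:
    """Hypothetical workflow event emitted when a developer opens a PR."""
    developer: str
    pr_id: int
    submitted_at: datetime
    last_build_minutes: float      # assumed telemetry field
    recent_pr_cycle_hours: float   # assumed telemetry field


@dataclass
class SurveyTrigger:
    """Pairs telemetry with a short survey, at most once per cooldown per developer."""
    cooldown: timedelta = timedelta(days=7)  # assumed value, to limit survey fatigue
    last_sent: dict = field(default_factory=dict)

    def on_pr_submitted(self, event):
        previous = self.last_sent.get(event.developer)
        if previous is not None and event.submitted_at - previous < self.cooldown:
            return None  # surveyed too recently; skip this event
        self.last_sent[event.developer] = event.submitted_at
        return {
            "developer": event.developer,
            "pr_id": event.pr_id,
            "telemetry": {
                "last_build_minutes": event.last_build_minutes,
                "recent_pr_cycle_hours": event.recent_pr_cycle_hours,
            },
            "questions": list(QUESTIONS),
        }


trigger = SurveyTrigger()
monday = datetime(2025, 1, 6, 10, 0)
first = trigger.on_pr_submitted(PrSubmitted("dev-a", 101, monday, 42.0, 30.0))
second = trigger.on_pr_submitted(
    PrSubmitted("dev-a", 102, monday + timedelta(days=1), 40.0, 28.0)
)
print(first["questions"])
print(second)  # None: still inside the cooldown window
```

Storing the telemetry snapshot alongside the answers is the design choice that matters here: it lets you later ask whether reported context switching tracks build time or PR cycle time, rather than analyzing the two data sets separately.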
Looking for Perspectives
I’m curious how other engineering leaders and teams are navigating this:
- What are you measuring to understand AI’s real impact—velocity metrics or something else?
- Have you noticed the “stealth friction” of AI context switching on your teams?
- How do you balance the speed of AI generation against the cognitive cost of review and integration?
- Are you optimizing feedback loops first, or introducing AI first and fixing processes later?
In my experience leading engineering teams across multiple time zones, the best solutions come from diverse perspectives. What’s working, or not working, for you?
Some relevant research that shaped my thinking: