AI makes us write code faster. Are we just creating faster bottlenecks?

I’ve been using Cursor and Copilot pretty heavily on our design systems rebuild, and I’m noticing something weird: I’m writing components way faster, but our PR cycle hasn’t sped up at all. If anything, it feels slower.

Here’s what’s happening: AI generates a component in minutes that would’ve taken me an hour. Sounds amazing, right? But then I spend 45 minutes verifying the output, checking edge cases, making sure accessibility is correct, confirming it follows our design tokens… and I’m back to roughly the same total time.

The data backs this up. Research shows that 96% of developers don’t fully trust that AI-generated code is functionally correct. Even more interesting: 38% say reviewing AI-generated code takes MORE effort than reviewing code written by humans (source).

From a design perspective, this reminds me of the old problem: I can generate 100 design variations in Figma with plugins, but I still need to manually evaluate each one for usability, accessibility, and brand alignment. Fast generation doesn’t mean fast validation.

Are we optimizing the wrong part of the process?

Everyone talks about AI making us write code faster. But if verification becomes the bottleneck, did we actually save time? Or did we just shift where we spend it?

I’m curious how other teams are handling this. Are your verification processes keeping up with AI-assisted development? Or are you also finding that the speedup in writing gets eaten by slowdown in reviewing?

This is exactly what I’m seeing across our engineering teams. Developers are committing code faster, but the time from first commit to merged PR hasn’t changed—and in some cases has gotten worse.

The numbers are striking when you look at the research: a 154% increase in PR size, a 91% increase in code review time, and a 9% increase in bug rates, all correlated with a 90% increase in AI adoption (source).

What’s happening is that AI is generating more code per feature, which means more surface area to review. And because reviewers can’t trust that AI-generated code handles edge cases correctly, they’re doing deeper reviews.

Should we be measuring “time to verified production code” instead of “time to first commit”?

In our team retrospectives, developers are frustrated because their individual velocity looks great on paper—lots of commits, lots of code written. But team velocity hasn’t improved because the merge queue is longer and reviews take more time.

I’m starting to think we need different metrics for the AI era. Tracking lines of code written or commits per day doesn’t capture the real bottleneck anymore.
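To make that concrete, the metric I’d rather watch is the gap between first commit and merge, not commit counts. Here’s a minimal sketch, assuming you can export merged PRs with a couple of timestamps (the field names are hypothetical, not any particular tool’s API):

```python
from datetime import datetime
from statistics import median

# Hypothetical PR export; the field names are placeholders, not a real API.
merged_prs = [
    {"first_commit_at": datetime(2024, 5, 1, 9, 30),
     "merged_at": datetime(2024, 5, 3, 16, 0)},
    {"first_commit_at": datetime(2024, 5, 2, 11, 0),
     "merged_at": datetime(2024, 5, 2, 15, 45)},
]

def hours(start, end):
    # Elapsed hours between two timestamps.
    return (end - start).total_seconds() / 3600

# "Time to verified production code": first commit -> merged, per PR.
verification_lag = [hours(pr["first_commit_at"], pr["merged_at"]) for pr in merged_prs]

print(f"median hours from first commit to merge: {median(verification_lag):.1f}")
```

If that median stays flat (or grows) while commits per day climb, the AI speedup is getting absorbed by review, which is exactly what our retros suggest.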

Oh wow, this explains SO much about what I’ve been seeing from the product side.

Our engineering velocity dashboards show more commits, more code shipped per sprint. Leadership sees these numbers and thinks we’re moving faster. But when I look at actual feature delivery and release cadence, we’re shipping roughly the same number of features as before AI tools became standard.

The business is confused: more code, same output.

I think the disconnect is that we’re measuring activity (commits, PRs opened) instead of outcomes (features shipped, bugs resolved, user value delivered).

From a product perspective, what matters is: How quickly can we go from idea to validated value in production? If AI helps us write code in 20% of the time but verification takes 4x longer, we haven’t actually improved that metric.
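To put made-up numbers on that: if a component used to take 60 minutes to write plus 15 to verify (75 total), and AI cuts the writing to 12 minutes while verification grows to 60, you land at 72 minutes. Idea-to-validated-value barely moves, even though the writing metric looks five times better.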

Are we measuring the wrong productivity metrics across the industry? Maybe we need to track different leading indicators now that AI has changed the bottleneck.

This thread is hitting on something critical we’re grappling with organizationally. The data point that really stood out to me: 75% of developers believe AI reduces toil, but actual time spent on toil tasks remains static at 24% of the work week (source).

The bottleneck didn’t disappear. It just moved.

We used to spend time writing boilerplate, setting up structure, implementing patterns. Now AI does that in seconds. But we spend that saved time verifying correctness, checking edge cases, ensuring the AI understood context properly.

The work shifted from creation to validation.

What concerns me from an organizational scaling perspective: we’re hiring for and training the wrong skills. We’re still optimizing for developers who can write code quickly. But in an AI-assisted world, the constraint is developers who can verify, test, and validate code quickly and thoroughly.

This requires a different skillset—more focus on testing strategies, specification writing, understanding system behavior, and architectural thinking. Less focus on memorizing syntax or implementing common patterns.

Are engineering orgs ready for this shift? Or are we still hiring and developing talent for the pre-AI bottleneck?