20-30% Gains on Simple Tasks, 90% on Refactoring—Should We Rebalance Our Sprint Work Mix?

I’ve been thinking a lot about AI productivity data lately, and there’s something that doesn’t quite add up with how we’re planning sprints.

The numbers are clear: AI coding assistants give us about 20-30% throughput improvements on simple feature work, but we’re seeing up to 90% productivity gains on refactoring and testing. That’s a massive difference. Yet our sprint planning still treats all work as roughly equal effort.

Here’s what I’m wrestling with: If AI makes refactoring 90% faster, should we be doing 3x more of it?

The Current State

On my team, we follow a pretty typical sprint allocation:

  • 60% new features
  • 20% bug fixes
  • 20% technical debt / refactoring

This made sense when all work types took similar effort. But with AI assistance, that 20% refactoring budget now represents way less calendar time than the 60% feature work. We’re artificially constraining our capacity for high-leverage cleanup work.
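That claim is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below converts a fixed nominal allocation into calendar-time shares under differing AI speedups; the speedup midpoints are rough assumptions taken from the ranges discussed in this thread, not measurements:

```python
# Back-of-the-envelope: how much calendar time each work type actually
# consumes when nominal allocation stays fixed but AI speedups differ.
# Speedup values are assumed midpoints of the ranges cited in the thread.

allocation = {"features": 0.60, "bugs": 0.20, "refactoring": 0.20}
speedup = {"features": 1.25, "bugs": 1.25, "refactoring": 1.80}

# Calendar time is proportional to allocation / speedup.
raw = {k: allocation[k] / speedup[k] for k in allocation}
total = sum(raw.values())
calendar_share = {k: raw[k] / total for k in raw}

for k, v in calendar_share.items():
    print(f"{k}: {v:.0%} of calendar time")
```

With these assumed speedups, the nominal 20% refactoring budget shrinks to roughly 15% of actual calendar time, while features grow to roughly 64%, which is the constraint described above.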

The Asymmetry Problem

The data is compelling (source):

  • Simple CRUD features: 20-30% faster with AI
  • Unit test generation: 50%+ faster
  • Refactoring existing code: 70-90% faster
  • Boilerplate/scaffolding: 55% faster

Traditional sprint planning assumes uniform productivity across task types. That assumption is broken now.

The Opportunity

If we rebalanced our sprint mix to lean into areas where AI excels, we could:

  • Dramatically reduce technical debt backlog
  • Increase test coverage without sacrificing features
  • Free up mental energy currently lost to navigating messy code

But I’m cautious. Just because we can refactor faster doesn’t automatically mean we should refactor more.

What I Want to Know

For other engineering leaders: Have you adjusted your sprint planning based on AI productivity asymmetry?

What I’m specifically curious about:

  • Are you allocating more sprint capacity to testing/refactoring given the productivity gains?
  • How do you decide which refactoring is worth doing vs. busywork?
  • Have you seen this translate to actual business outcomes, or just happier engineers?

I feel like there’s a strategic advantage hiding in this asymmetry, but I haven’t figured out how to capture it systematically. Would love to hear how others are thinking about this.


Context: Engineering Director leading 40+ engineers at a financial services company. We adopted GitHub Copilot enterprise-wide 8 months ago.

This is a fascinating question, Luis, but I’m worried we might be optimizing for the wrong thing here.

If AI makes refactoring 90% faster, should we be doing 3x more of it?

My gut reaction: Not necessarily. Just because we can refactor faster doesn’t automatically mean we should refactor more. We need to be really intentional about what we’re optimizing for.

The “Churn for Churn’s Sake” Trap

I’ve seen this pattern before (not with AI, but with other productivity tools). Something gets easier, so we do more of it, without asking if it’s creating actual value. Then six months later we look back and realize we were busy but not impactful.

Here’s what keeps me up at night: How do we decide which refactoring actually adds value vs. which is just… cleaner code for the sake of cleaner code?

The Value Question

Not all refactoring is equal:

  • :white_check_mark: High value: Refactoring that unblocks future features or reduces bugs
  • :white_check_mark: Medium value: Improving performance or accessibility
  • :cross_mark: Low value: Renaming variables to match new style guide
  • :cross_mark: Negative value: Refactoring code that nobody will touch again for 2 years

If AI makes all of these 90% faster… should we really be spending 3x more time on that last category?

What Actually Matters

From a design perspective, I care about:

  1. Does this improve the user experience? (performance, reliability, accessibility)
  2. Does this make iteration faster? (easier to add features, test, debug)
  3. Does this reduce cognitive load for the team? (less mental overhead)

If the refactoring doesn’t move the needle on at least one of those, I’m skeptical that it’s worth doing—even if AI makes it “free” from a time perspective.

A Different Frame

Instead of asking “should we do more refactoring,” maybe ask:

  • What refactoring has been on the backlog because it was too time-consuming before?
  • Which parts of the codebase create the most friction for new features?
  • What cleanup would make onboarding new engineers 50% faster?

Those are strategic questions that connect refactoring to business outcomes. “We can do it faster” isn’t enough of a reason by itself.


That said—I’m super curious to see if teams who rebalance their sprint mix actually see measurable improvements. It’s entirely possible I’m being too conservative here. Would love to be proven wrong! :slightly_smiling_face:

Luis, this is exactly the kind of strategic question product leaders need engineering to be thinking about. But I want to push back on the framing a bit.

The Business ROI Lens

Maya’s right that not all refactoring is equal. But here’s the product perspective: 90% faster refactoring only matters if it unblocks features, reduces bugs that affect users, or enables faster iteration.

Otherwise we’re just rearranging deck chairs—faster, yes, but still not moving the ship forward.

A Framework for Prioritization

I’d suggest categorizing sprint work on two dimensions:

1. AI Leverage (how much AI helps)

  • High: Testing, refactoring, boilerplate
  • Medium: Simple features, bug fixes
  • Low: Complex architecture, product decisions

2. Business Impact (how much it matters)

  • High: Unblocks roadmap, improves core metrics, reduces critical bugs
  • Medium: Improves developer experience, reduces technical debt
  • Low: Nice-to-haves, style consistency

Then prioritize work that’s both:

  • :white_check_mark: High AI leverage + High business impact = Do more of this
  • :warning: High AI leverage + Low business impact = Still probably not worth it
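The two-axis triage can be sketched as a simple scoring function. This is illustrative only; the level weights and the rule that business impact dominates AI leverage are assumptions, not a calibrated model:

```python
# Minimal sketch of the two-axis triage: score each backlog item on
# business impact and AI leverage, then sort. Weights are illustrative.

LEVELS = {"low": 1, "medium": 2, "high": 3}

def priority(ai_leverage: str, business_impact: str) -> int:
    # Business impact dominates: high AI leverage alone cannot
    # rescue a low-impact item (the :warning: quadrant above).
    return LEVELS[business_impact] * 10 + LEVELS[ai_leverage]

backlog = [
    ("refactor checkout flow", "high", "high"),
    ("rename vars to style guide", "high", "low"),
    ("complex architecture redesign", "low", "high"),
]

for name, lev, imp in sorted(backlog, key=lambda t: -priority(t[1], t[2])):
    print(name, priority(lev, imp))
```

Note that under this weighting, a low-leverage/high-impact item still outranks a high-leverage/low-impact one, which is exactly the point of the framework.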

The Missing Metric

What we’re really missing is: How does this translate to customer value?

Examples of refactoring with clear ROI:

  • Refactored checkout flow → 15% faster page load → 8% conversion lift
  • Refactored API layer → enabled v2 enterprise features → new enterprise ARR
  • Improved test coverage → 40% fewer production bugs → reduced support costs

Examples of refactoring with unclear ROI:

  • “Cleaned up legacy code” → … and then what?
  • “Improved code organization” → measurable outcome?

A Pilot Approach

Here’s what I’d propose:

Sprint 1 (Baseline): Current 60/20/20 mix, measure:

  • Features shipped
  • Bugs closed
  • Customer-reported issues
  • Team velocity

Sprints 2-3 (Rebalanced): Shift to 50/15/35 mix (more refactoring/testing), measure same metrics

Sprint 4 (Analyze): Did the rebalancing translate to:

  • Faster feature velocity in subsequent sprints?
  • Fewer production bugs?
  • Higher customer satisfaction?

If yes, make it permanent. If no, understand why and adjust.
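The readout for that experiment could be as simple as a per-metric delta between the baseline sprint and the rebalanced sprints. All numbers below are made-up placeholders, just to show the shape of the comparison:

```python
# Placeholder pilot readout: same four metrics, baseline vs. rebalanced.
# Every number here is fabricated for illustration only.

metrics = ["features_shipped", "bugs_closed", "customer_issues", "velocity"]
baseline   = {"features_shipped": 12, "bugs_closed": 9, "customer_issues": 7, "velocity": 34}
rebalanced = {"features_shipped": 10, "bugs_closed": 8, "customer_issues": 4, "velocity": 38}

for m in metrics:
    delta = rebalanced[m] - baseline[m]
    print(f"{m}: {baseline[m]} -> {rebalanced[m]} ({delta:+d})")
```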


Bottom line: I love the idea of optimizing for AI asymmetry, but we need to connect it to business outcomes. Otherwise we’re just shipping cleaner code instead of shipping value.

What do you think about running this as a controlled experiment?

Luis, I tried this exact experiment at my previous company. The results surprised me—and not in the way I expected.

What We Did

We rebalanced two squads to spend 40% of sprint capacity on refactoring/testing (up from 20%), specifically targeting areas where AI showed the biggest productivity gains.

The good news: We knocked out technical debt faster than ever. Test coverage went from 62% to 81% in three sprints. The codebase objectively got cleaner.

The unexpected news: We ran into real resistance from the team.

The Team Morale Issue

Here’s what I didn’t anticipate: Engineers, especially junior ones, want to work on “new things,” not just cleanup.

After two months, we started hearing:

  • “I feel like I’m just maintaining code, not building anything”
  • “My portfolio is all refactoring PRs—what do I show in interviews?”
  • “When do we get back to shipping features?”

Three mid-level engineers started interviewing elsewhere. Two explicitly said they missed “building” and felt stuck in maintenance mode.

The Skill Development Angle

There’s another dimension here: Career growth.

Junior developers need to:

  • Ship features to build confidence
  • See their work in production and used by customers
  • Build a portfolio that demonstrates impact
  • Learn product thinking, not just code quality

If we optimize purely for AI efficiency and load up on refactoring, we might hurt retention and development of our early-career engineers.

The Balance I’m Trying Now

At my current company, I’m taking a different approach:

Sprint Mix:

  • 50% features (down from 60%)
  • 15% bugs
  • 25% testing/refactoring (up from 20%)
  • 10% “strategic cleanup” (AI-assisted big refactors)

But also:

  • Rotate who gets refactoring work (not the same people every sprint)
  • Pair junior + senior on complex refactors (learning opportunity)
  • Require refactoring to connect to upcoming features (not just cleanup)
  • Celebrate refactoring wins in demos (“This enabled next quarter’s checkout v2”)

The Real Question

Luis, I think your instinct is right that there’s opportunity here. But we can’t optimize purely for velocity without considering:

  • Team morale: Do engineers feel energized or stuck in maintenance mode?
  • Career development: Are we building skills or just managing code?
  • Retention risk: Will our best engineers leave because they’re bored?

My advice: Yes, rebalance toward AI-friendly work—but do it thoughtfully with team buy-in, clear outcomes, and rotation to keep it from feeling like a grind.


Curious if others have seen similar morale impacts when shifting sprint mix?

This is a critical strategic question, Luis. But I want to add a reality check based on data we’re seeing across the industry.

The Quality Trade-Off Nobody’s Talking About

Everyone’s focused on “90% faster refactoring,” but here’s what the research shows (source):

AI-generated code quality:

  • Pull requests with AI code have 1.7× more issues than human-written code
  • Code duplication is up 4× with AI
  • 23.7% more security vulnerabilities in AI-assisted code

So yes, we can refactor 90% faster. But the review process becomes the new bottleneck.

The Review Bottleneck

Here’s what we’re experiencing:

Before AI:

  • Write code: 8 hours
  • Review: 2 hours
  • Total: 10 hours

With AI:

  • Write code: 2 hours (75% faster)
  • Review: 4 hours (2× longer—more code to review, more issues to catch)
  • Total: 6 hours

Net improvement: 40%, not 75%. And that assumes we have enough senior eng capacity to do thorough reviews.
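The arithmetic above, made explicit: the headline write-time gain overstates the end-to-end gain once the longer review is counted.

```python
# Faster authoring plus slower review, using the hours from the example above.
before = {"write": 8, "review": 2}
with_ai = {"write": 2, "review": 4}

total_before = sum(before.values())  # 10 hours
total_with = sum(with_ai.values())   # 6 hours
net_gain = 1 - total_with / total_before

print(f"net improvement: {net_gain:.0%}")  # 40%, not the 75% write-time gain
```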

What I’m Seeing Work

Instead of just increasing refactoring volume, I’d suggest rebalancing toward review and quality verification capacity:

Option 1: Invest in AI Code Review Tooling

  • We implemented Codium AI for automated quality checks
  • Catches syntax/style issues before human review
  • Reduced senior engineer review time by ~30%
  • Still need humans for architectural decisions

Option 2: Tiered Review Process

  • Low-risk refactoring (tests, formatting): AI review + junior engineer spot check
  • Medium-risk (component refactors): Standard peer review
  • High-risk (architecture changes): Senior engineer + architect review

Option 3: Quality Gates

  • Require AI-generated refactoring to pass:
    • Automated security scans
    • Performance regression tests
    • Code complexity analysis
  • Only merge if metrics improve or stay flat
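The "improve or stay flat" rule in Option 3 could be wired into CI as a single comparison against a baseline. This is a hypothetical sketch; the metric names and the `gate()` helper are invented for illustration and are not a real CI integration:

```python
# Hypothetical quality gate for AI-assisted refactoring PRs: merge only
# if every tracked metric improves or stays flat versus the baseline.

def gate(baseline: dict, candidate: dict) -> bool:
    # Lower is better for all metrics used here
    # (security findings, complexity, p95 latency in ms).
    return all(candidate[m] <= baseline[m] for m in baseline)

baseline = {"security_findings": 3, "cyclomatic_complexity": 41, "p95_ms": 220}
candidate = {"security_findings": 3, "cyclomatic_complexity": 35, "p95_ms": 221}

print("merge" if gate(baseline, candidate) else "block")  # p95 regressed → block
```

The strict `<=` means a PR that trades a complexity win for a latency regression still gets blocked, which forces the trade-off into a human conversation rather than an automatic merge.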

Strategic Recommendation

Don’t just rebalance toward more refactoring. Rebalance toward:

  1. Testing (where AI quality is higher)
  2. Review capacity (add tooling, process improvements)
  3. Strategic refactoring (high-impact areas only)

And definitely track:

  • Bug escape rate (production issues from AI refactoring)
  • Security vulnerability introductions
  • Code review cycle time
  • Engineer cognitive load (are they drowning in review?)

My Answer to Your Question

Should we be doing 3× more refactoring if AI makes it 90% faster?

My take: Only if:

  1. :white_check_mark: You have review capacity to match (tooling + process)
  2. :white_check_mark: You’re measuring quality outcomes, not just velocity
  3. :white_check_mark: The refactoring has clear strategic value (not just “cleaner code”)
  4. :white_check_mark: You’re prepared for higher bug rates and security risks

Otherwise, you’re creating technical debt while trying to reduce it.


I’d love to hear from others: Are you seeing quality trade-offs with AI-assisted refactoring? How are you managing the review bottleneck?