Leadership thinks Copilot = infinite capacity. How do you push back on AI-driven scope creep?

Three months ago, our company rolled out GitHub Copilot across all engineering teams. Last week, our VP of Product increased our sprint commitments by 35%. When I pushed back, the response was simple: “You have AI now. The team should be able to handle more.”

Sound familiar?

The Infinite Capacity Myth

I’m seeing this pattern everywhere. Leadership reads the ROI claims, sees “productivity gains,” and assumes teams now have unlimited capacity. Sprint expectations inflate by 30-40%. Roadmap timelines get compressed. Feature requests that would’ve been “Q3 maybe” suddenly become “next sprint definitely.”

But here’s what the data actually shows:

The Productivity Paradox: Research from METR found that developers using AI tools took 19% longer to complete tasks. Yet after the study, those same developers estimated they were 20% faster. We’re not just measuring wrong—we’re feeling wrong about our own productivity.

The Utilization Drop: In our own org, I’ve watched developer utilization of Copilot drop to 22% within 30 days of rollout. The initial excitement fades fast when developers realize the suggestions need as much debugging as writing code from scratch.

The Quality Trade-off: 66% of developers cite inaccurate AI code suggestions as their top challenge. The code looks correct but fails during testing. Time saved in writing gets consumed by checking and editing. Net productivity gain? Minimal at best.

The Burnout Consequence

The real cost isn’t just missed deadlines—it’s people. Since our sprint commitments increased:

  • My team leads are working 12-15 hour days trying to meet inflated expectations
  • Junior engineers feel like they’re “failing” when they can’t match the supposed AI productivity multiplier
  • Our best senior engineer told me she feels like a “janitor cleaning up AI messes” instead of building features

When productivity gains get absorbed by higher demands instead of time savings, burnout follows. And burned-out engineers don’t ship quality software—regardless of what AI tools they have.

The Push-Back Problem

So here’s my question to this community: How do you educate leadership about AI’s actual limitations?

I’ve tried:

  • Sharing the research data (eyes glaze over)
  • Showing sprint velocity trends (doesn’t match their mental model)
  • Explaining that coding is 15% of the job (dismissed as excuses)

What’s worked for you? Do you:

  • Track specific metrics that resonate with execs?
  • Use particular frameworks for setting realistic expectations?
  • Have regular “AI reality check” meetings?
  • Frame it differently than “pushing back”?

I’m particularly interested in hearing from other engineering leaders who’ve successfully reset expectations after an AI tool rollout. What data, stories, or frameworks actually got through?

Because right now, the gap between what leadership thinks AI delivers and what teams actually experience is creating unsustainable pressure. And I’m running out of ways to bridge it.


For context: I lead a 40-person engineering org in financial services. We adopted Copilot company-wide in December 2025. Initial excitement was high, but the reality has been much more nuanced than the marketing promised.

Luis, I feel this deeply. I’ve made this exact mistake at the board level—and I’ve had to unwind it.

When we first rolled out AI coding tools across our engineering org last year, I presented overly optimistic productivity projections to the board. I used vendor-provided ROI numbers (classic mistake) and generalized them across all development work. Within six weeks, I was watching my VPs burn out trying to hit targets I’d anchored the board on.

Here’s what I learned: The problem isn’t AI—it’s treating AI as a general productivity multiplier instead of a task-specific tool.

What Actually Works: Task-Specific Measurement

Stop talking about “AI makes developers X% faster.” Start talking about specific tasks where AI helps:

  • Code review: We see genuine 20-25% speed improvements; Copilot is good at suggesting test cases and identifying edge cases.
  • Boilerplate generation: CRUD endpoints, form validation—real gains here, maybe 30-40%.
  • Documentation: AI excels at generating initial drafts from code comments.

But AI slows down complex architecture decisions, debugging subtle integration issues, and understanding legacy systems. When you blend the gains and losses into one number, you get misleading averages that drive bad decisions.

The Framework That Got Through

I started presenting a “Task Decomposition Matrix” to leadership:

Task Type             % of Dev Time   AI Impact       Net Change
Architecture/Design   25%             -10% (slower)   -2.5%
Writing New Code      20%             +35%            +7%
Debugging             20%             -15% (slower)   -3%
Code Review           15%             +25%            +3.75%
Integration Work      20%             0% (neutral)    0%
TOTAL                 100%                            +5.25%

This shows leadership:

  1. AI impact varies wildly by task type
  2. Some tasks get slower (debugging AI-generated code)
  3. Net realistic gain is ~5%, not 30-40%
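For anyone who wants to adapt the matrix to their own org, the arithmetic is just a weighted sum of per-task impacts. A minimal sketch — the task names and percentages are Michelle's illustrative figures, not benchmarks; plug in your own measurements:

```python
# Net productivity change = sum over tasks of (time share * AI impact).
# Figures below are the illustrative numbers from the matrix above.
tasks = {
    "Architecture/Design": (0.25, -0.10),
    "Writing New Code":    (0.20, +0.35),
    "Debugging":           (0.20, -0.15),
    "Code Review":         (0.15, +0.25),
    "Integration Work":    (0.20,  0.00),
}

net = sum(share * impact for share, impact in tasks.values())
print(f"Net change: {net:+.2%}")  # -> Net change: +5.25%
```

The useful part of doing it in code rather than a slide: leadership can see that a +35% gain on one slice of the work dissolves into single digits once the slower tasks are weighted in.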

The Change Management Investment Gap

Here’s a stat that should concern every exec: Organizations investing in change management are 1.6x more likely to exceed their AI initiative expectations. Yet only 37% of organizations actually invest heavily in it.

We’re spending millions on AI tools but nothing on helping teams adapt workflows, learn when to accept/reject suggestions, or develop new code review practices for AI-generated code. That’s organizational malpractice.

What I’d Recommend

  1. Run a 2-week measurement sprint: Track AI impact by task type, not overall velocity
  2. Present task-specific data: Show where AI helps and where it hurts
  3. Propose realistic targets: If you find 5-8% net gains, anchor expectations there
  4. Request change management budget: Training, workflow redesign, tooling optimization
  5. Establish “AI health metrics”: Track suggestion acceptance rates, time-to-debug AI code, developer sentiment
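The two-week measurement sprint in step 1 doesn't need fancy tooling. One hedged sketch: have engineers log (task type, minutes, AI-assisted?) entries in a shared sheet, then aggregate. The task labels and data shape here are my assumptions for illustration, not a standard:

```python
from collections import defaultdict

# Each logged entry: (task_type, minutes_spent, used_ai)
entries = [
    ("boilerplate", 30, True),
    ("boilerplate", 50, False),
    ("debugging",   90, True),
    ("debugging",   60, False),
]

# Sum minutes per task type, split by whether AI assistance was used.
totals = defaultdict(lambda: {"ai": 0, "no_ai": 0})
for task, minutes, used_ai in entries:
    totals[task]["ai" if used_ai else "no_ai"] += minutes

for task, t in sorted(totals.items()):
    print(f"{task}: {t['ai']} min with AI vs {t['no_ai']} min without")
```

Even toy data like this surfaces the pattern the thread keeps describing: AI time drops on boilerplate and rises on debugging. Compare like-for-like work items, not raw velocity.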

And critically: Frame this as optimization, not pushback. Execs respond to “We can get more value from our AI investment by setting realistic targets” better than “You’re wrong about productivity gains.”

The goal isn’t to prove AI doesn’t work—it’s to show that blanket multipliers are the wrong mental model. AI is a specialized tool that requires workflow changes and realistic expectations to deliver value.


For context: We’re now seeing sustained 6-8% productivity gains after recalibrating expectations and investing in change management. The teams are healthier, the board is satisfied, and we’re actually getting value from the tools instead of chasing phantom gains.

Luis, this hits home. And Michelle’s framework is excellent—I’m stealing that Task Decomposition Matrix for my next exec meeting.

I want to add the organizational health perspective because I’ve seen AI scope creep destroy teams even when the productivity numbers eventually stabilized.

The Data That Made Leadership Listen

When my VPs came to me last quarter saying the team was drowning, I knew raw productivity metrics wouldn’t resonate with our CEO. So I reframed it around quality and sustainability, using data leadership actually cared about:

What I showed the exec team:

  1. Code Churn Doubled: Despite “shipping faster,” we were rewriting AI-generated code at twice the rate. Each feature required 1.8x more commits to stabilize.

  2. Stability Dropped 7.2 Points: Incident rate increased. AI helped us ship more, but not better. Our SLA compliance dropped from 99.2% to 92%.

  3. 66% Cite Inaccurate Suggestions: Ran an anonymous survey. Developers’ #1 frustration was “AI code looks right but fails in production.” Not “the tool is slow” or “I don’t like it.” The real cost was the cognitive effort of calibrating trust in every suggestion.

  4. Real Utilization at 22%: Even though we paid for seat licenses, actual usage dropped below a quarter of the team within 30 days. That’s a financial waste story execs understand.

When I framed it as “We’re paying for a tool most of the team has abandoned because we didn’t give them space to learn it properly,” that resonated. It wasn’t “the team can’t handle the work”—it was “we set them up to fail.”

The Weekly “AI Reality Check” That Actually Works

We implemented a standing 30-minute weekly meeting with my engineering directors and the CPO. Format:

  • What worked this week: Share 2-3 examples where AI genuinely accelerated work
  • What didn’t: Share 2-3 examples where AI created extra work
  • Utilization snapshot: Quick dashboard check—are people actually using the tools?
  • One decision: Adjust one expectation, practice, or goal based on reality

This keeps leadership grounded in actual team experience rather than vendor marketing. The key is regularity—once you stop, the gap reopens.

The Servant Leadership Angle

Here’s what I told my CEO when he pushed back on “lowering sprint commitments”:

“My job as VP of Engineering is to protect the team’s capacity to do excellent work sustainably. If I let scope creep continue, we’ll hit these inflated targets for 2-3 months, then lose our best people to burnout and have to rebuild the team. That sets us back 6-9 months, not 2 weeks of reduced velocity.”

Framing it as long-term strategic protection instead of “the team is complaining” changed the conversation. Execs understand retention risk and hiring costs.

What I’d Add to Michelle’s Recommendations

Michelle’s Task Decomposition Matrix is the what. Here’s the how for actually implementing it:

  1. Shadow a sprint: Invite your VP of Product to literally sit with engineers for a week. Watch them accept/reject AI suggestions. Watch them debug AI code. The lived experience beats any spreadsheet.

  2. Track “AI Assist Accept Rate” as a metric: If developers are accepting <30% of suggestions, that’s a signal the tool isn’t helping for that task type. Investigate why.

  3. Make it safe to say “AI isn’t helping here”: Create explicit permission to turn off Copilot for certain tasks. Some of our best architects disable it during system design because it’s cognitively distracting.

  4. Celebrate realistic wins: When we hit a 6% productivity gain on boilerplate work, we celebrated it. Reset the team’s mindset that small, sustainable gains are the goal—not phantom 40% leaps.
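If your editor telemetry (or even a manual tally) gives you per-suggestion accept/reject events, the accept-rate signal in item 2 is trivial to compute. A sketch with made-up event data; the 30% threshold is the heuristic from this thread, not vendor guidance:

```python
from collections import Counter

# Each event: (task_type, accepted)
events = [
    ("boilerplate", True), ("boilerplate", True), ("boilerplate", False),
    ("system_design", False), ("system_design", False),
    ("system_design", False), ("system_design", False),
    ("system_design", True),
]

accepted, total = Counter(), Counter()
for task, ok in events:
    total[task] += 1
    accepted[task] += ok  # True counts as 1

THRESHOLD = 0.30  # below this, investigate whether the tool fits the task
for task in total:
    rate = accepted[task] / total[task]
    flag = "  <- investigate" if rate < THRESHOLD else ""
    print(f"{task}: {rate:.0%} accept rate{flag}")
```

The point isn't the dashboard; it's that a per-task breakdown turns "the team isn't adopting the tool" into "the tool fits boilerplate and doesn't fit system design," which is an actionable conversation.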

The Optimization Framing Michelle Mentioned

I love how Michelle framed this as optimization, not pushback. Here’s my version:

“We invested significant capital in AI tooling. To maximize ROI, we need to deploy it strategically on high-impact tasks rather than expecting blanket productivity gains. This adjustment will increase tool utilization and team sustainability.”

That’s executive language for “stop asking for 40% more work.”


Bottom line: Your team’s burnout isn’t a failure of AI tools. It’s a failure of change management and expectation setting. Protect your people, reframe the conversation around sustainable value, and show leadership the real data—not the marketing deck.

This thread is validating in a way I didn’t expect. Hearing engineering leaders acknowledge these problems makes me feel less like I’m “not adapting fast enough.”

The Ground Truth from the Trenches

I’m the senior engineer Luis mentioned who said I feel like a “janitor for AI-generated code.” Let me give you the actual lived experience of what AI scope creep looks like at the IC level.

Before Copilot (December 2025):

  • Assigned 8 story points per sprint
  • Completed 7-8 consistently
  • Spent my time: 30% design, 25% coding, 25% code review, 20% debugging

After Copilot + Scope Increase (January 2026):

  • Assigned 11 story points per sprint (38% increase)
  • Completing 6-7 (missing commitments for first time in 2 years)
  • Spending my time: 15% design, 20% coding, 15% code review, 50% debugging AI suggestions

The math doesn’t work. Yes, I write code faster. But I’m context-switching constantly between “should I accept this suggestion?” and “wait, why did it suggest this?” My cognitive load went up, not down.

The Perception Gap is Real

Here’s the thing that bothers me most: Management thinks we’re faster. We think we’re slower. Both beliefs are affecting our work.

I feel productive when Copilot auto-completes a function. But when I actually track my time (I started logging it), the time saved writing gets consumed by:

  1. Evaluating suggestions: Is this safe? Does it match our patterns? Will it pass code review?
  2. Debugging “almost right” code: The suggestion compiles but fails edge cases, and takes roughly twice as long to debug as code written from scratch.
  3. Explaining AI-generated code in PR reviews: My reviewers ask “why this approach?” and I have to reverse-engineer Copilot’s reasoning.

The METR study Luis cited—where devs were 19% slower but thought they were 20% faster? That’s me. I’m that statistic.

What I Wish Leadership Understood

When you increase sprint commitments based on “AI productivity gains,” here’s what actually happens to me:

  • I feel like I’m failing: I’m working harder than ever, missing deadlines for the first time, and wondering if I’m the problem.
  • I’m scared to say “AI isn’t helping”: If everyone else is supposedly 30% faster and I’m not, that makes me look like the weak link.
  • The quality bar drops: When I’m behind, I start accepting “good enough” AI suggestions instead of writing the right solution. Technical debt compounds.

Michelle and Keisha—your frameworks are exactly what I wish my leadership would implement. But I don’t have the authority to make those changes. I need my director to advocate for me.

A Request for Engineering Leaders

Luis asked: “How do you get leadership to listen to IC concerns?”

Here’s what would make a difference to me:

  1. Shadow a sprint: Keisha mentioned this—please actually do it. Watch me work for a week. See how many Copilot suggestions I reject and why.

  2. Ask about AI in 1-on-1s: Not “is Copilot helping you?” (I’ll say yes because I don’t want to look resistant to change). Ask “What % of Copilot suggestions do you accept? For which tasks is it helpful vs. distracting?”

  3. Track “AI assist accept rate”: Keisha’s metric is gold. In my case, I accept maybe 25% of suggestions. That’s a signal the tool and my work aren’t well-matched.

  4. Create psychological safety to opt-out: I need explicit permission to turn off Copilot when it’s counterproductive. Right now, I feel pressured to use it because “the company invested in it.”

  5. Celebrate realistic gains: If we see a genuine 5-8% improvement in specific areas, that’s a win. Don’t inflate it to 30% to justify scope increases.

The “Janitor” Feeling

Someone asked me to elaborate on the “cleaning up AI messes” comment.

It’s this: I became a software engineer because I love solving problems and building things. AI was supposed to handle the boring parts so I could focus on the interesting challenges.

Instead, I’m spending half my time evaluating, debugging, and rewriting AI-generated code. The boring parts (boilerplate, tests) got a little faster. But my actual engineering work—architecture, design, complex problem-solving—got slower and more fragmented.

I don’t feel like a builder anymore. I feel like a code reviewer for a junior dev who writes plausible-looking but subtly wrong solutions.

That’s unsustainable. And when you layer unrealistic sprint expectations on top of it, burnout follows.


Luis: I don’t have a silver bullet for how to educate your VP. But I can tell you that if my director came to me with Michelle’s Task Decomposition Matrix and Keisha’s AI Reality Check framework, and said “We’re resetting expectations to protect the team,” I would be incredibly relieved.

Your team needs you to advocate for them. We can’t do it ourselves without risking our credibility. That’s on leadership.

Adding a security angle to this discussion because I think it’s being underweighted in the “AI productivity” conversation.

Security Debt is Accumulating Faster Than We Can Audit

Luis, you mentioned your team’s burnout. Let me tell you what’s happening in my world:

Our security review backlog tripled after Copilot adoption.

Not because developers are malicious. Because AI-generated code introduces vulnerabilities at a pace our security review process wasn’t designed to handle.

The Problem: Volume vs. Quality Trade-off

When your team is shipping 30% more features (because leadership expects it), that’s 30% more attack surface. But our security capacity didn’t increase 30%. We’re still the same 4-person team trying to review:

  • More code
  • Written faster
  • With patterns we didn’t design
  • That developers sometimes don’t fully understand (because AI wrote it)

Example from last sprint:

A developer used Copilot to generate an API authentication helper. The code looked fine. It passed code review. It made it to production.

Two weeks later, I’m doing a routine audit and discover it’s vulnerable to timing attacks. The AI generated a string comparison using == instead of a constant-time comparison function.

The developer didn’t catch it because they trusted the suggestion. The code reviewer didn’t catch it because it looked reasonable. I caught it because security review is my job—but I only audit 15% of changes due to volume.
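To make that timing-attack example concrete: in Python, the fix is a one-liner, which is exactly why a plausible-looking suggestion sails through review. The function names here are illustrative, not our actual helper; the vulnerability class is real:

```python
import hmac

def verify_token_vulnerable(supplied: str, expected: str) -> bool:
    # == short-circuits at the first differing byte, so response time
    # leaks how many leading characters matched: a timing oracle an
    # attacker can exploit byte by byte.
    return supplied == expected

def verify_token_safe(supplied: str, expected: str) -> bool:
    # hmac.compare_digest takes time independent of where the inputs
    # differ, closing the timing side channel.
    return hmac.compare_digest(supplied.encode(), expected.encode())
```

Both functions return identical booleans on every input, so no functional test distinguishes them. Only the safe one resists an attacker measuring response latency, which is why this class of bug passes code review.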

How many vulnerabilities are we NOT catching?

The Metrics Leadership Doesn’t See

Michelle’s Task Decomposition Matrix is brilliant. Here’s the security version I wish every CTO would review:

Security Impact             Before AI     After AI      Change
Features shipped/sprint     12            16            +33%
Security reviews completed  12            11            -8% (capacity)
% of code audited           100%          69%           -31%
CVEs introduced             2-3/quarter   7-8/quarter   +150%
Time to patch CVEs          5 days        12 days       +140% (backlog)

Leadership sees “16 features shipped.” I see “31% of code unaudited and the CVE rate up 150%.”

The “AI Made It So It Must Be Safe” Fallacy

Alex’s “janitor” comment resonates. From a security lens, I’m seeing developers treat AI suggestions with less scrutiny than they’d apply to their own code.

There’s an implicit trust: “The AI has seen millions of codebases. Surely it knows the secure pattern.”

But Copilot doesn’t have a threat model. It doesn’t understand your security requirements. It optimizes for “code that looks like other code,” not “code that’s secure for this specific context.”

Specific vulnerabilities I’ve caught in AI-generated code:

  • SQL injection from improperly parameterized queries (classic)
  • XSS from unsanitized template rendering
  • Race conditions in async operations
  • Hardcoded credentials in config files (!!!)
  • Insecure randomness for session tokens
  • Missing authorization checks in API endpoints

None of these were malicious. All of them were “code that compiles and looks reasonable.”
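For the SQL-injection item, the vulnerable and safe versions again differ by only a few characters, which is why AI-generated string interpolation slips through review so easily. A minimal sqlite3 sketch; the table and column names are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

name = "alice' OR '1'='1"  # attacker-controlled input

# Vulnerable: f-string interpolation lets the input rewrite the query.
rows_vuln = conn.execute(
    f"SELECT role FROM users WHERE name = '{name}'"
).fetchall()

# Safe: a parameterized query treats the input as data, never as SQL.
rows_safe = conn.execute(
    "SELECT role FROM users WHERE name = ?", (name,)
).fetchall()

print(rows_vuln)  # leaks the admin row despite the bogus name
print(rows_safe)  # [] -- no user is literally named "alice' OR '1'='1"
```

Both versions look reasonable in a diff, both run without errors, and only one of them is a breach waiting to happen. That asymmetry is the whole security-review problem in miniature.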

What Got Leadership Buy-In

Keisha asked how to frame this. Here’s what worked for me:

I framed it as risk management, not “security being difficult”:

“We’re introducing technical debt at 3x our previous rate. Our security review capacity is fixed. This creates a growing risk delta that increases our exposure to breaches, compliance failures, and reputational damage. We need to either slow feature velocity or increase security capacity proportionally.”

Then I showed the CVE trend graph. That’s executive language.

Recommendations for Engineering Leaders

If you’re experiencing AI-driven scope creep, please include security metrics in your conversations with leadership:

  1. Track CVE introduction rate: If it’s increasing post-AI, that’s a red flag.

  2. Measure security review coverage: What % of code changes get security audit? If it’s dropping, you’re accumulating risk.

  3. Include security in your “AI Reality Check”: Keisha’s weekly meeting format should include “security incidents this week” as a standing agenda item.

  4. Request security capacity scaling: If engineering is growing 30%, security needs to grow 30%. We can’t audit more code with the same team size.

  5. Implement automated security checks: If AI is generating code, use AI to scan for common vulnerabilities. We added SonarQube with security rules—it catches ~40% of AI-introduced issues.

The Long-Term Risk

Here’s my fear: We’re optimizing for shipping fast at the expense of shipping securely.

In 12 months, when a breach happens, leadership will ask “why didn’t security catch this?” The answer will be: “We told you we couldn’t review 31% of the codebase due to increased velocity, and you chose speed over coverage.”

I don’t want to be the person saying “I told you so” after a data breach. I’d rather be the person who helped leadership understand the trade-offs before we accumulate insurmountable security debt.


Luis, Michelle, Keisha: Your frameworks for resetting expectations are excellent. I’d add one more item to the metrics you track: Security coverage and CVE trends. Because if we ship 40% faster but create 150% more vulnerabilities, that’s not productivity—that’s risk accumulation.