Code Review Is Now the Bottleneck: AI Writes Fast, Humans Can't Review Fast Enough

We’ve been tracking cycle time metrics for 3 months since rolling out AI coding agents. Here’s the uncomfortable pattern that emerged:

Time to first implementation: Down 58% :white_check_mark:
Time to code review complete: Down 12% :warning:
Overall cycle time (story → production): Down 18% :chart_decreasing:

AI made our developers write code way faster. But code review became the bottleneck.

The Review Queue Crisis

Our senior engineers are drowning. Before AI agents:

  • ~25% of their time spent on code reviews
  • Average PR review time: 4-6 hours
  • Review queue depth: 5-8 PRs

After AI agents:

  • ~45% of their time spent on code reviews
  • Average PR review time: 6-9 hours (more code to review per PR)
  • Review queue depth: 12-18 PRs

We optimized for the wrong constraint. Code generation was never the bottleneck—human review always was. AI just made it worse.

Why AI-Generated Code Takes Longer to Review

At first, I assumed: “AI code is cleaner, should be faster to review.”

Wrong. Here’s what our senior engineers report:

1. Volume Is Higher

AI generates more code per feature than humans would write. Not because it’s worse—because it’s thorough.

Example: Human engineer implements feature → 200 lines of code, 3 test cases.
AI agent implements same feature → 350 lines of code, 12 test cases, comprehensive edge case handling.

More code = more review time, even if quality is high.

2. Intent Is Less Clear

When reviewing human code, you can often infer intent from structure. Experienced engineers develop patterns, and reviewers recognize them.

AI code doesn’t have a consistent “voice.” Each generation is clean and follows best practices, but the why behind architectural choices isn’t always obvious.

Reviewers spend extra time asking: “Why this approach vs alternatives? What assumptions were made?”

3. Context Switching Is Harder

Human PRs have commit messages, branch names, and ticket context that tell a story. AI-generated PRs are… comprehensive dumps.

One senior engineer described it: “Human PRs are like reading a novel. AI PRs are like reading an encyclopedia. Both can be well-written, but one requires more cognitive effort.”

4. “Trust But Verify” Takes Time

Even when AI code looks correct, reviewers feel obligated to verify thoroughly. Because mistakes in AI-generated code can be subtle—not syntax errors, but logic errors or architectural mismatches.

So review becomes more methodical, less skimmable.

The Business Impact

From a product perspective, this is concerning:

What we expected:
“AI makes developers 2x faster → we ship features 2x faster → we hit roadmap goals earlier”

What actually happened:
“AI makes developers 2x faster at writing code → review becomes bottleneck → we ship features 20% faster → incremental improvement, not transformation”

And the hidden cost: Senior engineers are burning out from constant review load.

Solutions We’re Exploring

1. AI-Assisted Code Review (Meta-Agents)

The idea: Use AI to pre-review AI-generated code. Flag issues before human review.

We’re experimenting with review agents that check:

  • Security vulnerabilities - SQL injection, XSS, auth bypasses
  • Performance anti-patterns - N+1 queries, memory leaks, inefficient algorithms
  • Consistency violations - Deviates from codebase patterns or style guides
  • Test coverage gaps - Missing edge cases or error handling tests

Human reviewers then focus on:

  • Architectural fit
  • Business logic correctness
  • Trade-off evaluation
  • Strategic direction

Early results: Reduces human review time by ~30%. Not amazing, but meaningful.

The downside: We’re now trusting one AI to review another AI’s work. What if both make the same mistake? We’re still figuring out the trust model.

2. Tiered Review Process Based on Risk

Not everything needs the same review rigor. We’re implementing:

Tier 1 - Light review (automated + spot check):

  • Tests, documentation, refactoring
  • Low business impact
  • Automated scanners + quick senior engineer glance

Tier 2 - Standard review (automated + full review):

  • Feature implementation, bug fixes
  • Moderate business impact
  • Automated scanners + thorough senior review

Tier 3 - Deep review (automated + architectural review):

  • Security-critical, compliance-sensitive, high-traffic paths
  • High business impact
  • Automated scanners + senior review + architectural review + domain expert review

This lets us allocate review capacity where it matters most.
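The tier routing above is mechanical enough to automate. Here is a minimal sketch, assuming hypothetical category names and a rule that a PR inherits the strictest tier implied by any of its change types; none of these names come from the team's actual tooling:

```python
# Illustrative risk-tier routing for PRs. Categories and the default
# tier are assumptions, not the team's real configuration.
RISK_TIERS = {
    "tests": 1, "docs": 1, "refactor": 1,        # Tier 1: light review
    "feature": 2, "bugfix": 2,                   # Tier 2: standard review
    "security": 3, "compliance": 3, "hot_path": 3,  # Tier 3: deep review
}

def review_tier(change_types: list[str]) -> int:
    """A PR gets the strictest tier implied by any of its change types."""
    return max(RISK_TIERS.get(t, 2) for t in change_types)

def required_reviews(tier: int) -> list[str]:
    """Map a tier to the review steps described above."""
    steps = ["automated scanners"]
    if tier == 1:
        steps.append("senior spot check")
    elif tier == 2:
        steps.append("thorough senior review")
    else:
        steps += ["senior review", "architectural review", "domain expert review"]
    return steps
```

A mixed PR (say, a feature touching a security-sensitive path) routes to Tier 3, which is the point of taking the max rather than the average.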

3. Improving Agent Output Documentation

We’re training agents to include more context in PRs:

Before:

PR: Implement user authentication
Files changed: 12
Lines changed: +450 -30

After:

PR: Implement user authentication

## Approach
Chose JWT-based auth over session-based because:
- Stateless (scales better)
- Mobile-friendly (no cookie issues)
- Aligns with existing API gateway pattern

## Alternatives Considered
- Session-based: Simpler but doesn't scale
- OAuth 2.0: Over-engineered for our use case

## Security Considerations
- Tokens expire in 1 hour (balances security/UX)
- Refresh token pattern implemented
- Rate limiting on auth endpoints (100 req/min)

## Testing Strategy
- Unit tests: Token generation/validation logic
- Integration tests: Full auth flow
- Security tests: SQL injection, XSS attempts

This context helps reviewers understand why not just what, reducing “time to understand.”

4. Scheduled Review Windows

Instead of constant interruptions, we’re trying focused review time:

  • Daily “review hours”: 10-11am, 2-3pm - dedicated review time
  • Rest of day: Deep work, no review expectations
  • Async reviews: Non-urgent PRs reviewed within 24 hours

Reduces context switching for reviewers. Unclear yet if this helps junior engineers waiting for reviews.

The Uncomfortable Question

Here’s what I asked our CTO last week:

“If code review is the bottleneck, and AI can’t fix it… do we need to hire more senior engineers just to review AI-generated code?”

That would be ironic: AI was supposed to reduce headcount needs, but instead it creates demand for more expensive senior reviewers.

Her response: “Maybe. Or maybe we fundamentally rethink what ‘review’ means in an AI-augmented workflow.”

That’s the conversation we need to have as an industry.

The Meta-Pattern

This is exactly what happened with test automation. When we automated testing, we didn’t eliminate QA—we shifted their role from manual testers to test automation engineers and quality architects.

Same pattern here: We’re not eliminating code review—we’re shifting it from “review every line” to “review architecture and validate automation.”

But we haven’t figured out the new process yet.

What If We’re Thinking About This Wrong?

Maybe the question isn’t “How do we speed up review of AI-generated code?”

Maybe it’s: “How do we reduce the need for review in the first place?”

Options:

  • Better constraints upfront - More specific requirements reduce agent mistakes
  • Stronger automated validation - Catch issues before human review
  • Graduated trust - Agents earn autonomy by proving reliability on low-risk tasks
  • Domain-specific agents - Specialized agents that deeply understand our codebase patterns

If agents can generate code that’s provably correct (tests pass, security scans pass, performance benchmarks pass, matches architectural patterns), maybe human review becomes a spot-check rather than a deep dive.

But we’re not there yet.

The Reality Check

Here’s where we are: AI made coding faster, revealed that review was always the constraint, and we don’t have a scalable solution yet.

This isn’t failure. It’s learning. We’re in the messy middle of a technology transition.

But product leaders need to set realistic expectations: We’re not 3x faster. We’re 20% faster, with different bottlenecks, and we’re still figuring out the new workflow.


Questions for the community:

  1. How are you handling code review at scale with AI-generated code?
  2. Have you tried AI-assisted review? What worked or didn’t?
  3. What does “good enough” review look like when volume is high?
  4. Is this a temporary bottleneck or a fundamental limit?

David, this is exactly what we’re experiencing. And we’re already at your “solution 1” stage: using AI to review AI.

Our AI-Assisted Review Experiment

We’ve been running this for 6 weeks. Here’s the setup:

Stage 1 - Agent generates code
Standard agentic coding workflow.

Stage 2 - Review agent scans
Before human review, a separate AI agent checks:

  • Security vulnerabilities (OWASP Top 10)
  • Performance issues (complexity analysis, database query patterns)
  • Compliance violations (our financial services rules)
  • Code consistency (matches our internal patterns)
  • Test adequacy (coverage, edge cases)

Stage 3 - Human reviews
Senior engineer focuses on:

  • Architectural decisions
  • Business logic correctness
  • Strategic fit

What’s Working

Time savings: Senior engineer review time dropped from 7 hours/week to 4.5 hours/week per engineer. That’s roughly a 35% reduction, in line with your early ~30% results.

Quality improvements: Review agent catches things humans miss:

  • Consistently checks every database query for N+1 patterns
  • Never gets tired or distracted
  • Applies rules uniformly (no “good enough for Friday afternoon” syndrome)

Documentation benefits: Review agent generates detailed reports:

Security Scan Results:
✅ No SQL injection vulnerabilities detected
✅ All user inputs sanitized  
⚠️  API endpoint lacks rate limiting (recommendation: 100 req/min)
✅ Authentication checks present on all protected routes

Performance Analysis:
✅ Database queries use indices appropriately
⚠️  Function `calculateTotals()` has O(n²) complexity - consider optimization
✅ No obvious memory leaks detected

Compliance Check:
✅ Audit logging present for all financial transactions
❌ Missing encryption for customer SSN field (GLBA violation)
⚠️  PII data in logs on line 47 (recommendation: redact)

This gives human reviewers a focused checklist instead of open-ended “review everything.”

What’s Not Working (The Trust Problem)

Your concern: “What if both agents make the same mistake?”

This is real. We’ve caught 3 cases where:

  1. Coding agent introduced a subtle bug
  2. Review agent missed the bug (or validated incorrect behavior)
  3. Only caught in human review (or worse, in staging)

Example:

  • Coding agent implemented currency conversion logic with rounding error
  • Review agent validated: “Tests pass, logic looks correct”
  • Human caught: “Wait, this loses precision for large transactions—banking regulations require exact decimal handling”

The review agent understood code correctness but not domain requirements.

The Solution: Domain-Specific Review Agents

We’re now training specialized review agents with:

  • Our compliance rules explicitly documented
  • Examples of past bugs we caught in review
  • Domain-specific checklists (banking regulations, audit requirements)
  • Architectural patterns specific to our codebase

Early results are better, but it’s a lot of setup investment.

Your Tiered Review Process - We’re Doing This Too

Almost identical to your framework:

Tier 1 - Auto-approve (automated only):

  • Dependency updates (within security policies)
  • Documentation changes
  • Code formatting
  • Test coverage improvements

We’ve auto-approved 47 PRs in 6 weeks with zero issues. These never hit the human review queue.

Tier 2 - Light review (automated + quick check):

  • Bug fixes (isolated to single module)
  • Refactoring (test coverage confirms behavior unchanged)
  • Non-critical features

Human review focuses on: “Does this make sense?” not “Is every line correct?”

Tier 3 - Full review (automated + deep human review):

  • New customer-facing features
  • Security/compliance-sensitive changes
  • Database schema modifications
  • Integration with external systems

These still take 3-5 hours of senior engineer time. Can’t shortcut these.

The Headcount Question You Raised

“Do we need to hire more senior engineers just to review AI code?”

We’re facing this. Our hiring plan for 2026 originally included:

  • 10 new junior/mid engineers
  • 2 new senior engineers

After 6 months of AI-augmented development, we’re revising to:

  • 6 new junior/mid engineers (AI reduces need for pure code writers)
  • 5 new senior engineers (increased need for reviewers and architects)

So yes, AI is shifting our hiring mix toward more expensive, more experienced engineers.

The business case: 6 engineers + AI agents produce more code than 10 engineers without AI. But that code needs senior review.

Net headcount is down (from 12 to 11), but cost might be flat or up (seniors are more expensive).

The “Reduce Need for Review” Approach

Your last question: “How do we reduce the need for review in the first place?”

This is the right question. We’re experimenting with:

1. Better Constraints Upfront

Instead of: “Implement user authentication”

We now provide:

Implement JWT-based user authentication

Constraints:
- Use existing `AuthService` pattern (see auth/base.ts)
- Token expiry: 1 hour (security requirement)
- Must integrate with audit logging system
- Rate limiting: 100 requests/min per IP
- Must pass security scan before review

Examples:
- See previous implementation in admin module (PR #1234)
- Follow error handling pattern from payment module

Compliance requirements:
- SOX: Log all authentication attempts
- GLBA: Encrypt tokens at rest

This reduces agent mistakes significantly. But it’s more upfront work for product/engineering.

2. Graduated Trust Based on Track Record

We track agent success rates:

  • Test generation: 95% first-time-right → auto-approve
  • Documentation: 92% first-time-right → light review
  • Feature implementation: 73% first-time-right → full review
  • Database migrations: 41% first-time-right → always human-led

Agents earn autonomy in areas where they’ve proven reliable.
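The graduated-trust policy above reduces to a threshold table. A minimal sketch, assuming cutoffs chosen to reproduce the four observed rates (the exact thresholds and the `review_level` name are illustrative, not our actual policy):

```python
# Map an agent's measured first-time-right rate to a review level.
# Thresholds are assumptions that reproduce the tiers described above.
def review_level(first_time_right: float) -> str:
    if first_time_right >= 0.95:
        return "auto-approve"
    if first_time_right >= 0.90:
        return "light review"
    if first_time_right >= 0.60:
        return "full review"
    return "human-led"

# The tracked success rates from the text, mapped through the policy.
rates = {
    "test generation": 0.95,
    "documentation": 0.92,
    "feature implementation": 0.73,
    "database migrations": 0.41,
}
levels = {task: review_level(r) for task, r in rates.items()}
```

The useful property is that autonomy is earned per task type, not granted globally: an agent that is trusted to generate tests is still human-led on migrations.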

3. Agent-Generated Test Evidence

Instead of humans manually verifying functionality, agents provide:

  • Test coverage report (95%+ required for auto-review)
  • Performance benchmarks (vs baseline)
  • Security scan results (no critical issues)
  • Integration test results (all green)

If all evidence passes, human review is lighter.

Michelle’s Point About DevOps

The comparison to infrastructure-as-code is spot-on. We didn’t eliminate infrastructure engineers—we shifted them from “manually configure servers” to “review Terraform plans and validate automation.”

Same here. We’re not eliminating code review—we’re shifting from “line-by-line review” to “architectural validation and strategic fit.”

But the transition is messy and we’re under-resourced for this new model.

The Uncomfortable Reality

After 6 months of AI-augmented development, here’s my assessment:

Wins:

  • Faster code generation (58% faster)
  • Better test coverage (AI is thorough)
  • Tech debt backlog clearing
  • Junior engineers more productive

Losses:

  • Senior engineers overloaded
  • Review is now the bottleneck
  • Hiring costs shifting up (need more seniors)
  • New complexity (agent tooling, review processes)

Net: Modest productivity gain (18% cycle time improvement) with significant organizational change.

Not the transformation we hoped for, but meaningful progress.


Question for teams further along: Did the review bottleneck eventually resolve? Does agent quality improve over time to need less review? Or is this the new steady state?

David and Michelle, the review bottleneck is hitting us hard too. But I’m more concerned about the team health impact nobody’s discussing.

Senior Engineers Are Burning Out

Let me share real data from our quarterly engagement surveys:

Q4 2025 (before AI agents):

  • Senior engineer satisfaction: 7.8/10
  • Reported “sustainable workload”: 72%
  • Top frustration: “Too much time on boring boilerplate”

Q1 2026 (after AI agents):

  • Senior engineer satisfaction: 6.4/10 :down_arrow:
  • Reported “sustainable workload”: 51% :down_arrow:
  • Top frustration: “Drowning in code review”

We solved the boilerplate problem but created a worse one: senior engineers now feel like review machines.

Anonymous feedback from recent 1-on-1s:

  • “I barely write code anymore. I just review AI output all day.”
  • “I miss the creative part of engineering. Now I’m just quality control.”
  • “The juniors are shipping features. I’m stuck reviewing their AI agent’s work. This isn’t what I signed up for.”
  • “I’m considering leaving. I didn’t become a senior engineer to be an AI code reviewer.”

This is a retention crisis waiting to happen.

The Status vs Fulfillment Problem

Senior engineers derive satisfaction from:

  • Solving hard technical problems
  • Architecting systems
  • Mentoring junior engineers
  • Shipping impactful work

But when 45% of their time is code review (David’s number), they’re not doing those things. They’re checking someone else’s (or something else’s) work.

That’s not fulfilling. And it’s not what they were promoted for.

The Alternative: Rethink Review, Not Just Speed It Up

Michelle’s AI-assisted review helps with speed. But doesn’t address the underlying issue: senior engineers don’t want to be full-time reviewers.

What if we fundamentally changed the model?

Proposal: “Architecture First, Review Later”

Current model:

  1. Junior + agent implement feature
  2. Senior reviews implementation
  3. Iterate until acceptable

Alternative model:

  1. Senior defines architecture and constraints
  2. Junior + agent implement within constraints
  3. Automated validation ensures compliance
  4. Senior spot-checks, doesn’t deep-review

This shifts senior work from reactive review to proactive architecture.

More creatively fulfilling. Less time as quality gate.

Michelle’s Graduated Trust Is Key

The tiered auto-approval approach addresses this. If agents can earn trust in certain areas, seniors don’t review everything.

But we need to be more aggressive about it. Our current thresholds:

  • 95% success rate → auto-approve

That’s conservative. What if we said:

  • 85% success rate → auto-approve, with spot-checking
  • Cost of mistakes is low (easy to fix, low blast radius)
  • Automated tests catch regressions

Yes, we’ll ship occasional bugs. But we might retain senior engineers who are happier and more engaged.

The “Review Windows” Approach We’re Trying

David mentioned scheduled review times. We’re doing similar:

Old approach: Reviews trickle in constantly, seniors context-switch all day

New approach:

  • PRs submitted by 10am → reviewed by 2pm (same day)
  • PRs submitted after 10am → reviewed next morning
  • Urgent PRs (production hotfixes) → reviewed immediately
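The window rules above can be expressed as a small SLA function. This is a sketch under stated assumptions: “next morning” means 10am the following day, and the function and field names are made up for illustration:

```python
# Review-window SLA: PRs in before 10am are due by 2pm the same day,
# later PRs roll to 10am the next morning, hotfixes jump the queue.
from datetime import datetime, timedelta

def review_due(submitted: datetime, urgent: bool = False) -> datetime:
    if urgent:
        return submitted  # production hotfixes are reviewed immediately
    if submitted.hour < 10:
        # Same-day window: due by 2pm
        return submitted.replace(hour=14, minute=0, second=0, microsecond=0)
    # After the morning cutoff: due next morning at 10am
    nxt = submitted + timedelta(days=1)
    return nxt.replace(hour=10, minute=0, second=0, microsecond=0)
```

Publishing the function (or just the table it encodes) is what gives juniors the predictability benefit: they know exactly when to expect feedback.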

Benefits:

  • Seniors have 4-hour blocks for deep work
  • Juniors know when to expect feedback
  • Reduces constant interruptions

Downsides:

  • Juniors wait longer for non-urgent reviews
  • Requires discipline (not always checking PR queue)

Still evaluating if this helps long-term.

The Mentorship Time Trade-Off

Michelle mentioned 12-15 hours/week mentoring juniors in AI-augmented work (first 3 months).

Here’s the tension: If seniors spend more time mentoring, they spend less time reviewing. But volume keeps growing.

We can’t have both:

  • Deep mentorship on how to work with AI effectively
  • Fast turnaround on review queue

Something has to give.

Options:

  1. Accept slower reviews (impacts velocity)
  2. Reduce mentorship depth (impacts junior development)
  3. Hire more seniors (expensive, slow)
  4. Accept lower review quality (risky)

We chose #1 (slower reviews) for now. But the product team is frustrated with longer cycle times.

The Headcount/Role Shift Michelle Described

We’re seeing the same pattern:

  • Need fewer junior “code writers”
  • Need more senior “architects/reviewers”

But this creates problems:

  • Career path unclear - If juniors aren’t writing much code, how do they become seniors?
  • Knowledge gaps - Seniors need deep implementation knowledge to architect well, but spend less time implementing
  • Hiring challenges - Easier to hire juniors than seniors, but we need seniors

Organizational structure is shifting faster than our hiring/development processes can adapt.

What Would Actually Help

Not just tools, but process and culture changes:

1. Redefine “Senior Engineer” Role

From: “Writes complex code + reviews + mentors”
To: “Architects systems + sets constraints + validates outcomes”

Less hands-on implementation, more strategic direction.

2. Create “Code Review Specialist” Role

Some engineers like review work. It’s detail-oriented, requires deep knowledge, catches issues.

What if we had a dedicated role: senior-level engineers who specialize in review, security validation, quality assurance?

Like how SRE became its own discipline separate from traditional operations.

3. AI Review Quality Metrics

If we’re using AI to review AI, we need trust metrics:

  • How often does review agent miss issues that humans catch?
  • Which types of issues does it miss most?
  • When can we trust it vs when do we need human oversight?
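Those trust metrics are straightforward to compute once you log what the review agent flagged versus what humans ultimately found. A minimal sketch, with all names illustrative:

```python
# Compare issues the review agent flagged against issues humans found,
# to quantify where the agent can be trusted. Hypothetical helpers.
def miss_rate(agent_flagged: set[str], human_found: set[str]) -> float:
    """Share of human-found issues the review agent missed."""
    if not human_found:
        return 0.0
    return len(human_found - agent_flagged) / len(human_found)

def misses_by_type(agent_flagged: set[str], human_found: dict[str, str]) -> dict[str, int]:
    """Count missed issues per category (e.g. 'domain', 'security')."""
    counts: dict[str, int] = {}
    for issue, category in human_found.items():
        if issue not in agent_flagged:
            counts[category] = counts.get(category, 0) + 1
    return counts
```

Tracking misses by category is what tells you where human oversight is still mandatory, e.g. a low miss rate on security checks but a high one on domain logic.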

4. Clearer Boundaries on “Good Enough”

David’s question: “What does ‘good enough’ review look like?”

We need explicit standards:

  • Tier 1 code: Automated checks sufficient
  • Tier 2 code: Automated + spot-check (sample 20% of changes)
  • Tier 3 code: Full human review

Stop pretending we can deeply review everything. Be intentional about trade-offs.
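One way to make the “sample 20% of changes” rule concrete is to hash the PR identifier, so the decision is deterministic and un-gameable rather than left to reviewer mood. The rate and function name here are illustrative assumptions:

```python
# Deterministic spot-check sampling: the same PR id always gets the
# same decision, and roughly `rate` of all PRs are sampled.
import hashlib

def needs_spot_check(pr_id: str, rate: float = 0.20) -> bool:
    digest = hashlib.sha256(pr_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < rate

# Over many PRs the sampled share converges to the target rate.
sampled = sum(needs_spot_check(f"PR-{i}") for i in range(10_000))
```

Because the decision is a pure function of the PR id, nobody can re-roll it, and you can audit after the fact that the policy was actually applied.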

The Question We’re Avoiding

Is the problem that we’re generating too much code?

AI agents are thorough. They implement features plus comprehensive tests plus edge case handling plus documentation.

That’s good quality. But it’s also a lot of stuff to review.

What if we asked agents to be more minimal?

  • Implement the feature, not every possible variant
  • Core tests, not exhaustive coverage
  • Just enough documentation, not complete treatises

Less code → less review burden → faster iteration.

But that requires changing how we prompt agents. And trusting that “good enough” is actually good enough.

The Real Conversation

David’s right: This is about organizational change, not just tooling.

We need to:

  • Redefine engineering roles
  • Set realistic expectations about review capacity
  • Accept that not everything gets deep review
  • Invest in AI review tooling that actually works
  • Prioritize senior engineer happiness alongside productivity

Otherwise, we’ll see senior engineer attrition. And that’s expensive—both in hiring costs and lost institutional knowledge.


Question for engineering leaders: How are you handling senior engineer burnout from review overload? What’s your strategy for sustainable code review at scale?

This entire thread is exactly what we went through with financial compliance—and it nearly derailed our AI adoption. Let me share what actually worked for us.

We Hit the Review Wall Hard

Three months into AI-augmented development, our review queue looked like:

  • 23 open PRs
  • Average wait time for review: 4.2 days
  • Senior engineers working evenings/weekends to catch up
  • Junior engineers blocked, velocity dropping despite AI help

Something had to change. We couldn’t just “work harder” our way out.

What Worked: Extreme Automation + Clear Ownership

Michelle’s AI-assisted review is the right direction. But we went further: we automated 80% of review and redefined what humans review.

Our Review Stack

Layer 1 - Automated Security & Compliance (No Human Review)

  • SQL injection scanning (SonarQube)
  • PII detection (custom regex + ML model)
  • Encryption validation (all customer data must be encrypted)
  • Audit logging verification (SOX compliance)
  • Dependency vulnerability scanning (Snyk)

If Layer 1 fails → Auto-reject PR, agent must fix

No human time wasted on these. Agents iterate until automated checks pass.

Layer 2 - Automated Pattern Matching (No Human Review)

  • Code style consistency (ESLint + custom rules)
  • Architecture pattern compliance (“does this follow our service structure?”)
  • Performance regression checks (benchmarks vs baseline)
  • Test coverage thresholds (85%+ required)

If Layer 2 fails → Auto-reject or auto-fix

Layer 3 - AI Review Agent (No Human Review Unless Flagged)
Michelle described this. We use it for:

  • Logic consistency checks
  • Edge case validation
  • Integration testing adequacy
  • Documentation completeness

If issues found → Flags for human review with specific questions

Layer 4 - Human Review (ONLY Architectural & Business Logic)
Senior engineers review:

  • “Does this solve the right problem?”
  • “Is the architecture sound?”
  • “Are trade-offs appropriate?”
  • “Does this fit our long-term direction?”

They don’t review:

  • Code style (automated)
  • Security basics (automated)
  • Test coverage (automated)
  • Logic correctness (AI agent reviewed)

Result: Senior Review Time Dropped 68%

From: 7 hours/week per senior
To: 2.2 hours/week per senior

They’re reviewing maybe 1-2 PRs per day deeply, instead of 5-6 PRs superficially.

Keisha’s Burnout Point Is Real (We Lost 2 Seniors)

Last quarter, we lost two excellent senior engineers. Exit interviews made it clear:

“I didn’t sign up to be a code reviewer. I want to build things.”

That was a wake-up call. We were treating seniors as quality gates, not architects.

Our “Architecture First” Workflow (Keisha’s Proposal)

We now do exactly what Keisha suggested:

Monday: Architecture Planning

  • Product defines feature requirements
  • Senior engineer defines architecture + constraints
  • Documents: “Here’s the approach, here’s what NOT to do, here’s why”

Tuesday-Thursday: Implementation

  • Junior + agent implement within constraints
  • Automated checks validate compliance
  • AI review agent provides feedback
  • Junior iterates based on automation feedback

Friday: Validation Review

  • Senior engineer: “Does the implementation match my architectural intent?”
  • Not: “Is every line correct?”
  • Instead: “Did we solve the right problem the right way?”

This takes 1-2 hours vs 5-7 hours of line-by-line review.

The “Minimize Code Volume” Insight

Keisha asked: “Are we generating too much code?”

YES. We explicitly prompt agents now:

Old prompt:
“Implement user registration with email verification”

New prompt:
“Implement user registration with email verification. Minimize code. Only implement the core flow. Don’t add features we didn’t ask for. Don’t over-engineer.”

This reduced code volume by ~30% without sacrificing functionality. Just less “the agent thought this might be useful” code.

The “Good Enough” Standards David Asked About

We have explicit, documented standards:

Tier 1 (Auto-approve if automation passes):

  • Dependency updates
  • Test additions (no code changes)
  • Documentation
  • Refactoring within isolated modules

Tier 2 (Light human review - 15 min):

  • Bug fixes (single module)
  • Small features (< 200 lines)
  • UI changes (non-critical paths)

Human checks: “Makes sense? Fits our patterns? Ship it.”

Tier 3 (Full human review - 1-2 hours):

  • New customer-facing features
  • Database schema changes
  • Security-sensitive code
  • High-traffic endpoints
  • Compliance-critical paths

Human checks: Everything.

The key: We’re explicit about “good enough.” Not everything gets white-glove treatment.

We Created a “Quality Engineering” Team

Keisha’s “code review specialist” idea—we did this.

Three senior engineers who wanted review-focused work:

  • Deep security expertise
  • Love finding edge cases
  • Detail-oriented personalities
  • Enjoy building/tuning automation tools

They:

  1. Maintain our automated review tooling
  2. Handle Tier 3 reviews
  3. Train AI review agents on our patterns
  4. Define review standards

Other seniors focus on architecture and mentorship. Much happier.

This isn’t for everyone. But some engineers genuinely like this work. We hired for it.

The Financial Model That Justified This

CFO’s question: “Why are we spending on all this automation tooling?”

Before automation investment:

  • 8 senior engineers × 7 hours/week review = 56 engineer-hours/week
  • Cost: ~$4,200/week (fully loaded)
  • Plus: Opportunity cost of not doing architecture/mentoring work

After automation investment:

  • Tooling cost: ~$15K/month ($3,750/week)
  • 8 senior engineers × 2.2 hours/week review = 17.6 engineer-hours/week
  • Cost: ~$1,300/week
  • Saved: 38.4 hours/week of senior time

Net benefit:

  • Direct review-cost reduction: ~$2,900/week, roughly offsetting the ~$3,750/week tooling cost
  • Plus: Seniors have 38 hours/week back for higher-value work (architecture, mentoring, strategic projects)
  • Plus: Avoided 2 senior engineer backfills ($300K+ hiring cost)

ROI was obvious once we quantified it.

What We’re Still Figuring Out

Despite progress, challenges remain:

1. Agent quality varies
Some agents are 95% reliable (test generation). Others are 70% (complex business logic). We’re still learning which tasks to trust.

2. Domain knowledge gaps
Banking regulations change. Agents don’t keep up. We need human expertise for compliance decisions.

3. Career paths
If juniors mostly direct agents vs write code, how do they develop judgment? We’re experimenting with mentorship models.

4. Trust calibration
How much automation is “enough”? We’re conservative (financial services). Startups could probably automate more.

The Honest Take

After 9 months of AI-augmented development + 3 months of extreme automation:

We’re productive: 30% cycle time improvement (better than David’s 18%)

We’re sustainable: Senior engineers not burning out, manageable review load

We’re adapting: Roles changing (architects, quality engineers), hiring evolving

But it required:

  • Significant tooling investment ($150K+ in automation infrastructure)
  • Organizational change (new roles, new processes)
  • Cultural shift (accepting “good enough” review, trusting automation)

Not easy. But necessary to scale AI-augmented development.


The key insight: You can’t just add AI agents and expect everything else to stay the same. The entire development workflow needs to evolve. Review is part of that evolution.