AI-Assisted Code Shows 1.7× More Issues, Yet 93% of Developers Use These Tools. Are We in a Collective Quality Denial?

Three months ago, I watched our security team flag a pull request that looked perfect in review. Clean abstractions, elegant error handling, comprehensive tests. The engineer had used Claude Code for about 60% of it. The problem? It contained two subtle security vulnerabilities and a race condition that only surfaced under load.

This wasn’t an isolated incident. We’re seeing a pattern.

The Uncomfortable Numbers

Recent studies paint a picture that should alarm every technical leader:

  • 1.7× more issues in AI-assisted code compared to human-written code
  • 48-62% of AI-generated code contains at least one security vulnerability
  • 15-18% more security vulnerabilities in AI-generated code than in comparable human-written code
  • 4× increase in code duplication since widespread AI adoption
  • 7.2% decrease in delivery stability per the 2024 DORA report

Yet here’s the paradox: 93% of developers now use AI coding tools, up from 41% just last year. Some teams report 75% of their code involves AI assistance.

We’re not talking about a small experiment anymore. This is production infrastructure. Customer data. Financial transactions. Healthcare systems.

The Trust Gap Problem

The most revealing statistic: only 46% of developers fully trust AI-generated code. That means more than half the industry is shipping production systems with tools they don’t fully trust.

How did we get here? I think it’s a combination of:

  1. Pressure to ship faster - AI promises velocity, and product roadmaps don’t care about quality concerns
  2. Individual productivity theater - Developers feel faster (reporting 24% gains), even when studies show they’re actually 19% slower
  3. Diffused responsibility - “The AI wrote it” becomes a psychological shield
  4. Review process inadequacy - Our code review practices weren’t designed for 4× more duplicated code and subtle logic flaws

What I’m Seeing at Scale

At our company (120 engineers), we’ve tracked this closely over 6 months:

  • Code review time increased 40% - Reviewers spend more time hunting for subtle AI-generated bugs
  • Bug escape rate up 23% - More issues reaching production despite longer reviews
  • Technical debt accumulation - AI generates “almost right” code that works but doesn’t fit our architecture
  • Security scan alerts doubled - Our SAST tools flag more vulnerable patterns

The productivity gains we expected? They evaporated when we measured cycle time from commit to production-ready. We’re generating code faster but shipping features slower.

The Questions I Can’t Stop Asking

Are we in collective denial? The data says AI code is lower quality, yet adoption is nearly universal. This isn’t rational behavior—it’s momentum.

Who’s responsible for AI-generated bugs? When a vulnerability ships to production, is it the developer who accepted the suggestion? The reviewer who approved it? The organization that mandated AI tools for velocity?

What’s the long-term cost? We’re accumulating technical debt at 4× the rate due to code duplication alone. In 3 years, what does this codebase look like?

Can we afford to opt out? If competitors are shipping faster (even if less reliably), do we have the luxury of moving deliberately?

What We’re Trying

We haven’t solved this, but here’s our current approach:

  1. AI-Aware Review Checklist - Specific items for AI-generated code patterns
  2. Security-First Prompting - Training engineers to prompt for security from the start
  3. Mandatory Human Design Docs - No jumping straight to AI implementation
  4. Quality Gates With Teeth - Block deploys when code duplication thresholds are exceeded
  5. Honest Metrics - Measuring actual cycle time, not just code generation speed
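For item 4, the duplication gate doesn’t need to start sophisticated. Here’s a minimal sketch of the idea — the window size and threshold are illustrative, not our actual settings: hash sliding windows of lines across files and block the deploy when the duplicated-line ratio crosses a limit.

```python
import hashlib
from typing import Dict

WINDOW = 6             # lines per comparison window (illustrative)
MAX_DUP_RATIO = 0.05   # block deploys above 5% duplication (illustrative)

def duplication_ratio(sources: Dict[str, str], window: int = WINDOW) -> float:
    """Fraction of line-windows that appear more than once across files."""
    seen = {}
    total = dupes = 0
    for name, text in sources.items():
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        for i in range(len(lines) - window + 1):
            digest = hashlib.sha1("\n".join(lines[i:i + window]).encode()).hexdigest()
            total += 1
            if digest in seen:
                dupes += 1
            else:
                seen[digest] = (name, i)
    return dupes / total if total else 0.0

def gate(sources: Dict[str, str]) -> bool:
    """Return True if the change is allowed to deploy."""
    return duplication_ratio(sources) <= MAX_DUP_RATIO
```

In CI, this would run over the changed files, and a failing `gate` would block the deploy. A real setup would use a dedicated clone detector, but even this crude version makes the threshold explicit instead of implicit.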

But I’ll be honest: it feels like we’re bailing water from a boat while everyone else is drilling more holes because it “improves water flow.”

The Uncomfortable Truth

Maybe the real question isn’t “Are we in denial?” but “What quality standard are we willing to accept in exchange for speed?”

Because right now, we’re making that choice implicitly through adoption, rather than explicitly through strategy.

I’d love to hear from other technical leaders:

  • What are you seeing in your organizations?
  • Have you found ways to get AI productivity without the quality hit?
  • How are you measuring true impact beyond developer perception?
  • At what point does the security risk outweigh the velocity benefit?

This isn’t about being anti-AI. It’s about being honest about the tradeoffs we’re making at scale.

What am I missing here?

Michelle, this hits close to home. I’m managing 40+ engineers across distributed teams, and what you’re describing mirrors our experience almost exactly.

The review burden is real and it’s changing team dynamics in ways I didn’t anticipate.

The Silent Shift in Code Review Culture

Six months ago, code reviews averaged 45 minutes per PR. Today? 72 minutes. But here’s what’s really concerning: the nature of the feedback has fundamentally changed.

Pre-AI reviews focused on:

  • Architecture alignment
  • API design decisions
  • Performance considerations
  • Domain logic correctness

Now our reviews are dominated by:

  • “Is this AI-generated duplication?”
  • “Did you validate this error handling logic?”
  • “Have you tested these edge cases the AI might have missed?”
  • “Does this actually match our design patterns?”

We’re spending more time debugging AI code than discussing system design.

The Junior Engineer Problem You Mentioned

Your point about diffused responsibility is hitting our junior engineers hardest. I’ve noticed a pattern:

Scenario: Junior engineer uses Claude Code to implement a feature. Code looks clean, passes tests, gets approved. Subtle bug surfaces in production.

Response: “The AI suggested this approach, and it passed review, so I thought it was correct.”

There’s a learned helplessness developing. Instead of building judgment about why code is correct, they’re learning to rely on “it compiled and the AI seemed confident.”

This is a 3-year problem, not a 3-month problem. We’re not training engineers—we’re training AI prompt writers who can’t reason about correctness independently.

What We’re Trying (With Mixed Results)

1. Mandatory “AI Disclosure” in PR descriptions

  • Engineers must note which parts used AI assistance
  • Reviewers know to apply extra scrutiny
  • Result: Helps, but adds process overhead

2. Pair programming for critical paths

  • Force human reasoning before AI generation
  • One person designs, other implements with AI
  • Result: Slower but higher quality

3. “Explain Your Code” Sessions

  • Junior engineers must walk through AI-generated code line-by-line
  • If they can’t explain why it’s correct, back to design
  • Result: Reveals gaps in understanding immediately

4. AI-Free Fridays (Failed Experiment)

  • One day per week, no AI tools allowed
  • Goal: Maintain fundamental skills
  • Result: Teams gamed it by front-loading AI work to Thursday

The Financial Services Constraint

Michelle, you mentioned healthcare systems in your post. In financial services, we face similar regulatory constraints that AI tools don’t understand:

  • PCI compliance - AI generates convenient but non-compliant logging
  • Audit trails - AI skips the tedious compliance hooks we need
  • Data residency - AI doesn’t consider which data can cross borders
  • Regulatory reporting - AI optimizes for performance, we need deterministic correctness

Our security team now reviews 100% of AI-assisted code that touches financial data. That’s… most of our codebase. The “productivity gains” evaporate under compliance overhead.

Your Question: “Can We Afford to Opt Out?”

This keeps me up at night. Here’s the competitive landscape I see:

  • Startups with no legacy constraints - Moving incredibly fast with AI, shipping features weekly
  • Our financial services competitors - Moving cautiously, but we’re ALL moving slowly
  • Tech giants - Can absorb the quality hit because they have massive QA teams

We can’t opt out entirely. But I think the question is: What’s our differentiation?

If we compete on feature velocity against AI-first startups, we lose. They’ll always ship faster.

But if we compete on reliability, security, and compliance, maybe being more deliberate is the advantage?

Our enterprise customers chose us because we don’t break things. If AI acceleration means more production incidents, we’re undermining our core value prop.

The Team Management Challenge

Here’s what worries me most: I’m seeing a confidence crisis in senior engineers.

They know AI code needs more scrutiny. They know review is taking longer. They know quality is slipping. But when they raise concerns, leadership points to “industry adoption rates” and asks “why aren’t we moving faster?”

The best engineers on my team are burning out from the cognitive dissonance. They’re expected to move fast (AI mandate from above) AND maintain quality (their professional standards). These are increasingly incompatible.

Three of my tech leads have privately asked: “Are we falling behind by being too careful? Or is everyone else being reckless?”

I don’t have a good answer.

What I’d Love to See: Industry-Wide Honesty

Michelle, your post is valuable because you’re willing to share real numbers. We need more of this.

Right now, the AI coding tool vendors show only success stories. Analyst reports cite “productivity gains” based on self-reported surveys. Conference talks celebrate velocity.

But the bug escape rates? The security vulnerabilities? The technical debt accumulation? Those conversations happen in private Slack channels and closed-door leadership meetings.

I’d love to see:

  • Anonymous industry benchmarking on AI code quality impact
  • Shared review checklist templates for AI-generated code
  • Case studies on failures, not just successes
  • Honest ROI analysis that includes quality cost

Your 5-point approach is a great start. Mind if I steal “Mandatory Human Design Docs”? That might be the intervention we need.

The Question I Keep Asking My Leadership

If AI makes us write code faster but ship features slower, what exactly are we optimizing for?

Because right now, it feels like we’re optimizing for the appearance of velocity (more commits, more lines of code, more PRs) rather than actual business impact (features shipped, bugs prevented, customers served).

Michelle, appreciate you starting this conversation. Would love to hear from others on how they’re handling the cultural and process shifts beyond just the technical challenges.

Coming from the design systems side, this discussion is validating something I’ve been seeing for months but couldn’t quite articulate.

AI code doesn’t just have more bugs—it has a different kind of problem that’s harder to spot in review: it works, but it doesn’t belong.

The “Almost Right” Problem from a Design Perspective

I lead design systems for 3 product teams. We have design tokens, component libraries, accessibility standards, animation patterns—years of carefully crafted consistency.

Then AI-generated code shows up and it’s like… a different language.

Example from last week:

  • Engineer uses Claude Code to build a modal dialog
  • Code is clean, tests pass, no errors
  • But: Uses hardcoded colors instead of design tokens
  • Uses setTimeout for animations instead of our transition utilities
  • Implements focus trapping incorrectly (fails accessibility audit)
  • Creates a new button variant instead of using existing ones

The AI didn’t know our design system exists. It generated a solution, not our solution.

This is invisible in code review unless the reviewer deeply knows both our patterns AND has time to check. With Luis’s 72-minute reviews, that depth is impossible.

We’re Measuring the Wrong Thing

Michelle, your “generating code faster but shipping features slower” hits hard.

In design, we’d call this “output vs outcome.”

Output: Lines of code written, PRs merged, tickets closed
Outcome: Consistent UX, accessible features, maintainable system

AI is great at output. It’s terrible at outcome.

Our QA designer now spends 60% of her time fixing AI-generated UI inconsistencies that passed eng review:

  • Wrong spacing values (AI uses px, we use rem)
  • Inaccessible color contrasts (AI optimizes for aesthetics, not WCAG)
  • Broken responsive patterns (AI doesn’t test mobile)
  • Duplicate CSS (AI doesn’t know what already exists)

We built a design system to reduce this variance. AI is reintroducing it at scale.

The “Technical Debt” You Can’t See in Metrics

Michelle cited a 4× increase in code duplication. From a design systems perspective, I’m seeing duplication of intent:

  • 5 different implementations of “error state” in form fields
  • 3 variations of the same loading spinner
  • Modal dialogs with slightly different padding and shadows
  • Inconsistent focus indicator styles

Each one works individually. But multiply by 100 features and our design system is meaningless.

The cost? Every design change now requires finding and updating all the AI-generated variations. Our “design token update” that should take 1 day now takes 2 weeks of archeology.

The Craft Question Michelle Raised

“What quality standard are we willing to accept in exchange for speed?”

As someone who came up as a designer learning craft and attention to detail, this one stings.

I watch junior engineers generate entire features with AI and never develop taste for good code. They can’t tell the difference between “works” and “works well.”

It’s like teaching someone to paint by having an AI generator make images, then they paint-by-numbers on top. They never learn composition, color theory, or why certain choices matter.

When my startup failed, one lesson was: Good enough becomes permanent. That “quick AI-generated feature” that works but doesn’t fit? Two years later it’s still there, blocking refactoring.

What I’m Trying (Adapted from Michelle’s List)

1. AI-Aware Design QA Checklist

  • Specific items for AI-generated UI patterns
  • Design system compliance as merge blocker
  • Works but has 50% review rejection rate (too slow)

2. “Design System Prompting” Training

  • Teaching engineers to include design system context in AI prompts
  • Mixed results—AI still hallucinates patterns

3. Automated Design Token Linter

  • Blocks PRs that use hardcoded values
  • This actually works! But only catches syntax, not semantic misuse

4. Design System “Why” Documentation

  • Writing context AI can’t infer: “Use Button variant=‘secondary’ for destructive actions because it reduces anxiety”
  • Helps humans, AI still ignores it
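For what it’s worth, our linter (item 3) started as little more than pattern matching. A simplified sketch — the patterns and the `var(--…)` token convention here are illustrative, not our real rules:

```python
import re
from typing import List, Tuple

# Values that should come from design tokens (assumed conventions)
HARDCODED = [
    (re.compile(r"#[0-9a-fA-F]{3,8}\b"), "hardcoded color; use a color token"),
    (re.compile(r"\b\d+px\b"), "hardcoded px value; use a spacing token (rem)"),
]
TOKEN = re.compile(r"var\(--[\w-]+\)")  # values referencing tokens are fine

def lint_css(css: str) -> List[Tuple[int, str]]:
    """Return (line_number, message) for every hardcoded value found."""
    problems = []
    for n, line in enumerate(css.splitlines(), start=1):
        if TOKEN.search(line):
            continue  # crude: skip lines that already reference a token
        for pattern, message in HARDCODED:
            if pattern.search(line):
                problems.append((n, message))
    return problems
```

In CI this runs on changed files and fails the PR when the list is non-empty. As noted above, it catches syntax, not semantic misuse — it can’t tell you the wrong token was used correctly.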

The Question I Keep Asking

If AI makes it easier to write code that works, but harder to maintain a coherent system, are we just deferring pain to our future selves?

Because in 3 years, when we need to redesign:

  • How do we find all the AI-generated variants?
  • How do we refactor code that works but doesn’t fit patterns?
  • How do we train new engineers when the codebase doesn’t teach consistent practices?

Michelle, your post made me realize this isn’t just a technical or security problem. It’s a systems thinking problem.

AI optimizes locally (“make this feature work”) without understanding globally (“fit into our architecture”). That’s a fundamental mismatch with how design systems—and probably all long-term software systems—are supposed to work.

Would love to hear if anyone’s solved the “AI doesn’t understand our context” problem. Because right now, every productivity gain is a maintenance debt we’re hiding from ourselves.

This conversation needs to happen at every eng leadership table right now. Michelle, Luis, Maya—you’re all describing the same elephant from different sides, and it’s terrifying how consistent the picture is.

I’m scaling our engineering org from 25 to 80+ engineers while this AI transition is happening. We’re trying to build culture, establish standards, and maintain quality while the ground shifts beneath us.

The Organizational Metrics Blind Spot

Here’s what keeps me up: We’re measuring outputs that look good but hide outcomes that are deteriorating.

Our executive dashboard shows:

  • ✅ Sprint velocity up 18%
  • ✅ Story points completed increased 22%
  • ✅ Time-to-first-commit decreased 35%
  • ✅ Pull requests per engineer up 28%

Leadership is celebrating. “AI is working!”

But when I dig into the metrics we don’t showcase:

  • ❌ Cycle time (commit to production) up 31%
  • ❌ Bug escape rate to production up 27%
  • ❌ Rollback frequency increased 3×
  • ❌ Time spent on bug fixes vs new features: was 30/70, now 52/48
  • ❌ Developer satisfaction scores down 19 points
  • ❌ Code review rejection rate up from 12% to 34%

We’re generating more code, but delivering less value. And the human cost is showing up in retention conversations.

The People Problem No One’s Talking About

Luis, your point about the confidence crisis in senior engineers resonates deeply.

In my 1-on-1s over the last 3 months:

Junior engineers:

  • Love AI tools, feel “productive”
  • Struggle to debug issues in their own AI-generated code
  • Can’t explain architectural decisions
  • Anxious when internet is down (can’t access AI)

Mid-level engineers:

  • Conflicted—use AI but feel guilty
  • Worry they’re not learning fundamentals
  • Spending more time fixing others’ AI code than writing their own

Senior engineers:

  • Frustrated by review burden
  • Question whether they should use AI more to “keep up”
  • Concerned about what they’re teaching juniors
  • Several have privately asked about non-AI-heavy roles elsewhere

The pattern: Junior engineers are confident and wrong. Senior engineers are exhausted and questioning themselves. That’s backwards.

Michelle’s Question on Responsibility

“Who’s responsible for AI-generated bugs?”

This is going to end up in court someday, and we’re not ready for it.

I had this exact conversation with our legal team last quarter. When an AI-generated security vulnerability leads to a data breach:

  • Is it the developer’s fault for accepting AI suggestions?
  • The reviewer’s fault for approving it?
  • The engineering manager’s fault for not catching it?
  • The company’s fault for mandating AI tool usage?
  • The AI vendor’s fault for generating insecure code?

Our legal counsel’s answer: “Probably all of the above, and we have no case law to guide us.”

So we implemented:

  1. Mandatory “AI-Assisted” tagging in commit messages
  2. Audit logs of which AI suggestions were accepted vs rejected
  3. Incident reports tracking AI-assistance levels in buggy code
  4. Insurance review of our E&O policy coverage for AI tools

This isn’t paranoia. This is recognizing that we’re taking on liability we can’t quantify yet.
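For item 1, we moved the tagging from convention to enforcement with a `commit-msg` hook. A minimal sketch — the `AI-Assisted:` trailer name and its allowed values are our own hypothetical convention, not any standard:

```python
import re
import sys

# Hypothetical trailer convention: every commit declares its AI level.
TRAILER = re.compile(r"^AI-Assisted:\s*(none|partial|mostly)\s*$", re.MULTILINE)

def check_message(message: str) -> bool:
    """True if the commit message declares its AI-assistance level."""
    return bool(TRAILER.search(message))

def main(path: str) -> int:
    """Hook entry point; exit status 0 allows the commit.

    Installed as .git/hooks/commit-msg, invoked by git with the message
    file path; the script would end with: sys.exit(main(sys.argv[1]))
    """
    with open(path, encoding="utf-8") as f:
        if check_message(f.read()):
            return 0
    sys.stderr.write(
        "commit rejected: add a trailer 'AI-Assisted: none|partial|mostly'\n"
    )
    return 1
```

The trailer then feeds the audit logs and incident reports in items 2 and 3: grep the history for `AI-Assisted: mostly` and correlate with where the bugs landed.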

The Measurement Problem I’m Struggling With

Maya’s “output vs outcome” distinction is brilliant. Here’s how I’m trying to operationalize it:

Traditional Metrics (Output-Focused):

  • Lines of code written
  • PRs merged
  • Story points completed
  • Time to first commit

Outcome-Focused Metrics We’re Adding:

  • Customer-facing value deployed per week
  • Mean time to customer value (design to production)
  • Code that survives 6 months without modification
  • Bug density in production (not just bugs found)
  • Architecture deviation rate (Maya’s design system problem)
  • Engineer confidence in production deploys (survey)

The outcome metrics are harder to measure, which is probably why everyone defaults to output metrics. But output metrics are now actively misleading in the AI era.

What We’re Trying: The “AI Impact Assessment”

Before rolling out any AI coding tool widely, we now run a 4-week pilot with instrumented tracking:

Week 1-2: Baseline

  • Measure cycle time, bug rates, review time, deployment frequency
  • Survey developer sentiment
  • Establish quality baseline

Week 3-4: AI Tool Pilot

  • 10 engineers use the tool, 10 don’t (control group)
  • Measure same metrics
  • Track what breaks

Week 5: Analysis & Decision

  • Compare outcomes not just outputs
  • Factor in downstream costs (review, debugging, refactoring)
  • Decide: adopt, adopt with guardrails, or reject

So far, we’ve pilot-tested 4 AI tools. Adopted 2 with significant guardrails. Rejected 2 entirely.

Leadership pushback: “But everyone else is using these!”
My response: “Everyone else is also seeing 1.7× more issues and pretending it’s fine.”

Luis’s Junior Engineer Problem Scaled Up

“We’re not training engineers—we’re training AI prompt writers who can’t reason about correctness independently.”

This haunts me. We’re hiring fast. 30 new engineers in 6 months. Most are coming up in the AI-first era.

What does an engineering career look like when:

  • You never learned to debug without AI assistance?
  • You never built intuition for performance without AI optimization?
  • You never struggled through algorithm design without AI scaffolding?
  • You never developed taste for clean architecture without AI templates?

In 5 years, do we have a generation of engineers who can’t function when the AI goes down? Who can’t reason from first principles? Who can’t mentor the next generation because they never developed mastery themselves?

This isn’t hypothetical. I’ve had a staff engineer interview candidates who used AI to pass the technical screen, then couldn’t explain their own solution in the on-site. These engineers are being hired. They’re on teams. They’re reviewing code.

The Question I’m Asking Executive Leadership

“If AI coding tools provide short-term velocity at the cost of long-term engineering capability, what’s the ROI at 3 years? At 5 years?”

Because right now, we’re making quarter-by-quarter decisions (“ship faster!”) that have multi-year consequences (degraded codebase, undertrained engineers, accumulated debt).

Michelle, your 5-point approach is a great start. Here’s what I’d add:

  1. Separate “AI-Assisted” vs “AI-Generated” - Code where AI wrote <30% vs >70%—different review standards
  2. Quarterly AI Impact Review - Honest assessment of quality trends with executive leadership
  3. Career Development for AI Era - Explicit training on fundamentals that AI doesn’t teach
  4. AI Opt-Out Paths - Let engineers choose AI-free work for skill development
  5. Customer Impact Metrics - Tie AI adoption to actual customer outcomes, not engineering metrics

The Uncomfortable Truth I’m Facing

The hardest part is this: I don’t think we can opt out entirely, but I also don’t think full adoption is sustainable.

The answer is probably some nuanced middle ground:

  • AI for boilerplate and repetitive tasks
  • Human design for architecture and critical paths
  • Hybrid for everything else
  • Constant measurement and course correction

But “nuanced middle ground” doesn’t scale when you’re hiring fast and everyone has different tool preferences and leadership wants simple answers.

So we’re navigating by feel in the dark, hoping the ground doesn’t drop out before we figure out where the edges are.

Michelle, thank you for starting this with data. Luis, Maya—your perspectives are crucial. We need more of these honest conversations before the industry collectively walks off a cliff celebrating our velocity the whole way down.

This thread is the most important conversation in product engineering right now. Michelle, Luis, Maya, Keisha—each of you is describing a different facet of the same crisis, and from the product side, I’m watching it derail our entire roadmap planning process.

The Product Velocity Paradox

Here’s what keeps me awake: Engineering is moving faster than ever, but product delivery has never felt slower.

Last quarter:

  • Engineers closed 142 tickets (up 31% YoY)
  • Shipped 23 features (down from 29 last quarter, 34 the quarter before)
  • Customer-requested features in backlog: 87 (growing)
  • Average feature stability time (time until no more fixes needed): 6.2 weeks (was 2.1 weeks)

Translation: We’re generating more code, but shipping fewer stable features.

Keisha’s metrics breakdown is spot-on. Our exec dashboard shows green. Our product reality is red.

The “Faster to Write, Slower to Debug” Tax

Maya’s point about “almost right” code is crushing us from a product perspective.

Recent example:

  • Feature: Add batch export functionality to user dashboard
  • Original estimate: 3 days
  • Day 1: Engineer uses Claude Code, completes implementation in 4 hours
  • Day 2-3: QA finds 12 edge cases that don’t work (timezone handling, large datasets, special characters)
  • Day 4-7: Engineer debugging AI-generated export logic
  • Day 8-10: Re-implementation of core export flow
  • Day 11-14: Additional QA cycle and fixes

Result: 14 days instead of 3. But our sprint planning only saw “4 hours to implement” and leadership celebrated the velocity.

The disconnect between “code written” and “feature shipped” is destroying our ability to forecast.

Customer Impact: The Metric That Actually Matters

Luis mentioned that enterprise customers chose his company because they “don’t break things.” This is everything in B2B product.

What we’re seeing:

  • Customer support tickets up 43% (more bugs reaching production)
  • Average ticket resolution time up 2.3× (bugs are harder to diagnose)
  • Customer confidence scores down (NPS dropped 12 points)
  • Enterprise deals delayed due to security concerns (2 deals lost last month)

Our biggest enterprise customer asked point-blank last week: “Are you guys using AI code generation? Because we’re seeing more bugs and we have compliance requirements.”

I didn’t know how to answer. “Yes, but we have guardrails” didn’t inspire confidence. Neither did “everyone in the industry is doing it.”

We lost that renewal. The stated reason: “Declining product quality and slower response to issues.”

The Roadmap Planning Crisis

Michelle’s question—“What quality standard are we willing to accept?”—is now a product strategy question, not just a technical one.

Old roadmap planning:

  1. Identify customer needs
  2. Size engineering effort
  3. Sequence features by value
  4. Commit to quarterly goals
  5. Track against plan

New roadmap planning:

  1. Identify customer needs
  2. Get initial engineering estimate
  3. Add “AI uncertainty buffer” (2-3× multiplier for debugging and rework)
  4. Add “quality stabilization period” (4-6 weeks post-launch)
  5. Sequence features conservatively
  6. Commit to half as many quarterly goals
  7. Explain to leadership why we’re “moving slower” despite “engineer productivity gains”

This is unsustainable. We’re planning around the expectation that features won’t work correctly the first time.

The Trust Problem from a Product Lens

Keisha described the junior/mid/senior engineer dynamic perfectly. From product, I’m seeing a parallel pattern:

Customer trust trajectory:

  • Early adopters: Expect bugs, tolerate quality issues, value innovation
  • Mainstream customers: Expect reliability, churn on quality problems, value stability
  • Enterprise customers: Require security, audit processes, value trust

Where AI code quality hits hardest: Enterprise customers—our highest-value segment, the segment we’re trying to grow into.

The features we’re shipping faster? Often not the features enterprises need. They need compliance, security, audit trails, edge case handling—precisely the things AI is worst at.

The “Feature vs Bug” Allocation Shift

Keisha’s metric hit me hard: “Time spent on bug fixes vs new features: was 30/70, now 52/48”

From a product perspective, this means:

  • Customer-facing reality: Half our engineering capacity is now firefighting
  • Roadmap reality: We can only commit to half the features we used to
  • Competitive reality: Competitors who haven’t adopted AI as aggressively are shipping more stable features

But here’s the insidious part: The bugs are in AI-generated code from previous sprints. We’re not shipping faster—we’re accumulating maintenance debt that steals from future capacity.

It’s compound interest in reverse: every AI-accelerated feature creates future drag.

What I’m Asking Engineering Leadership

Michelle, your 5-point approach is helpful. Keisha’s additions are crucial. Here’s what I need as a product partner:

Honest Estimation:

  • Don’t give me “time to write code”—give me “time to stable feature”
  • Include the debugging and rework time
  • Let me plan accurately even if the numbers look worse

Quality Tiers for Features:

  • Which features should NOT use AI? (Security, compliance, core workflows)
  • Which features are safe for AI acceleration? (Internal tools, low-risk additions)
  • Let me make informed tradeoffs

Customer-Facing Transparency:

  • Can we commit to stability timelines?
  • What’s our rollback/fix SLA when AI code fails in production?
  • How do we communicate quality standards to enterprise buyers?

The Question I’m Bringing to Executive Leadership

“If AI coding tools allow us to write features 50% faster, but those features are only 70% correct and require 3× more maintenance, what’s the actual impact on customer value delivery?”
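To make that concrete, here’s the back-of-envelope version I bring to the conversation — all numbers are the illustrative ones from the question, not measurements, and the model is deliberately simplistic (incorrect work gets redone once):

```python
def days_to_stable(write_days: float, correctness: float,
                   maint_multiplier: float,
                   baseline_maint_days: float = 1.0) -> float:
    """Rough time from 'code written' to 'stable feature' (toy model)."""
    rework = write_days * (1.0 - correctness)        # redo the incorrect share
    maintenance = baseline_maint_days * maint_multiplier
    return write_days + rework + maintenance

# Baseline: 3 days to write, effectively correct, 1 day of maintenance
baseline = days_to_stable(write_days=3.0, correctness=1.0, maint_multiplier=1.0)

# AI-assisted: written in 2 days, 70% correct, 3x maintenance
ai = days_to_stable(write_days=2.0, correctness=0.7, maint_multiplier=3.0)
```

Under these toy assumptions, time-to-stable goes from 4.0 days to about 5.6: the code was written faster, and the feature shipped slower.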

Because from where I sit:

  • Customers don’t care how fast we wrote the code
  • Customers care whether the feature works reliably
  • Customers care whether we fix their problems quickly
  • Customers care whether we’re moving their business forward

And on all of those dimensions, we’re moving backwards despite engineering “productivity” going up.

The Uncomfortable Product Truth

Maya’s warning haunts me: “Good enough becomes permanent.” And AI-generated “good enough” tends to land around 95% good.

In product, 95% good is 100% unusable for:

  • Security features (1 vulnerability = breach)
  • Payment processing (1 error = lost money + trust)
  • Compliance reporting (1 inaccuracy = audit failure)
  • Data export (1 edge case = customer escalation)

But 95% good is fine for:

  • UI tweaks
  • Internal tools
  • Non-critical features
  • Experimental capabilities

The problem is: We’re treating all features the same. AI-assisted everything, uniform review process, same quality bar.

We need tiered quality standards. Some features need 99.99% reliability. Others can ship at 95%. But right now, everything is regressing toward 95%, and our most important features are suffering.

What I’m Proposing

1. Feature Classification System

  • Critical (no AI, double review, extensive testing)
  • Core (AI-assisted, enhanced review, standard testing)
  • Experimental (AI-generated, normal review, exploratory testing)

2. Customer-Facing Quality Metrics

  • Stop reporting “velocity” to customers
  • Start reporting “stability” and “time-to-resolution”
  • Tie engineering AI adoption to customer satisfaction trends

3. Honest Roadmap Buffers

  • Plan for 2× debugging time on AI-assisted features
  • Plan for 4-6 week stabilization periods
  • Commit to fewer features, higher quality

4. AI Tool Selection by Use Case

  • Keisha mentioned rejecting 2 of 4 piloted tools—we need that discipline
  • Not all AI tools are appropriate for all features
  • Context matters
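The classification in item 1 only works if it’s policy, not preference — which means encoding it as a merge check. A hypothetical sketch (tier names and requirements are illustrative, not a recommendation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReviewPolicy:
    ai_generation_allowed: bool
    required_approvals: int
    security_review: bool

# Illustrative tiers; real requirements would come from a risk assessment
POLICIES = {
    "critical":     ReviewPolicy(False, required_approvals=2, security_review=True),
    "core":         ReviewPolicy(True,  required_approvals=2, security_review=False),
    "experimental": ReviewPolicy(True,  required_approvals=1, security_review=False),
}

def merge_allowed(tier: str, used_ai: bool,
                  approvals: int, security_signoff: bool) -> bool:
    """Decide whether a PR meets its feature tier's bar."""
    policy = POLICIES[tier]
    if used_ai and not policy.ai_generation_allowed:
        return False
    if approvals < policy.required_approvals:
        return False
    if policy.security_review and not security_signoff:
        return False
    return True
```

The point isn’t the specific thresholds — it’s that the tier is declared on the feature and the tooling enforces it, so the quality bar stops being an implicit per-engineer choice.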

The Bottom Line

Michelle asked if we’re in collective quality denial. From product, the answer is: Yes, and it’s about to cost us customers.

The data is clear:

  • Higher bug rates
  • Longer cycle times
  • Lower customer satisfaction
  • Lost enterprise deals

But because “everyone is doing it” and “productivity is up,” we’re pretending the emperor is clothed.

Keisha’s warning about “walking off a cliff celebrating velocity” is exactly right. The cliff is customer churn. The velocity is a vanity metric that hides the fall.

I appreciate everyone’s honesty in this thread. These are the conversations we need to be having with executive leadership, with boards, with investors. Because the current trajectory isn’t sustainable.

The companies that figure out how to use AI tools deliberately, with quality guardrails and honest metrics, will win. The companies that chase velocity at all costs will churn their customers and wonder why the productivity gains didn’t translate to business success.

Which are we going to be?