AI-Assisted Code Shows 1.7× More Issues, Yet 93% of Developers Use These Tools. Are We in a Collective Quality Denial?

Three months ago, I watched our security team flag a pull request that looked perfect in review. Clean abstractions, elegant error handling, comprehensive tests. The engineer had used Claude Code for about 60% of it. The problem? It contained two subtle security vulnerabilities and a race condition that only surfaced under load.

This wasn’t an isolated incident. We’re seeing a pattern.

The Uncomfortable Numbers

Recent studies paint a picture that should alarm every technical leader:

  • 1.7× more issues in AI-assisted code compared to human-written code
  • 48-62% of AI-generated code contains at least one security vulnerability
  • 15-18% more security vulnerabilities in AI-generated code than in comparable human-written code
  • 4× increase in code duplication since widespread AI adoption
  • 7.2% decrease in delivery stability per the 2024 DORA report

Yet here’s the paradox: 93% of developers now use AI coding tools, up from 41% just last year. Some teams report 75% of their code involves AI assistance.

We’re not talking about a small experiment anymore. This is production infrastructure. Customer data. Financial transactions. Healthcare systems.

The Trust Gap Problem

The most revealing statistic: only 46% of developers fully trust AI-generated code. That means more than half the industry is shipping production systems with tools they don’t fully trust.

How did we get here? I think it’s a combination of:

  1. Pressure to ship faster - AI promises velocity, and product roadmaps don’t care about quality concerns
  2. Individual productivity theater - Developers feel faster (reporting 24% gains), even when studies show they’re actually 19% slower
  3. Diffused responsibility - “The AI wrote it” becomes a psychological shield
  4. Review process inadequacy - Our code review practices weren’t designed for 4× more duplicated code and subtle logic flaws

What I’m Seeing at Scale

At our company (120 engineers), we’ve tracked this closely over 6 months:

  • Code review time increased 40% - Reviewers spend more time hunting for subtle AI-generated bugs
  • Bug escape rate up 23% - More issues reaching production despite longer reviews
  • Technical debt accumulation - AI generates “almost right” code that works but doesn’t fit our architecture
  • Security scan alerts doubled - Our SAST tools flag more vulnerable patterns

The productivity gains we expected? They evaporated when we measured cycle time from commit to production-ready. We’re generating code faster but shipping features slower.

The Questions I Can’t Stop Asking

Are we in collective denial? The data says AI code is lower quality, yet adoption is nearly universal. This isn’t rational behavior—it’s momentum.

Who’s responsible for AI-generated bugs? When a vulnerability ships to production, is it the developer who accepted the suggestion? The reviewer who approved it? The organization that mandated AI tools for velocity?

What’s the long-term cost? We’re accumulating technical debt at 4× the rate due to code duplication alone. In 3 years, what does this codebase look like?

Can we afford to opt out? If competitors are shipping faster (even if less reliably), do we have the luxury of moving deliberately?

What We’re Trying

We haven’t solved this, but here’s our current approach:

  1. AI-Aware Review Checklist - Specific items for AI-generated code patterns
  2. Security-First Prompting - Training engineers to prompt for security from the start
  3. Mandatory Human Design Docs - No jumping straight to AI implementation
  4. Quality Gates With Teeth - Block deploys when code duplication thresholds are exceeded
  5. Honest Metrics - Measuring actual cycle time, not just code generation speed
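For item 4, the duplication gate doesn’t need to start sophisticated. Here’s a minimal sketch of the idea — the window size and threshold are illustrative, not our actual settings: hash sliding windows of lines across files and block the deploy when the duplicated-line ratio crosses a limit.

```python
import hashlib
from typing import Dict

WINDOW = 6             # lines per comparison window (illustrative)
MAX_DUP_RATIO = 0.05   # block deploys above 5% duplication (illustrative)

def duplication_ratio(sources: Dict[str, str], window: int = WINDOW) -> float:
    """Fraction of line-windows that appear more than once across files."""
    seen = {}
    total = dupes = 0
    for name, text in sources.items():
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        for i in range(len(lines) - window + 1):
            digest = hashlib.sha1("\n".join(lines[i:i + window]).encode()).hexdigest()
            total += 1
            if digest in seen:
                dupes += 1
            else:
                seen[digest] = (name, i)
    return dupes / total if total else 0.0

def gate(sources: Dict[str, str]) -> bool:
    """Return True if the change is allowed to deploy."""
    return duplication_ratio(sources) <= MAX_DUP_RATIO
```

In CI, this would run over the changed files, and a failing `gate` would block the deploy. A real setup would use a dedicated clone detector, but even this crude version makes the threshold explicit instead of implicit.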

But I’ll be honest: it feels like we’re bailing water from a boat while everyone else is drilling more holes because it “improves water flow.”

The Uncomfortable Truth

Maybe the real question isn’t “Are we in denial?” but “What quality standard are we willing to accept in exchange for speed?”

Because right now, we’re making that choice implicitly through adoption, rather than explicitly through strategy.

I’d love to hear from other technical leaders:

  • What are you seeing in your organizations?
  • Have you found ways to get AI productivity without the quality hit?
  • How are you measuring true impact beyond developer perception?
  • At what point does the security risk outweigh the velocity benefit?

This isn’t about being anti-AI. It’s about being honest about the tradeoffs we’re making at scale.

What am I missing here?

Michelle, this hits close to home. I’m managing 40+ engineers across distributed teams, and what you’re describing mirrors our experience almost exactly.

The review burden is real and it’s changing team dynamics in ways I didn’t anticipate.

The Silent Shift in Code Review Culture

Six months ago, code reviews averaged 45 minutes per PR. Today? 72 minutes. But here’s what’s really concerning: the nature of the feedback has fundamentally changed.

Pre-AI reviews focused on:

  • Architecture alignment
  • API design decisions
  • Performance considerations
  • Domain logic correctness

Now our reviews are dominated by:

  • “Is this AI-generated duplication?”
  • “Did you validate this error handling logic?”
  • “Have you tested these edge cases the AI might have missed?”
  • “Does this actually match our design patterns?”

We’re spending more time debugging AI code than discussing system design.

The Junior Engineer Problem You Mentioned

Your point about diffused responsibility is hitting our junior engineers hardest. I’ve noticed a pattern:

Scenario: Junior engineer uses Claude Code to implement a feature. Code looks clean, passes tests, gets approved. Subtle bug surfaces in production.

Response: “The AI suggested this approach, and it passed review, so I thought it was correct.”

There’s a learned helplessness developing. Instead of building judgment about why code is correct, they’re learning to rely on “it compiled and the AI seemed confident.”

This is a 3-year problem, not a 3-month problem. We’re not training engineers—we’re training AI prompt writers who can’t reason about correctness independently.

What We’re Trying (With Mixed Results)

1. Mandatory “AI Disclosure” in PR descriptions

  • Engineers must note which parts used AI assistance
  • Reviewers know to apply extra scrutiny
  • Result: Helps, but adds process overhead

2. Pair programming for critical paths

  • Force human reasoning before AI generation
  • One person designs, other implements with AI
  • Result: Slower but higher quality

3. “Explain Your Code” Sessions

  • Junior engineers must walk through AI-generated code line-by-line
  • If they can’t explain why it’s correct, back to design
  • Result: Reveals gaps in understanding immediately

4. AI-Free Fridays (Failed Experiment)

  • One day per week, no AI tools allowed
  • Goal: Maintain fundamental skills
  • Result: Teams gamed it by front-loading AI work to Thursday

The Financial Services Constraint

Michelle, you mentioned healthcare systems in your post. In financial services, we face similar regulatory constraints that AI tools don’t understand:

  • PCI compliance - AI generates convenient but non-compliant logging
  • Audit trails - AI skips the tedious compliance hooks we need
  • Data residency - AI doesn’t consider which data can cross borders
  • Regulatory reporting - AI optimizes for performance, we need deterministic correctness

Our security team now reviews 100% of AI-assisted code that touches financial data. That’s… most of our codebase. The “productivity gains” evaporate under compliance overhead.

Your Question: “Can We Afford to Opt Out?”

This keeps me up at night. Here’s the competitive landscape I see:

  • Startups with no legacy constraints - Moving incredibly fast with AI, shipping features weekly
  • Our financial services competitors - Moving cautiously, but we’re ALL moving slowly
  • Tech giants - Can absorb the quality hit because they have massive QA teams

We can’t opt out entirely. But I think the question is: What’s our differentiation?

If we compete on feature velocity against AI-first startups, we lose. They’ll always ship faster.

But if we compete on reliability, security, and compliance, maybe being more deliberate is the advantage?

Our enterprise customers chose us because we don’t break things. If AI acceleration means more production incidents, we’re undermining our core value prop.

The Team Management Challenge

Here’s what worries me most: I’m seeing a confidence crisis in senior engineers.

They know AI code needs more scrutiny. They know review is taking longer. They know quality is slipping. But when they raise concerns, leadership points to “industry adoption rates” and asks “why aren’t we moving faster?”

The best engineers on my team are burning out from the cognitive dissonance. They’re expected to move fast (AI mandate from above) AND maintain quality (their professional standards). These are increasingly incompatible.

Three of my tech leads have privately asked: “Are we falling behind by being too careful? Or is everyone else being reckless?”

I don’t have a good answer.

What I’d Love to See: Industry-Wide Honesty

Michelle, your post is valuable because you’re willing to share real numbers. We need more of this.

Right now, the AI coding tool vendors show only success stories. Analyst reports cite “productivity gains” based on self-reported surveys. Conference talks celebrate velocity.

But the bug escape rates? The security vulnerabilities? The technical debt accumulation? Those conversations happen in private Slack channels and closed-door leadership meetings.

I’d love to see:

  • Anonymous industry benchmarking on AI code quality impact
  • Shared review checklist templates for AI-generated code
  • Case studies on failures, not just successes
  • Honest ROI analysis that includes quality cost

Your 5-point approach is a great start. Mind if I steal “Mandatory Human Design Docs”? That might be the intervention we need.

The Question I Keep Asking My Leadership

If AI makes us write code faster but ship features slower, what exactly are we optimizing for?

Because right now, it feels like we’re optimizing for the appearance of velocity (more commits, more lines of code, more PRs) rather than actual business impact (features shipped, bugs prevented, customers served).

Michelle, appreciate you starting this conversation. Would love to hear from others on how they’re handling the cultural and process shifts beyond just the technical challenges.

Coming from the design systems side, this discussion is validating something I’ve been seeing for months but couldn’t quite articulate.

AI code doesn’t just have more bugs—it has a different kind of problem that’s harder to spot in review: it works, but it doesn’t belong.

The “Almost Right” Problem from a Design Perspective

I lead design systems for 3 product teams. We have design tokens, component libraries, accessibility standards, animation patterns—years of carefully crafted consistency.

Then AI-generated code shows up and it’s like… a different language.

Example from last week:

  • Engineer uses Claude Code to build a modal dialog
  • Code is clean, tests pass, no errors
  • But: Uses hardcoded colors instead of design tokens
  • Uses setTimeout for animations instead of our transition utilities
  • Implements focus trapping incorrectly (fails accessibility audit)
  • Creates a new button variant instead of using existing ones

The AI didn’t know our design system exists. It generated a solution, not our solution.

This is invisible in code review unless the reviewer deeply knows both our patterns AND has time to check. With Luis’s 72-minute reviews, that depth is impossible.

We’re Measuring the Wrong Thing

Michelle, your “generating code faster but shipping features slower” hits hard.

In design, we’d call this “output vs outcome.”

Output: Lines of code written, PRs merged, tickets closed
Outcome: Consistent UX, accessible features, maintainable system

AI is great at output. It’s terrible at outcome.

Our QA designer now spends 60% of her time fixing AI-generated UI inconsistencies that passed eng review:

  • Wrong spacing values (AI uses px, we use rem)
  • Inaccessible color contrasts (AI optimizes for aesthetics, not WCAG)
  • Broken responsive patterns (AI doesn’t test mobile)
  • Duplicate CSS (AI doesn’t know what already exists)

We built a design system to reduce this variance. AI is reintroducing it at scale.

The “Technical Debt” You Can’t See in Metrics

Michelle cited a 4× increase in code duplication. From a design systems perspective, I’m seeing duplication of intent:

  • 5 different implementations of “error state” in form fields
  • 3 variations of the same loading spinner
  • Modal dialogs with slightly different padding and shadows
  • Inconsistent focus indicator styles

Each one works individually. But multiply by 100 features and our design system is meaningless.

The cost? Every design change now requires finding and updating all the AI-generated variations. Our “design token update” that should take 1 day now takes 2 weeks of archeology.

The Craft Question Michelle Raised

“What quality standard are we willing to accept in exchange for speed?”

As someone who came up as a designer learning craft and attention to detail, this one stings.

I watch junior engineers generate entire features with AI and never develop taste for good code. They can’t tell the difference between “works” and “works well.”

It’s like teaching someone to paint by having an AI generator make images, then they paint-by-numbers on top. They never learn composition, color theory, or why certain choices matter.

When my startup failed, one lesson was: Good enough becomes permanent. That “quick AI-generated feature” that works but doesn’t fit? Two years later it’s still there, blocking refactoring.

What I’m Trying (Adapted from Michelle’s List)

1. AI-Aware Design QA Checklist

  • Specific items for AI-generated UI patterns
  • Design system compliance as merge blocker
  • Works but has 50% review rejection rate (too slow)

2. “Design System Prompting” Training

  • Teaching engineers to include design system context in AI prompts
  • Mixed results—AI still hallucinates patterns

3. Automated Design Token Linter

  • Blocks PRs that use hardcoded values
  • This actually works! But only catches syntax, not semantic misuse

4. Design System “Why” Documentation

  • Writing context AI can’t infer: “Use Button variant=‘secondary’ for destructive actions because it reduces anxiety”
  • Helps humans, AI still ignores it
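For what it’s worth, our linter (item 3) started as little more than pattern matching. A simplified sketch — the patterns and the `var(--…)` token convention here are illustrative, not our real rules:

```python
import re
from typing import List, Tuple

# Values that should come from design tokens (assumed conventions)
HARDCODED = [
    (re.compile(r"#[0-9a-fA-F]{3,8}\b"), "hardcoded color; use a color token"),
    (re.compile(r"\b\d+px\b"), "hardcoded px value; use a spacing token (rem)"),
]
TOKEN = re.compile(r"var\(--[\w-]+\)")  # values referencing tokens are fine

def lint_css(css: str) -> List[Tuple[int, str]]:
    """Return (line_number, message) for every hardcoded value found."""
    problems = []
    for n, line in enumerate(css.splitlines(), start=1):
        if TOKEN.search(line):
            continue  # crude: skip lines that already reference a token
        for pattern, message in HARDCODED:
            if pattern.search(line):
                problems.append((n, message))
    return problems
```

In CI this runs on changed files and fails the PR when the list is non-empty. As noted above, it catches syntax, not semantic misuse — it can’t tell you the wrong token was used correctly.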

The Question I Keep Asking

If AI makes it easier to write code that works, but harder to maintain a coherent system, are we just deferring pain to our future selves?

Because in 3 years, when we need to redesign:

  • How do we find all the AI-generated variants?
  • How do we refactor code that works but doesn’t fit patterns?
  • How do we train new engineers when the codebase doesn’t teach consistent practices?

Michelle, your post made me realize this isn’t just a technical or security problem. It’s a systems thinking problem.

AI optimizes locally (“make this feature work”) without understanding globally (“fit into our architecture”). That’s a fundamental mismatch with how design systems—and probably all long-term software systems—are supposed to work.

Would love to hear if anyone’s solved the “AI doesn’t understand our context” problem. Because right now, every productivity gain is a maintenance debt we’re hiding from ourselves.

This conversation needs to happen at every eng leadership table right now. Michelle, Luis, Maya—you’re all describing the same elephant from different sides, and it’s terrifying how consistent the picture is.

I’m scaling our engineering org from 25 to 80+ engineers while this AI transition is happening. We’re trying to build culture, establish standards, and maintain quality while the ground shifts beneath us.

The Organizational Metrics Blind Spot

Here’s what keeps me up: We’re measuring outputs that look good but hide outcomes that are deteriorating.

Our executive dashboard shows:

  • ✅ Sprint velocity up 18%
  • ✅ Story points completed increased 22%
  • ✅ Time-to-first-commit decreased 35%
  • ✅ Pull requests per engineer up 28%

Leadership is celebrating. “AI is working!”

But when I dig into the metrics we don’t showcase:

  • ❌ Cycle time (commit to production) up 31%
  • ❌ Bug escape rate to production up 27%
  • ❌ Rollback frequency increased 3×
  • ❌ Time spent on bug fixes vs new features: was 30/70, now 52/48
  • ❌ Developer satisfaction scores down 19 points
  • ❌ Code review rejection rate up from 12% to 34%

We’re generating more code, but delivering less value. And the human cost is showing up in retention conversations.

The People Problem No One’s Talking About

Luis, your point about the confidence crisis in senior engineers resonates deeply.

In my 1-on-1s over the last 3 months:

Junior engineers:

  • Love AI tools, feel “productive”
  • Struggle to debug issues in their own AI-generated code
  • Can’t explain architectural decisions
  • Anxious when internet is down (can’t access AI)

Mid-level engineers:

  • Conflicted—use AI but feel guilty
  • Worry they’re not learning fundamentals
  • Spending more time fixing others’ AI code than writing their own

Senior engineers:

  • Frustrated by review burden
  • Question whether they should use AI more to “keep up”
  • Concerned about what they’re teaching juniors
  • Several have privately asked about non-AI-heavy roles elsewhere

The pattern: Junior engineers are confident and wrong. Senior engineers are exhausted and questioning themselves. That’s backwards.

Michelle’s Question on Responsibility

“Who’s responsible for AI-generated bugs?”

This is going to end up in court someday, and we’re not ready for it.

I had this exact conversation with our legal team last quarter. When an AI-generated security vulnerability leads to a data breach:

  • Is it the developer’s fault for accepting AI suggestions?
  • The reviewer’s fault for approving it?
  • The engineering manager’s fault for not catching it?
  • The company’s fault for mandating AI tool usage?
  • The AI vendor’s fault for generating insecure code?

Our legal counsel’s answer: “Probably all of the above, and we have no case law to guide us.”

So we implemented:

  1. Mandatory “AI-Assisted” tagging in commit messages
  2. Audit logs of which AI suggestions were accepted vs rejected
  3. Incident reports tracking AI-assistance levels in buggy code
  4. Insurance review of our E&O policy coverage for AI tools

This isn’t paranoia. This is recognizing that we’re taking on liability we can’t quantify yet.
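For item 1, we moved the tagging from convention to enforcement with a `commit-msg` hook. A minimal sketch — the `AI-Assisted:` trailer name and its allowed values are our own hypothetical convention, not any standard:

```python
import re
import sys

# Hypothetical trailer convention: every commit declares its AI level.
TRAILER = re.compile(r"^AI-Assisted:\s*(none|partial|mostly)\s*$", re.MULTILINE)

def check_message(message: str) -> bool:
    """True if the commit message declares its AI-assistance level."""
    return bool(TRAILER.search(message))

def main(path: str) -> int:
    """Hook entry point; exit status 0 allows the commit.

    Installed as .git/hooks/commit-msg, invoked by git with the message
    file path; the script would end with: sys.exit(main(sys.argv[1]))
    """
    with open(path, encoding="utf-8") as f:
        if check_message(f.read()):
            return 0
    sys.stderr.write(
        "commit rejected: add a trailer 'AI-Assisted: none|partial|mostly'\n"
    )
    return 1
```

The trailer then feeds the audit logs and incident reports in items 2 and 3: grep the history for `AI-Assisted: mostly` and correlate with where the bugs landed.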

The Measurement Problem I’m Struggling With

Maya’s “output vs outcome” distinction is brilliant. Here’s how I’m trying to operationalize it:

Traditional Metrics (Output-Focused):

  • Lines of code written
  • PRs merged
  • Story points completed
  • Time to first commit

Outcome-Focused Metrics We’re Adding:

  • Customer-facing value deployed per week
  • Mean time to customer value (design to production)
  • Code that survives 6 months without modification
  • Bug density in production (not just bugs found)
  • Architecture deviation rate (Maya’s design system problem)
  • Engineer confidence in production deploys (survey)

The outcome metrics are harder to measure, which is probably why everyone defaults to output metrics. But output metrics are now actively misleading in the AI era.

What We’re Trying: The “AI Impact Assessment”

Before rolling out any AI coding tool widely, we now run a 4-week pilot with instrumented tracking:

Week 1-2: Baseline

  • Measure cycle time, bug rates, review time, deployment frequency
  • Survey developer sentiment
  • Establish quality baseline

Week 3-4: AI Tool Pilot

  • 10 engineers use the tool, 10 don’t (control group)
  • Measure same metrics
  • Track what breaks

Week 5: Analysis & Decision

  • Compare outcomes not just outputs
  • Factor in downstream costs (review, debugging, refactoring)
  • Decide: adopt, adopt with guardrails, or reject

So far, we’ve pilot-tested 4 AI tools. Adopted 2 with significant guardrails. Rejected 2 entirely.

Leadership pushback: “But everyone else is using these!”
My response: “Everyone else is also seeing 1.7× more issues and pretending it’s fine.”

Luis’s Junior Engineer Problem Scaled Up

“We’re not training engineers—we’re training AI prompt writers who can’t reason about correctness independently.”

This haunts me. We’re hiring fast. 30 new engineers in 6 months. Most are coming up in the AI-first era.

What does an engineering career look like when:

  • You never learned to debug without AI assistance?
  • You never built intuition for performance without AI optimization?
  • You never struggled through algorithm design without AI scaffolding?
  • You never developed taste for clean architecture without AI templates?

In 5 years, do we have a generation of engineers who can’t function when the AI goes down? Who can’t reason from first principles? Who can’t mentor the next generation because they never developed mastery themselves?

This isn’t hypothetical. I’ve had a staff engineer interview candidates who used AI to pass the technical screen, then couldn’t explain their own solution in the on-site. These engineers are being hired. They’re on teams. They’re reviewing code.

The Question I’m Asking Executive Leadership

“If AI coding tools provide short-term velocity at the cost of long-term engineering capability, what’s the ROI at 3 years? At 5 years?”

Because right now, we’re making quarter-by-quarter decisions (“ship faster!”) that have multi-year consequences (degraded codebase, undertrained engineers, accumulated debt).

Michelle, your 5-point approach is a great start. Here’s what I’d add:

  1. Separate “AI-Assisted” vs “AI-Generated” - Code where AI wrote <30% vs >70%—different review standards
  2. Quarterly AI Impact Review - Honest assessment of quality trends with executive leadership
  3. Career Development for AI Era - Explicit training on fundamentals that AI doesn’t teach
  4. AI Opt-Out Paths - Let engineers choose AI-free work for skill development
  5. Customer Impact Metrics - Tie AI adoption to actual customer outcomes, not engineering metrics

The Uncomfortable Truth I’m Facing

The hardest part is this: I don’t think we can opt out entirely, but I also don’t think full adoption is sustainable.

The answer is probably some nuanced middle ground:

  • AI for boilerplate and repetitive tasks
  • Human design for architecture and critical paths
  • Hybrid for everything else
  • Constant measurement and course correction

But “nuanced middle ground” doesn’t scale when you’re hiring fast and everyone has different tool preferences and leadership wants simple answers.

So we’re navigating by feel in the dark, hoping the ground doesn’t drop out before we figure out where the edges are.

Michelle, thank you for starting this with data. Luis, Maya—your perspectives are crucial. We need more of these honest conversations before the industry collectively walks off a cliff celebrating our velocity the whole way down.

This thread is the most important conversation in product engineering right now. Michelle, Luis, Maya, Keisha—each of you is describing a different facet of the same crisis, and from the product side, I’m watching it derail our entire roadmap planning process.

The Product Velocity Paradox

Here’s what keeps me awake: Engineering is moving faster than ever, but product delivery has never felt slower.

Last quarter:

  • Engineers closed 142 tickets (up 31% YoY)
  • Shipped 23 features (down from 29 last quarter, 34 the quarter before)
  • Customer-requested features in backlog: 87 (growing)
  • Average feature stability time (time until no more fixes needed): 6.2 weeks (was 2.1 weeks)

Translation: We’re generating more code, but shipping fewer stable features.

Keisha’s metrics breakdown is spot-on. Our exec dashboard shows green. Our product reality is red.

The “Faster to Write, Slower to Debug” Tax

Maya’s point about “almost right” code is crushing us from a product perspective.

Recent example:

  • Feature: Add batch export functionality to user dashboard
  • Original estimate: 3 days
  • Day 1: Engineer uses Claude Code, completes implementation in 4 hours
  • Day 2-3: QA finds 12 edge cases that don’t work (timezone handling, large datasets, special characters)
  • Day 4-7: Engineer debugging AI-generated export logic
  • Day 8-10: Re-implementation of core export flow
  • Day 11-14: Additional QA cycle and fixes

Result: 14 days instead of 3. But our sprint planning only saw “4 hours to implement” and leadership celebrated the velocity.

The disconnect between “code written” and “feature shipped” is destroying our ability to forecast.

Customer Impact: The Metric That Actually Matters

Luis mentioned that enterprise customers chose his company because they “don’t break things.” This is everything in B2B product.

What we’re seeing:

  • Customer support tickets up 43% (more bugs reaching production)
  • Average ticket resolution time up 2.3× (bugs are harder to diagnose)
  • Customer confidence scores down (NPS dropped 12 points)
  • Enterprise deals delayed due to security concerns (2 deals lost last month)

Our biggest enterprise customer asked point-blank last week: “Are you guys using AI code generation? Because we’re seeing more bugs and we have compliance requirements.”

I didn’t know how to answer. “Yes, but we have guardrails” didn’t inspire confidence. Neither did “everyone in the industry is doing it.”

We lost that renewal. The stated reason: “Declining product quality and slower response to issues.”

The Roadmap Planning Crisis

Michelle’s question—“What quality standard are we willing to accept?”—is now a product strategy question, not just a technical one.

Old roadmap planning:

  1. Identify customer needs
  2. Size engineering effort
  3. Sequence features by value
  4. Commit to quarterly goals
  5. Track against plan

New roadmap planning:

  1. Identify customer needs
  2. Get initial engineering estimate
  3. Add “AI uncertainty buffer” (2-3× multiplier for debugging and rework)
  4. Add “quality stabilization period” (4-6 weeks post-launch)
  5. Sequence features conservatively
  6. Commit to half as many quarterly goals
  7. Explain to leadership why we’re “moving slower” despite “engineer productivity gains”

This is unsustainable. We’re planning around the expectation that features won’t work correctly the first time.

The Trust Problem from a Product Lens

Keisha described the junior/mid/senior engineer dynamic perfectly. From product, I’m seeing a parallel pattern:

Customer trust trajectory:

  • Early adopters: Expect bugs, tolerate quality issues, value innovation
  • Mainstream customers: Expect reliability, churn on quality problems, value stability
  • Enterprise customers: Require security, audit processes, value trust

Where AI code quality hits hardest: Enterprise customers—our highest-value segment, the segment we’re trying to grow into.

The features we’re shipping faster? Often not the features enterprises need. They need compliance, security, audit trails, edge case handling—precisely the things AI is worst at.

The “Feature vs Bug” Allocation Shift

Keisha’s metric hit me hard: “Time spent on bug fixes vs new features: was 30/70, now 52/48”

From a product perspective, this means:

  • Customer-facing reality: Half our engineering capacity is now firefighting
  • Roadmap reality: We can only commit to half the features we used to
  • Competitive reality: Competitors who haven’t adopted AI as aggressively are shipping more stable features

But here’s the insidious part: The bugs are in AI-generated code from previous sprints. We’re not shipping faster—we’re accumulating maintenance debt that steals from future capacity.

It’s compound interest in reverse: every AI-accelerated feature creates future drag.

What I’m Asking Engineering Leadership

Michelle, your 5-point approach is helpful. Keisha’s additions are crucial. Here’s what I need as a product partner:

Honest Estimation:

  • Don’t give me “time to write code”—give me “time to stable feature”
  • Include the debugging and rework time
  • Let me plan accurately even if the numbers look worse

Quality Tiers for Features:

  • Which features should NOT use AI? (Security, compliance, core workflows)
  • Which features are safe for AI acceleration? (Internal tools, low-risk additions)
  • Let me make informed tradeoffs

Customer-Facing Transparency:

  • Can we commit to stability timelines?
  • What’s our rollback/fix SLA when AI code fails in production?
  • How do we communicate quality standards to enterprise buyers?

The Question I’m Bringing to Executive Leadership

“If AI coding tools allow us to write features 50% faster, but those features are only 70% correct and require 3× more maintenance, what’s the actual impact on customer value delivery?”
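To make that concrete, here’s the back-of-envelope version I bring to the conversation — all numbers are the illustrative ones from the question, not measurements, and the model is deliberately simplistic (incorrect work gets redone once):

```python
def days_to_stable(write_days: float, correctness: float,
                   maint_multiplier: float,
                   baseline_maint_days: float = 1.0) -> float:
    """Rough time from 'code written' to 'stable feature' (toy model)."""
    rework = write_days * (1.0 - correctness)        # redo the incorrect share
    maintenance = baseline_maint_days * maint_multiplier
    return write_days + rework + maintenance

# Baseline: 3 days to write, effectively correct, 1 day of maintenance
baseline = days_to_stable(write_days=3.0, correctness=1.0, maint_multiplier=1.0)

# AI-assisted: written in 2 days, 70% correct, 3x maintenance
ai = days_to_stable(write_days=2.0, correctness=0.7, maint_multiplier=3.0)
```

Under these toy assumptions, time-to-stable goes from 4.0 days to about 5.6: the code was written faster, and the feature shipped slower.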

Because from where I sit:

  • Customers don’t care how fast we wrote the code
  • Customers care whether the feature works reliably
  • Customers care whether we fix their problems quickly
  • Customers care whether we’re moving their business forward

And on all of those dimensions, we’re moving backwards despite engineering “productivity” going up.

The Uncomfortable Product Truth

Maya’s warning haunts me: “Good enough becomes permanent.” And AI-generated “good enough” tends to land around 95% good.

In product, 95% good is 100% unusable for:

  • Security features (1 vulnerability = breach)
  • Payment processing (1 error = lost money + trust)
  • Compliance reporting (1 inaccuracy = audit failure)
  • Data export (1 edge case = customer escalation)

But 95% good is fine for:

  • UI tweaks
  • Internal tools
  • Non-critical features
  • Experimental capabilities

The problem is: We’re treating all features the same. AI-assisted everything, uniform review process, same quality bar.

We need tiered quality standards. Some features need 99.99% reliability. Others can ship at 95%. But right now, everything is regressing toward 95%, and our most important features are suffering.

What I’m Proposing

1. Feature Classification System

  • Critical (no AI, double review, extensive testing)
  • Core (AI-assisted, enhanced review, standard testing)
  • Experimental (AI-generated, normal review, exploratory testing)

2. Customer-Facing Quality Metrics

  • Stop reporting “velocity” to customers
  • Start reporting “stability” and “time-to-resolution”
  • Tie engineering AI adoption to customer satisfaction trends

3. Honest Roadmap Buffers

  • Plan for 2× debugging time on AI-assisted features
  • Plan for 4-6 week stabilization periods
  • Commit to fewer features, higher quality

4. AI Tool Selection by Use Case

  • Keisha mentioned rejecting 2 of 4 piloted tools—we need that discipline
  • Not all AI tools are appropriate for all features
  • Context matters
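The classification in item 1 only works if it’s policy, not preference — which means encoding it as a merge check. A hypothetical sketch (tier names and requirements are illustrative, not a recommendation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReviewPolicy:
    ai_generation_allowed: bool
    required_approvals: int
    security_review: bool

# Illustrative tiers; real requirements would come from a risk assessment
POLICIES = {
    "critical":     ReviewPolicy(False, required_approvals=2, security_review=True),
    "core":         ReviewPolicy(True,  required_approvals=2, security_review=False),
    "experimental": ReviewPolicy(True,  required_approvals=1, security_review=False),
}

def merge_allowed(tier: str, used_ai: bool,
                  approvals: int, security_signoff: bool) -> bool:
    """Decide whether a PR meets its feature tier's bar."""
    policy = POLICIES[tier]
    if used_ai and not policy.ai_generation_allowed:
        return False
    if approvals < policy.required_approvals:
        return False
    if policy.security_review and not security_signoff:
        return False
    return True
```

The point isn’t the specific thresholds — it’s that the tier is declared on the feature and the tooling enforces it, so the quality bar stops being an implicit per-engineer choice.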

The Bottom Line

Michelle asked if we’re in collective quality denial. From product, the answer is: Yes, and it’s about to cost us customers.

The data is clear:

  • Higher bug rates
  • Longer cycle times
  • Lower customer satisfaction
  • Lost enterprise deals

But because “everyone is doing it” and “productivity is up,” we’re pretending the emperor is clothed.

Keisha’s warning about “walking off a cliff celebrating velocity” is exactly right. The cliff is customer churn. The velocity is a vanity metric that hides the fall.

I appreciate everyone’s honesty in this thread. These are the conversations we need to be having with executive leadership, with boards, with investors. Because the current trajectory isn’t sustainable.

The companies that figure out how to use AI tools deliberately, with quality guardrails and honest metrics, will win. The companies that chase velocity at all costs will churn their customers and wonder why the productivity gains didn’t translate to business success.

Which are we going to be?