Vibe Coding Considered Harmful: When AI-Assisted Speed Kills Software Quality

8 min read
Tian Pan
Software Engineer

Andrej Karpathy coined "vibe coding" in early 2025 to describe a style of programming where you "fully give in to the vibes, embrace exponentials, and forget that the code even exists." You describe what you want in natural language, the AI generates it, and you ship. It felt like a superpower. Within a year, the data started telling a different story.

A METR randomized controlled trial found that experienced open-source developers were 19% slower when using AI coding tools — despite predicting they'd be 24% faster, and still believing afterward they'd been 20% faster. A CodeRabbit analysis of 470 GitHub pull requests found AI co-authored code contained 1.7x more major issues than human-written code. And an Anthropic study of 52 engineers showed AI-assisted developers scored 17% lower on comprehension tests of their own codebases.

The dopamine loop of instant code generation is creating a new category of technical debt that doesn't show up in your sprint retro. Here's why it matters, and what to do about it.

Comprehension Debt: The Debt That Doesn't Announce Itself

Technical debt is familiar. You cut a corner, you know you cut it, and eventually it slows you down. Comprehension debt is different — it's the growing gap between how much code exists in your system and how much of it any human being genuinely understands.

Addy Osmani describes this as the hidden cost of AI-generated code: codebases appear healthy while understanding quietly deteriorates. A student team discovered in week seven of a project that no one could explain why any design decisions had been made or how different parts of the system were supposed to work together. Surface-level code quality masked systemic misunderstanding.

The mechanism is straightforward. AI generates code faster than humans can evaluate it, inverting traditional review dynamics. A junior engineer can now generate code faster than a senior engineer can critically audit it. This removes the quality gate that once made review meaningful.

The Anthropic study's numbers show how much the mode of use matters. Developers who used AI primarily for code delegation ("write this for me") scored below 40% on comprehension tests. Those who used AI for conceptual inquiry ("explain how this works") scored above 65%. Same tools, radically different outcomes, depending on how the developer related to the generated code.

The Productivity Paradox Nobody Wants to Hear

The METR study deserves close attention because it contradicts the dominant narrative about AI coding tools. Sixteen experienced developers, each with years of contributions to large open-source projects (averaging 22,000+ stars, 1M+ lines of code), completed 246 tasks randomly assigned to allow or disallow AI tools.

When AI was allowed, developers spent less time actively coding and reading code. Instead, they spent time prompting, waiting for AI output, and reviewing suggestions — accepting less than 44% of what the AI generated. Seventy-five percent reported reading every line of AI output, and 56% made major modifications to clean it up.

The researchers are careful to note that this slowdown was specific to experienced developers working in mature codebases they already knew well. For greenfield projects or unfamiliar codebases, the dynamics may differ. But that caveat is precisely the point: the productivity gains from AI coding tools are real but narrower than the marketing suggests, and the contexts where they help most — unfamiliar code, boilerplate, scaffolding — are also the contexts where comprehension debt accumulates fastest.

The Security Time Bomb

Beyond productivity, vibe coding introduces concrete security risks that compound over time.

Researchers found that 170 out of 1,645 Lovable-generated applications — 10.3% — had critical row-level security flaws in their Supabase configurations. AI-generated code shows 2.74x higher rates of security vulnerabilities compared to human-written code, with misconfiguration errors 75% more common.

Simon Willison, Django's co-creator, invoked the 1986 Challenger disaster as an analogy: a catastrophic failure waiting to happen when "some core component written by AI wasn't properly understood or checked." David Mytton, CEO of Arcjet, draws the line clearly — AI should implement battle-tested security libraries, never invent security from scratch.
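
Mytton's rule of thumb can be made concrete with a small sketch. The Python below is illustrative rather than a recommended production setup: it stores and checks a password using only vetted primitives from the standard library (`hashlib.pbkdf2_hmac`, `secrets.token_bytes`, `hmac.compare_digest`) instead of a hand-rolled hash or a plain `==` comparison. The function names `hash_password` and `verify_password` and the iteration count are assumptions made for this example, not taken from any of the sources cited here.

```python
import hashlib
import hmac
import secrets

# Assumed work factor for this sketch; tune it against current guidance.
ITERATIONS = 600_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA256."""
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key from the candidate password and compare safely."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(candidate, expected_key)
```

The specific algorithm matters less than the review question it suggests: when generated code touches authentication, check that it delegates to established primitives like these rather than inventing its own.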

The problem isn't only that AI writes insecure code. Vibe coding's workflow specifically discourages the kind of careful review that catches security issues. When you're in the flow of "describe, generate, ship," the incentive structure actively works against stopping to threat-model what just appeared on your screen.

The Skills Erosion Spiral

Traditional learning involves hitting a problem, struggling with it, and building intuition from the struggle. Vibe coding replaces that cycle with "hit a problem, throw it at an AI, get a working solution, ship it, repeat tomorrow with the same gap in understanding."
