TDD with AI copilots: Am I the only one struggling with red-green-refactor?

I just finished reading Kent Beck’s recent piece where he argues that TDD prevents AI agents from writing tests that verify broken behavior. It hit me hard because I’ve been fighting with GitHub Copilot for months trying to maintain a test-first workflow.

Here’s my confession: I think AI copilots are killing my TDD discipline, and I’m not sure if I should fight it or embrace it.

The Core Problem: AI Wants Implementation First

Traditional TDD workflow:

  1. Write a failing test (red)
  2. Write minimal code to pass (green)
  3. Refactor for clarity (refactor)

AI copilot workflow in practice:

  1. Start writing a test
  2. Copilot suggests the implementation code
  3. I get tempted to accept it because it looks right
  4. Now I’m writing tests to validate code I already have
  5. Red-green-refactor becomes green-green-refactor

The problem is that AI is trained on implementation-first code. It has seen millions of examples where the implementation already exists and the tests were written after. When I try to do TDD, the AI actively works against me by suggesting solutions before I’ve fully specified the problem.

Does TDD Actually Matter With AI?

Kent Beck’s argument is compelling: when you write tests first, AI can’t cheat by writing tests that verify whatever buggy implementation it produced. The test is the specification, and the AI has to meet that specification.

But I’m struggling to see this work in practice:

Scenario 1: I write the test first

test('should calculate compound interest correctly', () => {
  const result = calculateCompoundInterest(1000, 0.05, 10);
  expect(result).toBeCloseTo(1628.89, 2); // raw result is 1628.8946…, so an exact toBe would fail
});

As soon as I hit enter, Copilot suggests an implementation. Great! But did it actually understand compound interest, or did it just pattern-match from its training data? I don’t know what I don’t know.
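For what it’s worth, in this case the pattern-match happens to land on the right answer. The standard formula is A = P(1 + r)^n, and a sketch of the one-liner an AI is likely to produce (my own reconstruction, not actual Copilot output) looks like:

```javascript
// Standard compound interest with per-period compounding: A = P * (1 + r)^n.
const calculateCompoundInterest = (principal, rate, periods) =>
  principal * Math.pow(1 + rate, periods);

console.log(calculateCompoundInterest(1000, 0.05, 10)); // ~1628.8946
```

One subtlety the test has to get right: the raw result is 1628.8946…, not exactly 1628.89, so either the test needs a tolerance (e.g. Jest’s toBeCloseTo) or the function needs to round explicitly.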

Scenario 2: I accept the AI implementation first
Now I’m writing tests to verify the AI’s code. The test might pass, but am I testing the right behavior or just confirming the AI’s assumptions?

The Workflow Friction

Here’s what TDD with AI feels like:

  • Constantly fighting autocomplete suggestions
  • Having to deliberately ignore helpful-looking code
  • Feeling like I’m slowing myself down by being “pure” about test-first
  • Questioning whether the discipline adds value when AI can generate both tests and implementation

Meanwhile, my teammates who abandoned TDD are shipping features 2x faster. Their code has AI-generated implementations with AI-generated tests. Coverage looks good. Bugs… well, we haven’t seen a spike yet.

Test-Driven Generation (TDG)?

I came across this concept of “Test-Driven Generation” - using tests as specifications for AI to generate against. It sounds like TDD, but with AI as the implementation engine.

The idea: Write comprehensive tests that specify behavior, then let AI generate the implementation that satisfies those tests. Iterate until all tests pass.

Has anyone actually made this work? It sounds great in theory, but in practice:

  • How do you know your tests specify the right behavior?
  • What if the AI satisfies your tests but introduces bugs you didn’t think to test for?
  • Aren’t you just deferring the “do I trust this” question from implementation to tests?

My Question for the Community

Has TDD survived contact with AI coding tools in your workflow?

Are you:

  1. Still practicing strict TDD (test-first, no exceptions)?
  2. Doing “test-adjacent development” (write tests early but not always first)?
  3. Abandoned TDD entirely and relying on AI + good coverage?
  4. Found some hybrid approach that works?

And more importantly: If you abandoned TDD, have you seen quality suffer? Or was TDD just ceremony that AI made obsolete?

I feel like I’m at a crossroads. Keep fighting to maintain TDD discipline, or accept that AI has fundamentally changed how we approach testing. Would love to hear how others are thinking about this.

Alex, this resonates from a design perspective! I’ve been thinking about this from the lens of specifications before implementation, which is basically what TDD is about, right?

Design Specs as “Tests” for AI

In design systems work, I write detailed specs before building components:

  • What props does it accept?
  • What are the visual states (hover, active, disabled)?
  • What accessibility requirements must it meet?
  • What are the edge cases (long text, missing images, etc.)?

Then I hand those specs to engineers (or increasingly, to AI) to implement.

This feels a lot like TDD, but with design specs instead of code tests. The spec is the contract. The implementation (whether human or AI) has to satisfy the contract.

Where AI Actually Helps

I’ve noticed AI is actually pretty good at implementation when you give it crystal-clear specs. The problem is most people (myself included) write vague specs:

Vague: “Add a button component”
Specific: “Button component with primary/secondary variants, disabled state, loading state, icon support (left/right), full-width option, size variants (small/medium/large), must meet WCAG 2.1 AA contrast requirements”

With the specific spec, AI can generate a component that actually works. With the vague spec, AI guesses and you get something that sorta works but misses half the edge cases.
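A side benefit of the specific spec: parts of it can be turned into executable checks. A sketch of the WCAG 2.1 contrast requirement, using the formulas from the WCAG definitions of relative luminance and contrast ratio (the helper names here are mine):

```javascript
// Relative luminance of an sRGB color, per the WCAG 2.1 definition.
function relativeLuminance([r, g, b]) {
  const [R, G, B] = [r, g, b].map((c) => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * R + 0.7152 * G + 0.0722 * B;
}

// Contrast ratio = (L_lighter + 0.05) / (L_darker + 0.05).
function contrastRatio(fg, bg) {
  const [hi, lo] = [relativeLuminance(fg), relativeLuminance(bg)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}

// WCAG 2.1 AA requires >= 4.5:1 for normal text.
console.log(contrastRatio([0, 0, 0], [255, 255, 255]).toFixed(1)); // 21.0
```

That turns “must meet WCAG 2.1 AA contrast requirements” from a sentence in a spec into a test the AI’s output either passes or fails.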

The Discipline Problem

But here’s what worries me: writing good specs is hard work. TDD forces you to think through the behavior before coding. Design specs force you to think through the requirements before designing.

If AI makes it too easy to skip that thinking step and jump straight to “working code,” we lose the clarity that comes from specification-first thinking.

Your teammates shipping 2x faster without TDD - I bet they’re also accumulating ambiguity debt. The code works, but nobody’s quite sure what it’s supposed to do in edge cases because the specs were never written down.

My Take: Spec-First, Not Test-First

Maybe the answer isn’t “test-first” but “specification-first”. Write down what you want (as tests, as design specs, as acceptance criteria, whatever form makes sense). Then let AI help implement it.

But don’t skip the specification step. That’s where the thinking happens.

Alex, you’re asking exactly the right question. At Anthropic, we actually ran an experiment on this: A/B test TDD vs implementation-first, both with AI assistance.

The Experiment Setup

We had 20 engineers build the same feature (a data pipeline with transformation logic) over 2 weeks:

  • Group A: Strict TDD - write tests first, use AI only for implementation
  • Group B: Implementation-first - use AI for both code and tests, no TDD discipline

All engineers had access to the same AI tools (Claude for code generation, Copilot for autocomplete).

The Results

Initial Velocity (first week):

  • Group B was 40% faster
  • TDD group spent more time thinking about test cases upfront
  • Implementation-first group shipped “working” code quickly

Bug Density (found in code review):

  • Group A (TDD): 2.3 bugs per 100 lines
  • Group B (implementation-first): 5.1 bugs per 100 lines
  • Most of Group B’s bugs were logic errors in edge cases

Refactoring Difficulty (second week):

  • Both groups had to refactor for new requirements
  • Group A refactored 30% faster
  • Group B’s tests were tightly coupled to implementation, broke frequently

The Key Insight

TDD with AI produced roughly half the bug density (2.3 vs. 5.1 per 100 lines) but took 40% longer initially. However, over the full 2-week cycle (including refactoring), the TDD group was only 15% slower overall.

The quality/velocity tradeoff exists, but it’s smaller than people think when you account for the full development cycle.

Why TDD Works Better With AI

Kent Beck is right about one thing: TDD prevents AI from writing tests that validate broken behavior.

When you write the test first:

  1. The test embodies your understanding of the requirements
  2. AI must satisfy YOUR spec, not its own assumptions
  3. If the AI’s implementation is wrong, the test fails - immediate feedback

When you let AI write both code and tests:

  1. AI might misunderstand the requirements
  2. AI writes tests that validate its misunderstanding
  3. Tests pass, but behavior is wrong - delayed feedback (or worse, production bugs)

But Is It Worth It?

Here’s my honest take: For greenfield features, TDD with AI is worth it. For maintenance work, maybe not.

Greenfield: You’re defining behavior from scratch. TDD forces clarity. Worth the upfront slowdown.

Maintenance: Behavior is already defined by existing code. AI can infer the pattern. Less value from test-first.

To Your Teammates Shipping 2x Faster

I’d be curious: are they measuring cycle time from “code complete” to “production stable”? Or just “code complete” to “merged”?

My hypothesis: They’re fast to ship, slower to stabilize. The bugs show up later, in production or during refactoring.

But that’s a hypothesis. You could actually measure it: track bugs found per feature in the first 30 days post-launch. Compare TDD vs non-TDD features. Data wins arguments.

Alex, this conversation is giving me flashbacks to the TDD debates of the 2000s, but with a new twist. What concerns me most is the generational split I’m seeing on my team.

The Divide

Senior engineers (10+ years): Still practicing TDD, treating AI as a tool within that discipline. They write tests first, let AI suggest implementations, validate the AI’s work.

Mid-level engineers (3-7 years): Pragmatic approach. TDD for critical paths, implementation-first for straightforward features. Use AI to speed up both.

Junior engineers (<3 years): Mostly abandoned TDD. They’ve never experienced the pain of maintaining untested code at scale. AI makes it “too easy” to skip the discipline.

The Cultural Erosion

Here’s what worries me: TDD was always about discipline, not speed. The value was forcing yourself to think before coding, to specify behavior explicitly, to design for testability.

AI removes the friction that made TDD “necessary.” You can ship working code without the discipline. But are we losing something important?

I’ve noticed junior engineers on my team struggle with:

  • Understanding complex code they didn’t write (even if they prompted the AI)
  • Debugging failures because they don’t understand the implementation
  • Refactoring confidently because tests are coupled to implementation details

The seniors who maintained TDD discipline don’t have these problems. They understand the code deeply because they thought through the tests first.

The Question of Quality Culture

When I started at my Fortune 500 fintech 3 years ago, TDD was mandatory. Strict red-green-refactor. No exceptions.

Last quarter, we relaxed that policy. Too much friction with AI tools. Engineers were frustrated. Velocity was suffering.

Now we’re “TDD recommended” - which in practice means optional. About 30% of code is still TDD’d, mostly by senior engineers.

And our bug rate hasn’t spiked yet. I keep waiting for the other shoe to drop, but it hasn’t.

Maybe TDD was always a crutch for teams that didn’t have good specifications and code review? Maybe AI plus good review is sufficient?

But I worry we’re in a honeymoon period. The technical debt accrues slowly. We won’t see the cost for 6-12 months.

My Concern: Losing Craft

There’s something intangible about TDD that goes beyond bug rates. It’s about understanding your code deeply. It’s about thinking before acting. It’s about the craft of software engineering.

When you let AI write both implementation and tests, you’re outsourcing the thinking. You’re pattern-matching instead of reasoning.

And I wonder: can you build truly great software by pattern-matching? Or do you need the deep understanding that comes from wrestling with the problem yourself?

To Alex’s Question

Should you fight to maintain TDD? I think the answer depends on what you’re building:

  • Critical systems (payments, security, healthcare): Yes, fight for TDD. The cost of failure is too high.
  • Internal tools, MVPs, experiments: Maybe not. Speed might matter more than perfect quality.
  • Product features at scale: Gray area. Depends on your quality culture and review processes.

But personally? I’m still practicing TDD on anything complex. Call me old-fashioned, but I sleep better knowing I thought through the behavior before coding.

As the product person in this thread, I’m going to ask the uncomfortable question: Does TDD actually matter from a business perspective?

From where I sit, here’s what I care about:

  1. Features ship on time
  2. Features work correctly for users
  3. Features don’t break in production
  4. Team can iterate quickly on user feedback

Whether those features were built with TDD or implementation-first is invisible to me and to users. What matters is outcomes, not process.

The Business Reality

When I pitch to investors or talk to customers, nobody asks “what’s your test coverage?” or “do you practice TDD?” They ask:

  • Does your product solve my problem?
  • How fast can you ship new features?
  • Is it reliable?

If AI helps engineers ship faster with acceptable quality, why would I care about TDD?

The Velocity Question

Rachel’s data showed TDD was 40% slower initially but only 15% slower overall. From a business perspective, that 15% slowdown is meaningful.

Over a year, that’s:

  • 15% fewer features shipped
  • 15% slower response to market changes
  • 15% less experimentation and learning

In a competitive market, that could be the difference between winning and losing.

The Hidden Costs

But Luis raises a good point about technical debt. The question is: when does the debt come due, and what’s the interest rate?

If abandoning TDD means:

  • 5% more production bugs → costs customer trust, support tickets, engineering time to fix
  • 30% slower refactoring → slows down future iterations
  • Poor code understanding → harder to onboard new engineers, slower debugging

Then maybe that 15% initial speed gain is a bad trade.

What I Actually Need From Engineering

Here’s what I wish engineering teams would measure:

  • Time to value: How long from “we should build this” to “users can use this”?
  • Iteration speed: How quickly can we modify features based on user feedback?
  • Stability: How often do features break in production?
  • Flexibility: How easily can we pivot direction when needed?

If TDD improves these metrics, great. If implementation-first AI coding improves them, also great. I genuinely don’t care about the process - I care about the outcomes.

The Real Question

To Alex and the engineering team: Can you show me that TDD actually improves business metrics?

Because from where I sit, the arguments for TDD are often about engineering ideals (“clean code,” “craftsmanship,” “discipline”) rather than business impact.

And in a startup environment where we’re fighting for survival, I need to prioritize business impact.

Maybe I’m wrong. Maybe TDD is a leading indicator of business health. But I need to see the data.