Agentic AI is Creating a New Kind of Debt: Nobody Knows WHY Our Systems Work Anymore

I inherited a codebase last quarter that’s haunted. Not by bugs - the code works fine. It’s haunted by the ghost of decisions nobody understands anymore.

Half the components were AI-generated. Tests pass. Features work. But when I asked “why is the state management architected this way?” - nobody knew. Not even the person who merged the PRs.

This isn’t technical debt in the traditional sense. The code isn’t broken. It’s something different: Nobody knows WHY our system works the way it does.

Margaret Storey calls this “cognitive debt” - when design decisions become unknowable. And I think agentic AI is about to make this problem exponentially worse.

The Problem: Code That Works But Nobody Understands

Here’s what happened on my team:

We have a component library. Beautiful, functional, well-tested. About 60% of it was AI-generated over the past year by different engineers using a mix of AI tools.

Last month, product wanted to change how our form validation worked. Simple request, right?

Problem 1: Architecture Archaeology
Nobody could explain why validation was split across three different layers. Was it intentional? Was it AI-generated boilerplate that kind of worked so nobody questioned it?

Problem 2: Fear of Refactoring
When code is human-written, you can usually intuit what’s safe to change. When it’s AI-generated, every file feels like a Jenga tower - touch one piece and maybe nothing happens, or maybe everything breaks.

Problem 3: No Documentation of Trade-offs
Human architects document (sometimes): “We chose approach A over B because of X constraint and Y performance requirement.”

AI-generated code? It’s just… there. Working. Why this pattern instead of alternatives? Unknown.

It Gets Worse With Agents

Right now, we’re talking about AI helping engineers write code. Engineers still make architectural decisions, even if AI implements them.

But with agentic AI, the agent makes both architectural AND implementation decisions. And it might make dozens of micro-decisions that accumulate into a design that nobody explicitly chose.

Example:
Agent tasked with “add user authentication” might:

  • Choose JWT over sessions (why?)
  • Implement token refresh in a specific way (why this approach?)
  • Structure middleware in a particular pattern (why this order?)
  • Make dozens of other small choices

Each decision might be “correct” in isolation. But together, they form an architecture that nobody designed holistically.
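To make the "order of middleware" point concrete, here's a minimal sketch (all names are hypothetical, not any real framework's API) of how one small composition choice becomes an unrecorded architectural decision: whether unauthenticated requests still consume rate-limit quota depends entirely on which wrapper an agent happened to apply first.

```python
# Each middleware wraps the next handler; the composition order is the decision.

def rate_limit(next_handler, counter):
    def handler(request):
        counter["requests"] += 1          # every request consumes quota
        return next_handler(request)
    return handler

def authenticate(next_handler):
    def handler(request):
        if not request.get("token"):
            return {"status": 401}        # reject before calling downstream
        return next_handler(request)
    return handler

def app(request):
    return {"status": 200}

# Order A: rate limit first -> anonymous requests burn quota.
counter_a = {"requests": 0}
stack_a = rate_limit(authenticate(app), counter_a)
stack_a({"token": None})
print(counter_a["requests"])  # 1

# Order B: auth first -> anonymous requests are rejected before counting.
counter_b = {"requests": 0}
stack_b = authenticate(rate_limit(app, counter_b))
stack_b({"token": None})
print(counter_b["requests"])  # 0
```

Both orderings "work" and pass the obvious tests; only one matches what the team actually wanted, and nothing in the merged code says which intent was chosen.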

The Data Is Concerning

Research from Qodo AI shows developers spend 42% of their time on technical debt. That’s the debt we can see and measure.

How much time are we going to spend on cognitive debt? The debt we can’t even identify until we need to change something and realize nobody understands how it works?

Real Impact I’m Seeing

1. Onboarding Nightmare
New team members can’t learn from reading code because the code doesn’t reflect human decision-making patterns. It reflects AI optimization patterns.

2. Architecture Drift
When nobody understands the current architecture, new features get bolted on without coherence. The system becomes a collection of AI-generated solutions rather than a cohesive design.

3. Innovation Paralysis
“We want to pivot the product” turns into “but we’d have to rewrite everything because nobody knows how to evolve this architecture.”

4. Debugging Hell
When edge cases emerge, debugging requires understanding design intent. If there was no human intent, just AI implementation, where do you even start?

The Analogy That Scares Me

Imagine having a really smart coworker who:

  • Solves every problem you ask them to solve
  • Never documents their thought process
  • Never explains their reasoning
  • Doesn’t respond to questions about why they chose their approach
  • Eventually leaves the company

That’s what AI-generated codebases feel like. Lots of solutions, zero context.

Question Nobody’s Asking

We measure technical debt. We track it, remediate it, have strategies for it.

How do we even measure cognitive debt?

Is it:

  • Percentage of code where nobody knows the design rationale?
  • Time spent understanding existing code vs writing new code?
  • Number of “why does this work this way?” questions that can’t be answered?

What I’m Trying (With Mixed Results)

Attempt 1: “Explain Before Merge”
Require engineers to document why AI chose a particular approach. But how do you document something you don’t understand?

Attempt 2: Architectural Decision Records (ADRs)
Great in theory. In practice, if AI generated the architecture, the ADR becomes “AI suggested this and it worked.”

Attempt 3: Regular “Code Archaeology” Sessions
Team sits down and reverse-engineers agent decisions. We’re spending time trying to understand our own codebase.

This feels backwards, but seems necessary.

Attempt 4: Pair Programming with AI
Engineer AND AI generate solution together, with engineer actively questioning AI choices. Works better, but much slower.

The Question I Can’t Answer

Is this just growing pains? Will we develop new practices for managing AI-generated architectures?

Or are we building systems that will become unmaintainable as soon as they need non-trivial changes?

Because right now, I’m watching teams:

  • Ship faster than ever (velocity ↑)
  • Understand their systems less than ever (comprehension ↓)

That seems like a recipe for disaster in 2-3 years when these systems need significant evolution.

How do you maintain understanding when agents are building your systems?

Has anyone figured this out? Or are we all just hoping this problem solves itself?

This resonates deeply. I recently debugged an API endpoint that “just worked” for 6 months until it didn’t. The code was AI-generated, well-tested, but when a subtle edge case hit production, I spent hours trying to understand the logic.

The problem wasn’t the bug itself - it was that I couldn’t figure out WHAT the code was trying to do in the first place. Variable names made sense individually, but the overall design pattern was unfamiliar. It wasn’t a pattern I’d seen in our codebase or in common frameworks.

Turns out the AI had created a novel (to us) approach to caching and validation. It worked great for 99% of cases. That 1% edge case? Took me 4x longer to fix than it would have if a human had written it, because I had to reverse-engineer the intent first.

Proposal: Agents Must Document Architecture

What if we required AI-generated code to include:

  • Design Pattern Used: “I implemented the Command pattern here because…”
  • Alternative Approaches Considered: “I chose X over Y because…”
  • Assumptions Made: “This assumes Z will always be true…”
  • Known Limitations: “This won’t handle edge case W well…”

Basically, force the AI to be the coworker who over-documents. That overhead might be worth it for the understanding we gain.
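One way to make those four fields enforceable rather than aspirational is to treat them as structured metadata that a merge gate can check. This is a sketch under my own assumptions (the class, field names, and gate are invented for illustration, not an existing tool):

```python
from dataclasses import dataclass

@dataclass
class DesignRationale:
    pattern_used: str                   # "Design Pattern Used"
    alternatives_considered: list[str]  # "Alternative Approaches Considered"
    assumptions: list[str]              # "Assumptions Made"
    known_limitations: list[str]        # "Known Limitations"

    def is_complete(self) -> bool:
        # A CI gate could refuse PRs whose rationale has any empty section.
        return all([self.pattern_used, self.alternatives_considered,
                    self.assumptions, self.known_limitations])

rationale = DesignRationale(
    pattern_used="Command pattern for undoable form actions",
    alternatives_considered=["direct method calls (harder to undo)"],
    assumptions=["actions are serializable"],
    known_limitations=["does not batch rapid-fire events"],
)
print(rationale.is_complete())  # True
```

The value isn't the class itself; it's that "rationale is present" becomes a checkable property of a PR instead of a social norm.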

The challenge: Does this slow down the velocity gains we get from AI? Probably. Is understanding worth that trade-off? I think yes, but leadership might disagree.

Maya, you’ve identified what I consider to be the most underestimated risk of agentic AI: institutional knowledge loss at scale.

This isn’t just a technical problem - it’s an organizational risk. When key people leave your company, what walks out the door?

Traditionally: Their experience, their context, their understanding of why decisions were made.

In the AI-generated world: Nobody had that context to begin with, so nothing walks out. But also, nothing stays.

The Documentation Problem

Alex’s proposal for agents to document their reasoning is good, but I think it’s incomplete. Here’s why:

Agent documentation explains what the agent did, not what the business needed.

Example:

  • Business need: “Users complained checkout is confusing”
  • Agent solution: Refactored checkout flow
  • Agent documentation: “Implemented state machine pattern for checkout with 7 states…”

The agent documents its technical solution beautifully. What it doesn’t capture: Why was checkout confusing? What user feedback drove this? What alternatives did we consider? What business constraints limited our options?

That business context is even more important than technical context.

We’re Treating This As Code Review When It’s Architecture Review

The cognitive debt problem occurs because we’re using AI to make architectural decisions without architectural review.

Current process:

  1. Engineer describes feature to AI
  2. AI generates implementation
  3. Code review checks if it works
  4. Merge

Missing step: Architecture review that asks “Is this the right design for our long-term needs?”

Agents optimize for “works now.” Architects optimize for “maintainable for 3 years.” These are different objectives.

Measuring Cognitive Debt

You asked how to measure this. Here’s what we’ve started tracking:

  • Time to understand - How long does it take a new engineer to understand a module?
  • “Why” questions in code reviews - Frequency of “Why did we do it this way?” with no good answer
  • Refactoring resistance - Features we avoid changing because nobody understands them
  • Documentation completeness - Can you understand the codebase from docs alone?

These are soft metrics, but they’re leading indicators of cognitive debt.
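If you want a single tracked number per module, the four indicators above can be folded into a rough score. The weights, caps, and thresholds below are invented placeholders that each team would calibrate, not a validated formula:

```python
def cognitive_debt_score(hours_to_understand, unanswered_why_ratio,
                         avoided_refactors, doc_coverage):
    """Higher = more cognitive debt. All inputs are team-collected estimates:
    - hours_to_understand: median time for a new engineer to grok the module
    - unanswered_why_ratio: fraction of "why?" review questions with no answer
    - avoided_refactors: count of changes deferred out of fear
    - doc_coverage: fraction of the module understandable from docs alone
    """
    return (0.3 * min(hours_to_understand / 40, 1.0)  # cap at one work week
            + 0.3 * unanswered_why_ratio
            + 0.2 * min(avoided_refactors / 5, 1.0)
            + 0.2 * (1.0 - doc_coverage))

score = cognitive_debt_score(hours_to_understand=20,
                             unanswered_why_ratio=0.5,
                             avoided_refactors=2,
                             doc_coverage=0.25)
print(round(score, 2))  # 0.53
```

Even with made-up weights, trending this per module over time would show where understanding is eroding fastest.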

The Solution Space

I’m experimenting with a “code archaeology” requirement: Before changing any AI-generated code, engineer must:

  1. Document current architecture (reverse engineer it)
  2. Identify design patterns used
  3. Explain why this approach (their best guess)
  4. Get review from architect on the analysis
  5. Only then, propose changes

It’s slower. But it forces understanding before modification. And it builds organizational knowledge as we go.

The real answer might be: Agents can implement, but humans must architect. AI as powerful execution layer, but human-designed systems.

This hits from a product sustainability angle that I don’t think we’re taking seriously enough.

Products Need to Evolve, Not Just Work

Your checkout flow example is perfect. Products don’t stay static - they evolve based on:

  • User feedback and changing needs
  • Market competition and new features
  • Business model changes
  • Technical platform evolution

If your codebase is a collection of AI-generated solutions that “work” but nobody understands, you can’t evolve the product. You can only add to it.

We’re already seeing this: Teams reluctant to refactor AI-generated code, so they build workarounds. Over time, the product becomes a Frankenstein of patches rather than a coherent experience.

The Innovation Paralysis Risk

Cognitive debt blocks pivots. If nobody understands how your authentication system works, you can’t easily switch from email/password to SSO. If nobody knows why your data model is structured this way, you can’t adapt it for new use cases.

Short-term velocity gains → long-term innovation paralysis.

Proposal: Explainability Score for PRs

What if we tracked “explainability” as a metric like code coverage?

PR checklist:

  • Tests pass (automated)
  • Code coverage >80% (automated)
  • Architecture is documented (human review)
  • Design rationale is clear (human review)
  • Future engineer could modify this (human judgment)

If you can’t check that last box, the PR doesn’t merge - even if it works perfectly.

Would this slow things down? Yes. Would it prevent cognitive debt accumulation? I think so.
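The checklist above can be wired into CI as a merge gate. A minimal sketch (field names are hypothetical; the three human-review items would be set by reviewers, not computed):

```python
def can_merge(pr):
    automated = pr["tests_pass"] and pr["coverage"] > 0.80
    human = (pr["architecture_documented"]
             and pr["rationale_clear"]
             and pr["future_engineer_could_modify"])
    return automated and human

pr = {"tests_pass": True, "coverage": 0.92,
      "architecture_documented": True, "rationale_clear": True,
      "future_engineer_could_modify": False}   # works, but unexplainable
print(can_merge(pr))  # False
```

The point of the structure: a PR that passes every automated check can still be blocked on explainability alone, which is exactly the case the checklist is designed to catch.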

Maya, this connects directly to the junior developer conversation we had earlier. Cognitive debt and junior development are the same problem from different angles.

If juniors learn by reading code, but the code doesn’t reflect human reasoning, what are they learning?

They’re learning AI patterns, not engineering judgment.

The Succession Planning Problem

Michelle mentioned institutional knowledge loss. Here’s the concrete scenario:

2028: Your senior architect retires. She’s been with the company 8 years.

Problem: 60% of your codebase was AI-generated in the last 3 years. She understands the old 40%, but the new stuff? She knows it works, but not why it’s designed that way.

Who takes over as architect? The mid-level engineers who’ve been shipping AI-generated features quickly but never had to make deep architectural decisions?

This is a succession planning failure waiting to happen across the industry.

Code Archaeology as Training

I love your “code archaeology” sessions idea. We’re doing something similar, but framing it as training:

“Reverse Engineering Workshop” - Juniors take AI-generated code and:

  1. Identify the design patterns used
  2. Diagram the architecture
  3. List alternative approaches
  4. Explain trade-offs

This serves dual purpose:

  • Builds understanding of existing codebase
  • Teaches architectural thinking

The juniors hate it at first (“why are we doing archaeology on our own code?”) but then they start seeing patterns. They start developing the ability to critique AI decisions, which is actually the skill they need.

Proposal: Agents as Teaching Moments

What if every AI-generated PR included:

  • Teaching section: “This code uses the Strategy pattern. Here’s why I chose it. Here’s when you’d use alternatives.”
  • Study questions: “What would happen if X changed? How would you modify this for Y use case?”

Turn AI-generated code into learning materials, not just working solutions.
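For flavor, here's the kind of tiny example a "teaching section" might embed: a minimal Strategy pattern (my own illustrative sketch, using form validation since that's this thread's running example). The swappable-strategy property is exactly what a study question like "how would you modify this for Y?" is probing.

```python
# Strategy pattern: the validation rule is injected, not hard-coded,
# so a Field can be retargeted without touching its class.

def email_strategy(value):
    return "@" in value

def nonempty_strategy(value):
    return len(value.strip()) > 0

class Field:
    def __init__(self, validate):
        self.validate = validate   # inject the strategy

email = Field(email_strategy)
name = Field(nonempty_strategy)
print(email.validate("a@b.com"), name.validate("  "))  # True False
```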

Slower? Yes. But it prevents cognitive debt while building team capability.