I inherited a codebase last quarter that’s haunted. Not by bugs - the code works fine. It’s haunted by the ghost of decisions nobody understands anymore.
Half the components were AI-generated. Tests pass. Features work. But when I asked “why is the state management architected this way?” - nobody knew. Not even the person who merged the PRs.
This isn’t technical debt in the traditional sense. The code isn’t broken. It’s something different: Nobody knows WHY our system works the way it does.
Margaret Storey calls this “cognitive debt” - when design decisions become unknowable. And I think agentic AI is about to make this problem exponentially worse.
The Problem: Code That Works But Nobody Understands
Here’s what happened on my team:
We have a component library. Beautiful, functional, well-tested. About 60% of it was AI-generated over the past year by various engineers using various AI tools.
Last month, product wanted to change how our form validation worked. Simple request, right?
Problem 1: Architecture Archaeology
Nobody could explain why validation was split across three different layers. Was it intentional? Was it AI-generated boilerplate that kind of worked so nobody questioned it?
Problem 2: Fear of Refactoring
When code is human-written, you can usually intuit what’s safe to change. When it’s AI-generated, every file feels like a Jenga tower - touch one piece and maybe nothing happens, or maybe everything breaks.
Problem 3: No Documentation of Trade-offs
Human architects document (sometimes): “We chose approach A over B because of X constraint and Y performance requirement.”
AI-generated code? It’s just… there. Working. Why this pattern instead of alternatives? Unknown.
It Gets Worse With Agents
Right now, we’re talking about AI helping engineers write code. Engineers still make architectural decisions, even if AI implements them.
But with agentic AI, the agent makes both architectural AND implementation decisions. And it might make dozens of micro-decisions that accumulate into a design that nobody explicitly chose.
Example:
An agent tasked with “add user authentication” might:
- Choose JWT over sessions (why?)
- Implement token refresh in a specific way (why this approach?)
- Structure middleware in a particular pattern (why this order?)
- Make dozens of other small choices
Each decision might be “correct” in isolation. But together, they form an architecture that nobody designed holistically.
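To make that concrete, here is a hypothetical slice of what an agent might emit for token handling. It uses toy HMAC-signed tokens built from the Python standard library (not a real JWT library), and every name in it is invented for illustration. Each comment marks a micro-decision that works, but that nobody recorded a reason for:

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"hypothetical-secret"  # illustrative only; never hard-code secrets


def issue_token(user_id, ttl=900):
    # Decision: 15-minute expiry. Why 900 seconds and not 3600? Unrecorded.
    payload = json.dumps({"sub": user_id, "exp": time.time() + ttl}).encode()
    # Decision: HMAC-SHA256 over a JSON body instead of a standard JWT. Why?
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + sig


def verify_token(token):
    # Decision: return None on any failure instead of raising. Callers can't
    # tell "expired" from "tampered" - was that a deliberate trade-off?
    try:
        body, sig = token.rsplit(".", 1)
        payload = base64.urlsafe_b64decode(body)
        expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, sig):
            return None
        claims = json.loads(payload)
        return claims if claims["exp"] > time.time() else None
    except Exception:
        return None
```

None of these choices is wrong. That’s the point: each one passes review in isolation, and together they lock in an auth design nobody ever argued for.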
The Data Is Concerning
Research from Qodo AI shows developers spend 42% of their time on technical debt. That’s the debt we can see and measure.
How much time are we going to spend on cognitive debt? The debt we can’t even identify until we need to change something and realize nobody understands how it works?
Real Impact I’m Seeing
1. Onboarding Nightmare
New team members can’t learn from reading code because the code doesn’t reflect human decision-making patterns. It reflects AI optimization patterns.
2. Architecture Drift
When nobody understands the current architecture, new features get bolted on without coherence. The system becomes a collection of AI-generated solutions rather than a cohesive design.
3. Innovation Paralysis
“We want to pivot the product” turns into “but we’d have to rewrite everything because nobody knows how to evolve this architecture.”
4. Debugging Hell
When edge cases emerge, debugging requires understanding design intent. If there was no human intent, just AI implementation, where do you even start?
The Analogy That Scares Me
Imagine having a really smart coworker who:
- Solves every problem you ask them to solve
- Never documents their thought process
- Never explains their reasoning
- Doesn’t respond to questions about why they chose their approach
- Eventually leaves the company
That’s what AI-generated codebases feel like. Lots of solutions, zero context.
The Question Nobody’s Asking

We measure technical debt. We track it, remediate it, have strategies for it.
How do we even measure cognitive debt?
Is it:
- Percentage of code where nobody knows the design rationale?
- Time spent understanding existing code vs writing new code?
- Number of “why does this work this way?” questions that can’t be answered?
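One crude proxy for the first metric: scan commit messages (or PR descriptions) for rationale language and track what fraction state a *why* at all. This is a sketch, not a validated metric, and the marker list is invented for illustration:

```python
# Hypothetical markers that suggest a commit message explains a decision,
# not just describes a change. Tune for your team's writing habits.
RATIONALE_MARKERS = ("because", "so that", "rationale", "trade-off", "why:")


def rationale_coverage(messages):
    """Return the fraction of messages that state why, not just what."""
    if not messages:
        return 0.0
    explained = sum(
        any(marker in msg.lower() for marker in RATIONALE_MARKERS)
        for msg in messages
    )
    return explained / len(messages)
```

Trend this over time and you at least get a signal: if coverage drops as AI-generated code share rises, your cognitive debt is compounding.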
What I’m Trying (With Mixed Results)
Attempt 1: “Explain Before Merge”
Require engineers to document why AI chose a particular approach. But how do you document something you don’t understand?
Attempt 2: Architectural Decision Records (ADRs)
Great in theory. In practice, if AI generated the architecture, the ADR becomes “AI suggested this and it worked.”
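One way to make ADRs bite harder is to template the questions so “AI suggested this and it worked” isn’t an acceptable entry. A hypothetical template shape (adapt the field names to taste):

```
ADR-NNN: <decision title>
Status: accepted | superseded
Origin: human | AI-assisted | agent-generated

Context
  What problem forced a decision?

Decision
  What was chosen - including what the agent chose implicitly?

Alternatives considered
  If agent-generated: which alternatives did we ask it to compare?
  "None" is a red flag, not an answer.

Rationale
  Why this over the alternatives? "It worked" is not a rationale.

Revisit when
  Conditions under which this decision should be re-examined.
```

The “Origin” and “Alternatives considered” fields are the additions that matter: they force the team to either extract reasoning from the agent at generation time, or admit in writing that none exists.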
Attempt 3: Regular “Code Archaeology” Sessions
Team sits down and reverse-engineers agent decisions. We’re spending time trying to understand our own codebase.
This feels backwards, but seems necessary.
Attempt 4: Pair Programming with AI
Engineer AND AI generate solution together, with engineer actively questioning AI choices. Works better, but much slower.
The Question I Can’t Answer
Is this just growing pains? Will we develop new practices for managing AI-generated architectures?
Or are we building systems that will become unmaintainable as soon as they need non-trivial changes?
Because right now, I’m watching teams:
- Ship faster than ever (velocity ↑)
- Understand their systems less than ever (comprehension ↓)
That seems like a recipe for disaster in 2-3 years when these systems need significant evolution.
How do you maintain understanding when agents are building your systems?
Has anyone figured this out? Or are we all just hoping this problem solves itself?