When Everyone Has an AI Coding Agent: The Team Dynamics Nobody Warned You About

· 10 min read
Tian Pan
Software Engineer

A team of twelve engineers adopts AI coding tools enthusiastically. Six months later, each engineer is merging nearly twice as many pull requests. The engineering manager celebrates. Then the on-call rotation starts paging. Debugging sessions last twice as long. Nobody can explain why a particular module was structured the way it was. The engineer who wrote it replies honestly: "I don't know — the AI generated most of it and it seemed fine."

This scenario is playing out at companies everywhere. The individual productivity story is real: developers finish tasks faster, write more tests, and clear backlogs more efficiently. The team-level story is more complicated, and most organizations aren't ready for it.

The Productivity Paradox at Code Review

The first place team-scale AI adoption breaks down is the most visible: code review.

When developers generate code faster, they also open pull requests faster. Research on large Copilot deployments found that teams with high AI adoption complete 21% more tasks and merge 98% more PRs — but PR review time increases 91%. The bottleneck isn't the coder anymore. It's everyone else in the pipeline.

This makes intuitive sense. Reviewers don't get AI assistance that scales proportionally to the volume they're asked to review. They're still reading line-by-line, checking for correctness, understanding intent, and evaluating architectural fit. The balance between code produced and review capacity has inverted, and most teams haven't changed their review culture to compensate.

The result is one of two failure modes. Either reviewers rubber-stamp PRs to keep up with volume — which is how low-quality AI-generated code enters the codebase unchallenged — or they slow everything down trying to review properly, which causes developer frustration and the perception that review is a bottleneck to route around.

What works instead: stop treating AI-generated PRs like handwritten PRs. Automated tools should handle the routine checks — syntax, security patterns, duplicate code detection — so human reviewers focus exclusively on architectural fit, business logic correctness, and intent alignment. The review checklist changes. "Does this do what it says?" becomes less important. "Does this belong in the codebase in this form?" becomes more important.
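One of the routine checks above, duplicate code detection, can be automated with a simple fingerprinting pass. The sketch below is a minimal illustration, not any real tool's algorithm; the window size and line normalization are invented choices.

```python
import hashlib

WINDOW = 4  # lines per fingerprinted block; illustrative, tune per codebase

def fingerprint_blocks(lines, window=WINDOW):
    """Yield (start_index, digest) for each sliding window of normalized lines."""
    normalized = [ln.strip() for ln in lines]
    for i in range(len(normalized) - window + 1):
        chunk = "\n".join(normalized[i:i + window])
        yield i, hashlib.sha256(chunk.encode()).hexdigest()

def find_duplicates(files):
    """Map digest -> locations (filename, line_index) that appear more than once."""
    seen = {}
    for name, text in files.items():
        for i, digest in fingerprint_blocks(text.splitlines()):
            seen.setdefault(digest, []).append((name, i))
    return {d: locs for d, locs in seen.items() if len(locs) > 1}
```

A check like this runs in CI before a human ever opens the diff, so the reviewer's comment budget goes to architecture and intent rather than to spotting copy-paste.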

How Knowledge Silos Form Faster

Before widespread AI tool adoption, knowledge silos accumulated gradually. A developer would work on a module for months, become the unofficial expert, and others would learn from them over time. This was slow, but it created distributed understanding.

AI tools accelerate code production without accelerating the transfer of understanding. A junior developer can now generate a working authentication module in an hour, have it reviewed by a senior who confirms it looks correct, and merge it — without either of them deeply understanding why it was structured that way. The code works. The knowledge silo formed instantly.

Research measuring code comprehension bears this out. Junior developers using AI assistance show significantly lower understanding of the code they ship compared to code they wrote unaided. The comprehension gap is measurable: juniors who wrote code themselves scored 17 points higher on understanding tests than those who generated it with AI. The code they shipped was equivalent in function. What they internalized was not.

This matters most when things break. Understanding code that you genuinely wrote is qualitatively different from understanding code that was generated and looked correct at review time. When a production incident hits at 2 AM, you need engineers who actually understand the system, not engineers who can describe what the system is supposed to do based on the code's surface appearance.

The protocol shift here is deliberate knowledge attribution. When a PR contains significant AI-generated sections, the PR description should explain the intent and tradeoffs chosen — not just what the code does. This forces the author to develop enough understanding to articulate the rationale. It also gives reviewers a signal when understanding is thin: if the author can't explain why the module is structured as it is, that's a flag.

Code Review Culture Has Already Broken Down

There's a subtler problem than volume: the norms around what code review is for have quietly shifted.

Historically, code review served multiple functions simultaneously. It caught bugs. It ensured style consistency. It spread knowledge across the team. And it was a mechanism for senior engineers to mentor junior engineers — the review comments were part of how juniors learned to write better code.

AI tools disrupt all four functions at once. Style enforcement gets delegated to formatters and linters. Bug detection gets offloaded to AI review tools. Knowledge spreading breaks down because the code is generated, not reasoned through. And mentorship — the function hardest to replace — evaporates because the junior engineer didn't struggle through the problem. They prompted their way past it.

The most experienced engineers on your team are increasingly spending code review time doing archaeology. They're looking at AI-generated code that is locally coherent but globally confused — code that uses correct syntax and passes tests but makes architectural choices nobody would have made deliberately. GitClear's 2024 analysis found an 8x increase in duplicated code blocks in AI-heavy codebases, and a drop in traditional refactoring from 25% to under 10% of developer activity. That's not because the codebase needed less refactoring. It's because nobody was building the understanding necessary to identify what needed to be refactored.

The "Nobody Understands This" Problem Compounds

A single AI-generated module that nobody deeply understands is a manageable liability. A codebase where 40-60% of the code was generated over 18 months without deep comprehension is a different kind of problem.

Research from enterprise deployments found that teams often can't even answer basic accountability questions: which services contain AI-generated code, who approved specific sections, and what the intended architectural constraints were. When that context is absent, the compounding effects become severe.

Technical debt accumulates differently with AI-generated code than with human-written code. Human-written technical debt tends to be local and legible: shortcuts that were taken deliberately, areas that need cleanup that the original author could identify. AI-generated technical debt is often structural: design choices that are individually defensible but collectively incoherent, abstraction layers that exist because the model defaulted to enterprise patterns, dependencies that were included because they appeared in training data.

Maintenance costs follow a specific pattern in organizations that don't address this early. First-year costs run 12% higher than expected once you account for review overhead and higher code churn. By year two, that gap widens significantly as the accumulated structural debt requires larger rewrites. The engineers doing those rewrites have to reconstruct the architectural intent from scratch, which takes longer than if the knowledge had been captured in the first place.

Protocols That Actually Work

None of this argues against AI coding tools. The teams getting compounding benefits from AI adoption share a set of practices that separate tool adoption from tool-driven chaos.

Encode team knowledge as executable artifacts. The most effective teams translate tacit architectural principles into documented, versioned standards that AI tools and reviewers can check against. This isn't a style guide — it's specific enough to answer questions like "should this module use the event bus or direct calls?" and "what's the acceptable abstraction depth for a new service?" When these standards exist as reviewable documents, AI-generated code that violates them is detectable.
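An "executable standard" can be as small as an import-rule check that answers the event-bus question mechanically. The sketch below is a hypothetical example: the module names (`orders`, `billing`) and the rule itself are invented for illustration, not taken from any real codebase or linter.

```python
import ast

# Invented example rule: the orders layer must not import billing directly;
# it is expected to go through the event bus instead.
FORBIDDEN = {
    ("orders", "billing"),  # (importing-module prefix, forbidden-import prefix)
}

def check_imports(module_name, source):
    """Return rule violations found in one module's source."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            targets = [node.module]
        else:
            continue
        for target in targets:
            for src_prefix, bad_prefix in FORBIDDEN:
                if module_name.startswith(src_prefix) and target.startswith(bad_prefix):
                    violations.append(f"{module_name}: direct import of {target}")
    return violations
```

Because the rule lives in version control next to the code, an AI-generated module that bypasses the event bus fails CI instead of depending on a reviewer remembering the convention.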

Change the mentorship model explicitly. In the pre-AI world, a senior developer mentored a junior by reviewing code and explaining the reasoning behind feedback. That model assumes the junior wrote the code through struggle and learning. AI-assisted juniors need a different interaction: the senior needs to probe their understanding of what was generated, not just whether the code looks correct. The question to ask is not "does this work?" but "why is this the right approach here, and what would you do differently if the constraint changed?"

Establish a generation threshold. Data from mature teams suggests that an optimal zone exists around 25-40% of code being AI-generated, where productivity gains outpace review and quality overhead. Above 40%, rework rates and review times increase faster than velocity gains. This isn't a universal constant, but having an explicit conversation about what percentage makes sense for your team's review capacity is more useful than letting adoption be purely organic.
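Tracking that percentage requires some attribution signal, for example a commit trailer such as `AI-Assisted: yes`. That trailer is a convention a team would have to adopt, not a git built-in; the sketch below just computes the share from whatever attribution data you collect.

```python
def ai_generated_share(commits):
    """commits: iterable of (lines_changed, ai_assisted) pairs.

    Returns the fraction of changed lines attributed to AI assistance.
    """
    total = ai = 0
    for lines, assisted in commits:
        total += lines
        if assisted:
            ai += lines
    return ai / total if total else 0.0

def within_target_zone(share, low=0.25, high=0.40):
    """The 25-40% zone described above; tune the bounds to your review capacity."""
    return low <= share <= high
```

Even a rough number makes the threshold conversation concrete: a dashboard showing the share drifting past 40% is a prompt to slow generation or expand review capacity, rather than a surprise in the rework data.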

Own the review bottleneck. If code review time is increasing, the answer is not to reduce review rigor — it's to route low-value review work to automated tools so humans concentrate on high-value review. Teams that keep the same review norms from the pre-AI era will either burn out their reviewers or let quality degrade. The pipeline needs to evolve at the same pace as the generation velocity.

Require intent documentation at the PR level. PRs with significant AI-generated sections should include a section answering three questions: what problem does this solve, what approach was chosen, and why were the alternatives ruled out? This is not bureaucratic overhead; it's the minimum context needed for knowledge transfer to happen. It also surfaces when a developer doesn't understand their own PR, which is important information to have before merge.
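The three-question requirement can be enforced by a lightweight CI gate on the PR description. The heading names below are an invented convention, not a feature of GitHub or any review tool; this is a sketch of the shape such a check might take.

```python
import re

# Hypothetical heading convention for AI-assisted PRs.
REQUIRED_SECTIONS = ("## Problem", "## Approach", "## Alternatives")

def missing_sections(pr_body, ai_assisted=True):
    """Return the required headings absent from an AI-assisted PR description."""
    if not ai_assisted:
        return []  # only gate PRs flagged as AI-assisted
    return [
        heading
        for heading in REQUIRED_SECTIONS
        if not re.search(re.escape(heading), pr_body, re.IGNORECASE)
    ]
```

The check is deliberately dumb: it can't tell whether the rationale is good, only whether the author wrote one. The quality judgment stays with the human reviewer, who now has intent to review instead of reconstructing it from the diff.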

The Junior Developer Question

The long-term consequences for junior developer development deserve explicit attention.

There are two plausible futures. In one, AI tools accelerate junior learning by giving them instant feedback loops, exposing them to high-quality patterns, and letting them tackle more complex problems sooner. In the other, AI tools atrophy foundational skills by removing the productive struggle that builds genuine understanding, and juniors ship more code while learning less from it.

The evidence right now points toward both happening simultaneously — the outcome depends entirely on how senior engineers structure the mentorship relationship with AI-using juniors. Teams where seniors explicitly probe understanding and require juniors to explain their AI-generated code in their own words are developing stronger juniors faster. Teams where review is about output quality rather than developer comprehension are accumulating a hidden liability: a group of developers who can generate working code but can't diagnose it when it breaks.

This matters for the engineering organization's long-term health. The experienced engineers of 2030 are the junior developers of today. If those developers are learning to generate code rather than understand it, the knowledge base of the organization erodes over a longer timescale than any quarterly productivity metric captures.

The Structural Constraint Is the Organization, Not the Tool

The clearest finding across every team-scale study of AI coding adoption is that organizational structure is the binding constraint, not tool quality. Teams with strong review culture, explicit knowledge-transfer practices, and deployment pipelines that scale with code velocity see compounding returns from AI tools. Teams without those structures see productivity gains that plateau or reverse as debt accumulates.

This is an engineering leadership problem, not an engineering problem. The individual developers are doing exactly what the tools enable them to do. The leadership question is whether the organizational infrastructure around code review, knowledge transfer, and quality standards is keeping pace with the velocity increase.

The teams that will look back on this period as a competitive advantage are not the ones that adopted AI tools first. They're the ones that changed their team structures, mentorship models, and review practices to match the new production reality. That's a slower, harder change — and it's the one that determines whether the productivity gains compound or evaporate.
