AI Coding Tools: When Does Assistance Become Dependency?

cto_michelle · March 18, 2026, 12:33am

I’ve been watching my engineering teams use AI coding tools for the past year, and I’ve noticed something fascinating: the same tool produces completely different outcomes depending on how developers engage with it.

Some developers use AI and become more skilled over time. Others use AI and stagnate—or worse, develop bad habits that are hard to unlearn.

The difference isn’t intelligence or experience. It’s how they use AI.

The 65% vs. 40% Gap

The Anthropic study on AI coding skills found something critical that most headlines missed:

Developers who used AI for conceptual inquiry scored 65%+ on comprehension tests
Developers who used AI to delegate code generation scored below 40%

That’s a gap of more than 25 points. And it isn’t between “AI users” and “non-AI users,” but between two different modes of AI usage.

Let me show you what this looks like in practice.

The “Mentor Mode” Engineer

I have a junior engineer—let’s call them Sam—who joined 8 months ago. Sam uses Copilot constantly. But here’s how:

When implementing a feature:

Prompts AI: “Write a function to validate email addresses”
Reviews generated code
Immediately asks follow-up: “Why did you use this regex pattern instead of a library? What edge cases does this handle? What international email formats might break this?”
Tests edge cases based on AI’s explanations
Refines implementation based on understanding

When I review Sam’s PRs, the code is solid and Sam can explain every decision. They’re using AI as a knowledgeable pair programmer—not just accepting suggestions, but interrogating them.

Sam’s skill trajectory: Started as junior, promoted to mid-level in 8 months (faster than our typical 12-18 months). Can now debug complex issues independently.

The “Delegation Mode” Engineer

I have another engineer—let’s call them Alex—also hired 8 months ago, same experience level. Alex also uses Copilot constantly. But differently:

When implementing a feature:

Prompts AI: “Write a function to validate email addresses”
Reviews generated code superficially
If tests pass, immediately moves on
Never asks “why” or “what if”
Treats AI output as authoritative

When I review Alex’s PRs, the code looks fine, but Alex can’t explain the implementation details. When bugs surface, Alex struggles to diagnose them because they never built a mental model of how the code works.

Alex’s skill trajectory: Still functioning at junior level after 8 months. Requires senior oversight for any non-trivial debugging.

The Critical Question: “Why?” vs. “Does It Work?”

The difference between Sam and Alex isn’t about using AI—both use it extensively. It’s about the questions they ask.

Sam asks:

Why does this work?
What are the edge cases?
How would I debug this if it fails?
What happens under load?
Are there better approaches?

Alex asks:

Does it compile?
Do the tests pass?
Can I ship this?

Sam is using AI to build understanding. Alex is using AI to avoid understanding.

This Is a Leadership Problem, Not a Developer Problem

Here’s the uncomfortable realization I’ve had: we, as engineering leaders, are incentivizing the wrong AI usage patterns.

If our metrics are:

Features shipped per sprint
Lines of code written
Tickets closed

…then Alex’s approach is more efficient. Why spend time asking “why” when the code already works?

But if our metrics include:

Debugging speed
Code quality over time
Ability to explain implementation decisions
Production incident root cause analysis

…then Sam’s approach is clearly superior.

What are we measuring? And what behavior are we incentivizing?

How to Encourage “Mentor Mode” AI Usage

Based on what’s worked (and failed) with my teams, here are some approaches:

1. Mandatory “Why” Documentation

Every PR with AI-generated code must include:

The prompt used
Why this approach was chosen
What edge cases were considered
How you’d debug it if it fails

If the engineer can’t explain these, the code doesn’t ship—even if it works.

2. “Explain It to a Junior” Code Reviews

During code review, ask the author to explain their AI-generated code as if teaching it to someone else. Can’t explain it? Go back and learn it first.

3. Celebrate Conceptual Questions

In team meetings, highlight when someone asked a great conceptual question to AI and learned something. Make curiosity a visible value.

4. AI Usage Guidelines in Onboarding

Don’t just say “use AI.” Teach how to use AI:

Do:
Use AI to explore solutions and understand tradeoffs
Ask AI “why” and “what if” questions
Use AI to learn patterns and mental models

Don’t:
Accept AI suggestions without understanding them
Ship code you can’t explain
Use AI to avoid learning fundamentals

The Uncomfortable Question

Here’s what I keep wrestling with: Can you actually teach the “mentor mode” mindset, or is it something people either have or don’t?

Sam’s natural curiosity drives them to ask “why.” Alex’s practical focus drives them to ask “does it work?”

Are those fundamental personality differences? Or can we, through culture and incentives, teach engineers to be more curious about AI-generated code?

I genuinely don’t know. But I know we can’t just let engineers “use AI however they want” and hope they develop good habits. The study’s results suggest that doesn’t work.

For engineering leaders: How are you shaping AI usage patterns on your teams? Have you found ways to encourage conceptual inquiry over blind delegation? And for individual contributors: How do you use AI, and has it made you more or less skilled over time?

I’m curious whether this “mentor mode” vs. “delegation mode” distinction resonates with your experiences.

eng_director_luis · March 18, 2026, 12:34am

Michelle, your “Sam vs. Alex” comparison is spot-on, and it maps directly to what we’re seeing in our bug rotation program.

The engineers who ask “why” during bug fixes progress dramatically faster than those who just patch and move on.

We Can Teach Curiosity (With Structure)

You asked: “Can you teach the mentor mode mindset, or is it innate?”

My answer after 6 months of experimentation: You can teach it, but it requires explicit scaffolding.

Natural curiosity (Sam) is great. But most engineers (like Alex) won’t spontaneously develop it. They need prompts to ask the right questions.

Here’s what we implemented:

The “Three Questions” Rule for AI-Generated Code

Before any AI-generated code can be submitted for review, the engineer must document answers to three questions:

“Why does this work?” - Explain the logic/algorithm, not just “what” it does
“What could break this?” - Identify 3 edge cases or failure modes
“How would I debug this?” - If this fails in production, what’s your investigation plan?

Key point: We don’t allow “I don’t know” as an answer. If you don’t know, you go back to AI (or docs, or a senior) and learn before the code can move forward.
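Luis doesn’t say how the rule is checked before review. Here is a minimal sketch, assuming each answer sits on its own line in the PR description and starts with the question text (a hypothetical convention; `checkThreeQuestions` and the list of non-answers are illustrative, not from Luis’s team). It only catches missing or “I don’t know” answers; judging whether three edge cases are real still needs a human.

```javascript
// Sketch only: assumes each answer sits on its own line in the PR description,
// starting with the question text, e.g. "Why does this work?: <answer>".
const THREE_QUESTIONS = [
  "Why does this work?",
  "What could break this?",
  "How would I debug this?",
];

// Answers the rule treats as "go back and learn first" (illustrative list).
const NON_ANSWERS = new Set(["i don't know", "i dont know", "not sure", "idk"]);

// Returns a list of problems; an empty list means the PR can go to review.
function checkThreeQuestions(prDescription) {
  const lines = prDescription.split("\n").map((line) => line.trim());
  const problems = [];
  for (const question of THREE_QUESTIONS) {
    const line = lines.find((l) => l.startsWith(question));
    if (line === undefined) {
      problems.push(`missing: ${question}`);
      continue;
    }
    const answer = line
      .slice(question.length)
      .replace(/^[:\s-]+/, "") // drop the separator after the question
      .replace(/\u2019/g, "'") // normalize curly apostrophes
      .trim()
      .toLowerCase();
    if (answer === "" || NON_ANSWERS.has(answer)) {
      problems.push(`unanswered: ${question}`);
    }
  }
  return problems;
}
```

A description that answers Q1, says “I don’t know” to Q2, and skips Q3 comes back with two problems, so it goes back to the author.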

Real Example: Email Validation

Last month, we had exactly the scenario you described—a junior used AI to generate email validation code.

Without the Three Questions rule, they would have shipped:

// AI-generated regex
const isValidEmail = (email) => /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);

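Tests pass. Looks fine. Ships to production. Nobody ever checks what it actually accepts or rejects.

With the Three Questions rule:

Q1: Why does this work?
“It checks for: anything + @ + anything + . + anything”

Q2: What could break this?

“Um… international characters? Let me check…”
[Researches, discovers the pattern does accept 用户@例え.jp, since [^\s@] matches any character except whitespace and @, but it also accepts malformed input like a@b..c and rejects valid quoted addresses like "john doe"@example.com]
“Oh. ‘Anything’ covers more than I thought. We never decided which addresses we need to accept, or whether our mail pipeline can even deliver to Unicode addresses.”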

Q3: How would I debug this?

"If a user reports ‘invalid email,’ I’d check:
- The email format they’re using
- Whether it contains non-ASCII characters
- Whether our regex supports that format
- What the email RFCs (RFC 5322, plus RFC 6531 for internationalized addresses) actually allow"

Result: Engineer learns about internationalization, updates implementation, and now has a mental model for similar problems.
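One cheap way to answer Q2 honestly is to run the pattern against a handful of tricky inputs and write down what it actually returns. A sketch in JavaScript; the inputs are illustrative picks, `checkEdgeCases` is a hypothetical helper, and the second column records the regex’s current behavior, not what any particular product should accept.

```javascript
// The AI-generated pattern from the example above.
const isValidEmail = (email) => /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);

// Characterization table: input plus what the regex actually returns today.
// [^\s@] matches ANY character except whitespace and "@", including non-ASCII.
const edgeCases = [
  ["user@example.com", true],        // ordinary address
  ["用户@例え.jp", true],             // non-ASCII local part and domain are accepted
  ["a@b..c", true],                  // consecutive dots slip through
  ["user@localhost", false],         // dotless domain is rejected
  ['"john doe"@example.com', false], // quoted local part with a space is rejected
  ["a@b@c.com", false],              // a second "@" is rejected
  ["", false],                       // empty string is rejected
];

// Report every case whose actual result differs from the recorded one.
function checkEdgeCases(validate, cases) {
  return cases
    .filter(([input, expected]) => validate(input) !== expected)
    .map(([input, expected]) => ({ input, expected, actual: validate(input) }));
}

console.log(checkEdgeCases(isValidEmail, edgeCases)); // → []
```

Once the team decides which rows are wrong for them (say, rejecting a@b..c), they flip the expected value and the table turns into a failing test that drives the fix.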

The Follow-Up Question Habit

Michelle, your description of Sam asking follow-up questions is critical. But Alex won’t spontaneously do that.

So we made it structural: “One Follow-Up Per AI Prompt” rule.

For any AI-generated code, the engineer must:

Ask AI to generate code (fine)
Ask at least one follow-up conceptual question before accepting it

Examples of good follow-ups:

“What edge cases does this handle?”
“Why did you choose this algorithm over [alternative]?”
“What are the performance implications at scale?”
“How does this fail if [condition]?”

We track this in code reviews. If someone submits AI code without evidence of follow-up inquiry, we send it back.

Sounds bureaucratic, but it works. After 2-3 months, engineers internalize the habit and start asking follow-ups naturally.

The Incentive Problem You Identified Is Real

You’re absolutely right that if we only measure velocity, Alex’s approach wins. That’s why we changed our metrics.

Old metrics (velocity-focused):

Story points completed per sprint
Number of features shipped

New metrics (comprehension-focused):

Percentage of production bugs fixed by original author within 48 hours
Code review cycles per PR (lower is better: fewer cycles suggest the code was well understood before submission)
Time-to-independent debugging for new hires

These metrics incentivize understanding, not just output.
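Luis doesn’t describe how these are computed. A minimal sketch of the first two, assuming each bug record says who introduced it and who fixed it (a hypothetical schema; many trackers don’t capture “introduced by” without extra tagging) and each PR records its number of review rounds:

```javascript
// Sketch of two of the metrics over hypothetical records; real data would
// come from the issue tracker and code host, whose schemas will differ.
const MS_PER_HOUR = 60 * 60 * 1000;

// % of production bugs fixed by the engineer who introduced them, within 48h of the report.
// bug: { introducedBy: string, fixedBy: string | null, reportedAt: Date, fixedAt: Date | null }
function ownerFixRateWithin48h(bugs) {
  if (bugs.length === 0) return 0;
  const qualifying = bugs.filter(
    (b) =>
      b.fixedAt !== null &&
      b.fixedBy === b.introducedBy &&
      b.fixedAt.getTime() - b.reportedAt.getTime() <= 48 * MS_PER_HOUR
  );
  return (qualifying.length / bugs.length) * 100;
}

// Average number of review rounds per PR (lower is better).
// pr: { reviewRounds: number }
function meanReviewCycles(prs) {
  if (prs.length === 0) return 0;
  return prs.reduce((sum, pr) => sum + pr.reviewRounds, 0) / prs.length;
}
```

Unfixed bugs count against the rate rather than being dropped, so a team can’t improve the number by leaving hard bugs open.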

Addressing Team Resistance

I won’t sugarcoat it: some engineers hated this at first.

The feedback I got:

“This slows me down”
“Why do I need to explain code that already works?”
“This feels like busywork”

My response:
“You’re right, this is slower initially. But when your code breaks in production at 2am and you have to debug it, do you want to be the engineer who understands it, or the one who’s reading AI-generated code for the first time while customers are down?”

The seniors immediately got it. The juniors resisted until they experienced their first production incident. Then they got it too.

Can This Scale?

Michelle, you manage a much larger org than I do. One concern: does this approach scale to 80+ engineers?

The “Three Questions” rule doesn’t require senior oversight—it’s self-service. But the “explain it to a junior” code reviews do require reviewer capacity.

I’m curious whether you think this could work at your scale, or if it only works for smaller teams like mine (40 engineers).

vp_eng_keisha · March 18, 2026, 12:35am

Michelle, your Sam vs. Alex framing just made something click for me about our “explain your AI prompt” culture at my EdTech company.

We started requiring prompt documentation 3 months ago, and the results mirror what you and Luis are describing—but we’re also hitting interesting scaling challenges at 80+ engineers.

What’s Working: Prompt + Reasoning Documentation

Similar to Luis’s “Three Questions” rule, we require every PR with AI-generated code to include:

In a comment at the top of the file:

// AI Assistance Log
// Prompt: [exact prompt used]
// Why this approach: [reasoning]
// Edge cases considered: [at least 3]
// Failure modes: [what could go wrong]
// Validation method: [how I verified this works]

Example from a recent PR:

// AI Assistance Log
// Prompt: "Write a function to calculate student grade percentiles"
// Why this approach: Needed O(n log n) performance for 10K+ students
// Edge cases: Empty array, single student, tied scores, null grades
// Failure modes: Could fail with non-numeric grades, very large datasets (memory), concurrent access
// Validation: Tested with 50K student dataset, verified results manually for small samples
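The PR’s actual code isn’t shown. For anyone curious what “O(n log n)” plus those edge cases look like together, here is a minimal sketch; `gradePercentiles` is a hypothetical name, and the percentile-rank formula (ties share the midpoint rank) is one common definition among several.

```javascript
/**
 * Percentile rank per student: PR = (below + 0.5 * equal) / n * 100.
 * One common definition (tied scores share a rank); others exist.
 * Null or non-numeric grades get null instead of a rank.
 * O(n log n) overall because of the single sort.
 */
function gradePercentiles(grades) {
  const valid = grades.filter((g) => typeof g === "number" && Number.isFinite(g));
  const sorted = [...valid].sort((a, b) => a - b);
  const n = sorted.length;

  // Walk runs of equal scores once, recording each distinct score's rank.
  const rankOf = new Map();
  for (let i = 0; i < n; ) {
    let j = i;
    while (j < n && sorted[j] === sorted[i]) j++;
    const equal = j - i; // size of this group of tied scores
    rankOf.set(sorted[i], ((i + 0.5 * equal) / n) * 100);
    i = j;
  }

  // Empty input returns []; a single valid grade gets rank 50.
  return grades.map((g) => (rankOf.has(g) ? rankOf.get(g) : null));
}
```

For example, `gradePercentiles([90, 80, 80, null, 70])` gives the two tied 80s the same rank and returns null for the missing grade.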

The Surprising Benefit: Teaching Through Documentation

What I didn’t expect: the act of writing this documentation is where the learning happens.

We’ve had multiple engineers tell us: “I thought I understood the code, but when I tried to write the edge cases section, I realized I didn’t. So I went back to ask AI more questions.”

The documentation isn’t just showing understanding—it’s forcing understanding.

Luis mentioned engineers resisting initially. We had the same pushback. But here’s what changed minds:

We made the documentation searchable.

Now when engineers encounter similar problems, they search our codebase for AI Assistance Logs. They learn from others’ prompts, edge cases, and failure modes.

It’s become an internal knowledge base of “how to think about common problems,” not just “how to code common solutions.”

The Scaling Challenge: Code Review Bandwidth

Luis asked whether this scales to 80+ engineers. Short answer: barely.

The problem isn’t the documentation—that’s self-service. The problem is validating that the documentation is meaningful.

Some engineers write thoughtful edge case analysis. Others write:

Edge cases: “None”
Failure modes: “It shouldn’t fail”
Validation: “Tests pass”

That’s compliance theater, not learning.

To validate quality, we need experienced reviewers. But with 80 engineers and growing, our junior-to-senior ratio is becoming unsustainable (currently 12:1, heading toward 15:1).

Two Experiments We’re Running

Experiment 1: Peer Review Circles (Inspired by Maya’s Design Critique Model)

We’re piloting structured peer review circles with 6-8 mid-level engineers and no senior:

Format (1 hour weekly):

Each engineer brings one PR with AI-generated code
Group reviews AI Assistance Logs together
Using a rubric, group identifies weak documentation
Author must defend their edge case analysis or acknowledge gaps
Group votes: “Ready to ship” or “Needs more investigation”

Early results (6 weeks in):

Mid-levels are catching edge cases we (seniors) would have caught
Engineers report learning more from peer review than from senior-led reviews
False positives (group approves bad code): ~15% (acceptable)
Time savings for seniors: ~30%

But we’re only 6 weeks in. Not sure if this quality level sustains.

Experiment 2: AI-Assisted Code Review (Meta Recursion)

This feels weird to say, but: we’re using AI to review AI-generated code documentation.

We built an internal tool that:

Reads the AI Assistance Log
Analyzes the code
Suggests edge cases the engineer might have missed
Flags when “validation method” seems insufficient

Example:

Code: Email validation regex
Engineer’s edge cases: “Empty string, null, whitespace”
AI reviewer suggests: “What about internationalized domains? Plus-addressing? Multiple @ signs?”
Engineer: “Oh, I didn’t think about that”

This doesn’t replace human review, but it augments it by scaling the “did you think about X?” questioning.

Ethical/philosophical concern: Are we creating a feedback loop where AI trains engineers to think like AI? Is that good or bad?

Addressing the Incentive Misalignment

Michelle, you nailed the root issue: if we measure velocity, we incentivize delegation mode. If we measure comprehension, we incentivize mentor mode.

I proposed new metrics to our CTO last month:

“First-time fix rate”: Percentage of bugs fixed correctly on first attempt (indicates understanding vs. guessing)
“Code explanation score”: Reviewer rates 1-5 how well the author explained their AI-generated code
“Edge case coverage”: Measured inversely, as the percentage of production bugs that were not among the engineer’s documented edge cases (lower is better)
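Two of these are straightforward to compute once someone tags the data. A minimal sketch, assuming each bug records how many fix attempts it took and a failure category that can be matched against the author’s logged edge cases (both hypothetical fields; that tagging is exactly the new-systems work the CTO is pointing at):

```javascript
// Hypothetical record shapes for two of the proposed metrics.

// First-time fix rate: % of fixed bugs whose first fix attempt stuck
// (not reverted or reopened). bug: { fixAttempts: number }, 0 = not yet fixed.
function firstTimeFixRate(bugs) {
  const fixed = bugs.filter((b) => b.fixAttempts >= 1);
  if (fixed.length === 0) return 0;
  return (fixed.filter((b) => b.fixAttempts === 1).length / fixed.length) * 100;
}

// "Edge case coverage" as defined above: % of production bugs whose category
// was NOT among the documented edge cases (lower is better).
// bug: { category: string, documentedEdgeCases: string[] }
function undocumentedBugRate(bugs) {
  if (bugs.length === 0) return 0;
  const missed = bugs.filter((b) => !b.documentedEdgeCases.includes(b.category));
  return (missed.length / bugs.length) * 100;
}
```

The exact-string match on `category` is the weakest assumption here; in practice someone has to decide whether a bug “was” one of the logged edge cases.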

The CTO’s response: “These are hard to measure and don’t show up in our Jira dashboards.”

That’s the real blocker. Our tooling and reporting infrastructure is built for velocity metrics. Comprehension metrics require new systems.

The Question I’m Wrestling With

Luis mentioned that after a production incident, engineers “get it” and start asking better questions.

But should we have to wait for production incidents to teach comprehension?

That feels like a terrible feedback loop—ship bad code, learn from failure, get better. But customers pay the price for that learning.

Can we create “safe” production-incident simulations as part of onboarding? Give juniors a staging environment with realistic bugs and say: “Debug this. Customers are impacted. Go.”

The urgency and consequence (even simulated) might accelerate the learning without actual customer harm.

Thoughts?

cto_michelle · March 18, 2026, 12:36am

The “AI reviewer reviewing AI-generated code” meta-recursion Keisha described is exactly where we need to go, but we need to be careful about the feedback loops we’re creating.

The Automation Paradox of AI Code Review

Here’s the uncomfortable pattern I’m seeing across the industry:

AI generates code that looks correct but has subtle issues
Juniors can’t spot the issues (debugging skills gap)
Solution: Use AI to review AI-generated code
Result: Juniors learn what AI thinks is important, not what production reality requires

The risk: We’re training a generation of engineers to satisfy AI reviewers, not to satisfy production systems.

But maybe that’s okay? I’m genuinely not sure.

What “Good” AI Usage Looks Like at the CTO Level

Michelle, your Sam vs. Alex framing works for individual contributors. But at the leadership level, I think the pattern is different.

Here’s what I’ve observed across my network of CTOs:

“Strategic AI Users” (the CTO equivalent of Sam):

Use AI to explore architectural options: “What are the tradeoffs between event-driven and request-response for this use case?”
Validate technical decisions: “Critique this API design for scalability issues”
Accelerate learning: “Explain the CAP theorem trade-offs in this scenario”
But never deploy AI-generated architectural decisions without deep understanding

“Delegation AI Users” (the CTO equivalent of Alex):

Use AI to generate slide decks for board meetings
Ask AI to “write a tech strategy” and present it with minimal editing
Rely on AI-generated metrics without interrogating them
Trust AI architectural recommendations without validating against their specific context

The gap at the leadership level is more dangerous because the blast radius is larger. A junior shipping buggy code affects one feature. A CTO making AI-driven architectural decisions without deep understanding affects the entire technical foundation.

Luis’s “Three Questions” Rule Scales (With Tooling)

Luis asked if the structured approach scales to 80+ engineers. My answer: yes, but only if you build tooling to enforce it.

At my company (120 engineers), we can’t manually review every “Three Questions” response. So we built automation:

Pre-merge checks:

Scan PR for AI Assistance Logs
Parse each log for required fields (prompt, edge cases, failure modes, validation)
Block merge if any field is “None” or “N/A” or < 10 characters
Use GPT-4 to evaluate if edge cases are realistic or generic (“edge case: empty string” is too vague)
Flag PRs with low-quality logs for human review
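The mechanical part of these checks (required fields, placeholders, minimum length) might look roughly like this for a single file, assuming the comment format from Keisha’s template. `checkAssistanceLog` is a hypothetical name, and the GPT-4 realism check is not sketched.

```javascript
// Required fields, matched loosely so "Edge cases:" and "Edge cases considered:"
// (or "Validation:" and "Validation method:") both count. Assumed log format.
const REQUIRED_FIELDS = {
  prompt: /^\/\/\s*Prompt:\s*(.*)$/i,
  edgeCases: /^\/\/\s*Edge cases[^:]*:\s*(.*)$/i,
  failureModes: /^\/\/\s*Failure modes:\s*(.*)$/i,
  validation: /^\/\/\s*Validation[^:]*:\s*(.*)$/i,
};

const PLACEHOLDERS = new Set(["none", "n/a", "na", "-"]);
const MIN_LENGTH = 10;

// Returns reasons to block the merge; an empty list means the log passes these
// mechanical checks (it can still be meaningless, as noted above).
function checkAssistanceLog(fileText) {
  const lines = fileText.split("\n").map((l) => l.trim());
  if (!lines.some((l) => /^\/\/\s*AI Assistance Log/i.test(l))) {
    return ["no AI Assistance Log found"];
  }
  const reasons = [];
  for (const [field, pattern] of Object.entries(REQUIRED_FIELDS)) {
    const match = lines.map((l) => l.match(pattern)).find((m) => m !== null);
    if (!match) {
      reasons.push(`${field}: missing`);
      continue;
    }
    // Strip surrounding straight or curly quotes so "None" and None match alike.
    const value = match[1].trim().replace(/^["“]|["”]$/g, "").trim();
    if (PLACEHOLDERS.has(value.toLowerCase())) {
      reasons.push(`${field}: placeholder "${value}"`);
    } else if (value.length < MIN_LENGTH) {
      reasons.push(`${field}: shorter than ${MIN_LENGTH} characters`);
    }
  }
  return reasons;
}
```

Run on Keisha’s compliance-theater example, this blocks only the “None” edge cases. “It shouldn’t fail” and “Tests pass” (exactly 10 characters) both get through, which is the gaming problem described in the next paragraph.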

Is this perfect? No. Engineers can game it by writing longer but still meaningless responses.

Does it work well enough? Yes. Compliance went from 30% to 85%. And even generic edge case thinking is better than no edge case thinking.

The Measurement Problem Keisha Identified Is Everything

Keisha’s CTO said: “These metrics don’t show up in our Jira dashboards.”

This is the existential problem. We’re managing engineering quality with dashboards built for a pre-AI era.

Our current tools measure:

Velocity (story points, commits, PRs merged)
Activity (lines of code, tickets closed)

We need new tools that measure:

Comprehension (can engineers explain their code?)
Resilience (how often does code break in production?)
Learning (are engineers getting better at debugging over time?)

I’m working with a few vendors to build this. But it’s early and messy. No clean solutions yet.

Maya’s Design Critique Model + Engineering

I keep coming back to what Maya said in earlier threads about design critique culture. We need an engineering critique culture that’s separate from code review.

Code review asks: Is this code correct?
Engineering critique asks: Do you understand this code deeply enough to own it in production?

What if we had weekly “code critique” sessions (not code review—critique) where:

Engineers present code they wrote (with or without AI)
Team asks probing questions (not about bugs, but about understanding)
“Can you explain why this approach?” “What happens if X?” “How would you debug Y?”
Goal: Foster “mentor mode” thinking through peer accountability

This might be the culture shift we need. But it requires time, and time is the resource engineering leaders don’t have.

The Simulation Idea Is Brilliant

Keisha’s suggestion about simulated production incidents during onboarding is exactly right.

We actually piloted this:

“Bug Hunt Week” for all new engineers (week 3 of onboarding):

Give them a staging environment with 10 intentionally planted bugs (varying severity)
Each bug was from a real production incident (sanitized)
Simulated Slack channel with “customer complaints” and “support escalations”
They have 3 days to find and fix as many as possible
Debrief session where we reveal the root causes and discuss debugging approaches

Results:

Engineers loved it (surprisingly)—felt like a realistic challenge, not busywork
Average debugging competency at 90 days post-hire improved significantly
Engineers who went through Bug Hunt Week resolved production incidents 40% faster than those who didn’t

The catch: Building realistic bug simulations requires senior time upfront. We spent ~80 senior hours building the 10-bug environment. But it’s reusable, so the marginal cost drops to near-zero over time.

The Strategic Question for All of Us

Here’s what keeps me up at night as a CTO:

If AI continues to improve at code generation, and our engineers become dependent on it, what happens when AI makes a category of mistake AI reviewers can’t catch?

We’ll have a generation of engineers who can’t debug it because they never built the foundational skills. And our entire industry will have a single point of failure.

That’s not hypothetical. We’ve already seen this with security vulnerabilities in AI-generated code. The CodeRabbit study found AI code creates 1.7x more issues than human code.

If our engineers can’t audit AI code critically, we’re building systemic technical debt that will explode when AI fails in new ways.

That’s why “mentor mode” vs. “delegation mode” isn’t just about individual skill development. It’s about industry-wide resilience.

Thoughts?

maya_builds · March 18, 2026, 12:37am

Design perspective here, and I think Michelle, Luis, Keisha, and CTO Michelle are all circling around the same core insight but from different angles.

Let me connect the dots.

It’s Not About AI. It’s About Intentionality.

In design, we have a concept called “tool mastery vs. tool dependency.”

Tool mastery: Using Figma AI to rapidly prototype 10 layout variations, then critically evaluating which one actually serves user needs
Tool dependency: Using Figma AI to generate a layout, seeing it looks nice, and shipping it without understanding why it works (or doesn’t)

The tool is the same. The outcome is radically different.

The difference isn’t the tool. It’s whether the practitioner is using the tool to think or using the tool instead of thinking.

This maps exactly to Michelle’s “mentor mode” (using AI to think) vs. “delegation mode” (using AI instead of thinking).

Why This Matters for Non-Engineers Too

I’m not an engineer. I’m a designer who codes enough to be dangerous. But the AI comprehension gap affects product, design, and cross-functional collaboration just as much as it affects engineering.

Example from last month:

Our product manager (shoutout to David) asked engineering: “Can we add real-time notifications?”

Engineer in delegation mode: “Yeah, AI generated a WebSocket implementation. It works.”
PM: “Great! How does it handle reconnections if the user’s connection drops?”
Engineer: “Uh… I’m not sure. Let me check the code AI wrote.”

The engineer shipped code they couldn’t explain. Product couldn’t assess feasibility trade-offs. Design didn’t know what edge cases to design for. The feature launched and immediately broke for users on flaky mobile networks.

If the engineer had been in mentor mode:

Engineer: “AI suggested WebSockets, but I asked about reconnection strategies. Turns out we need exponential backoff and state reconciliation. That adds 2 days of work, but without it, mobile users will have a terrible experience.”
PM: “Okay, let’s scope that correctly or push back the launch.”
Design: “I’ll design the reconnection UI state.”

The cross-functional collaboration requires engineers who understand their code deeply enough to explain trade-offs to non-technical stakeholders.
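For readers who haven’t met “exponential backoff”: here is a minimal sketch of that half of the work, assuming a `connect` function that returns a Promise resolving to an open socket (for example, a wrapper around `new WebSocket(url)`). The names and default timings are illustrative, and state reconciliation (refetching whatever was missed while offline) is left to the caller.

```javascript
// Delay before retry number `attempt` (0-based): exponential backoff with
// "full jitter", a random value in [0, min(maxMs, baseMs * 2^attempt)).
// Jitter keeps many mobile clients from reconnecting in lockstep.
function backoffDelay(attempt, { baseMs = 500, maxMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Hypothetical reconnect loop: retries `connect` until it resolves or
// `maxAttempts` is exhausted, then rethrows the last error.
async function connectWithRetry(connect, { maxAttempts = 8, ...backoffOptions } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err;
      const delay = backoffDelay(attempt, backoffOptions);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Capping the delay (`maxMs`) matters on mobile: without it, a phone that was offline for a few minutes could wait far longer than the outage itself before trying again.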

The “Explain It to a Designer” Test

Luis mentioned “explain it to a junior” as a code review practice. I want to propose a variation: “Explain it to a non-engineer.”

If an engineer can’t explain their AI-generated implementation to me (a designer) or to David (a PM) in terms we understand, they probably don’t understand it deeply enough to own it in production.

This isn’t about dumbing down technical concepts. It’s about forcing engineers to understand the purpose and trade-offs, not just the syntax.

What this looks like in practice:

In design critique, we have a rule: “Explain your design decisions in user-impact terms, not tool terms.”

Bad: “I used a grid system with 12 columns”
Good: “I used a grid to make the layout scannable so users can find information quickly”

Engineering equivalent:

Bad: “I used WebSockets for real-time updates”
Good: “I used WebSockets because they provide instant updates with low latency, but they require active connections, so users on flaky networks might see reconnection delays”

The second answer shows understanding of trade-offs, not just tool usage.

Keisha’s AI Reviewer Reviewing AI Code: A Cautionary Tale

The concern about “training engineers to satisfy AI reviewers” (Michelle’s worry, and the flip side of Keisha’s question about AI training engineers to think like AI) is exactly what happened in design when we introduced AI critique tools.

We built an internal tool that used GPT-4 to review designs for accessibility issues. It worked great initially—caught color contrast problems, missing alt text, etc.

But after 6 months, we noticed designers were optimizing for the AI feedback, not for user needs.

Example:

AI flagged “button text too small”
Designer increased font size to pass AI check
But now the button didn’t fit in the mobile layout, so they shortened the text to “OK”
AI passed it, but users were confused because “OK” was ambiguous in context

The AI taught designers to satisfy the tool, not to think about users.

The fix: We changed the AI reviewer from “does this pass checks?” to “here are questions to consider.”

Instead of:

“Button text contrast ratio: FAIL”

We output:

“Question: Will users with low vision be able to read this button text in bright sunlight?”

This forced designers to think about the user impact, not just fix the automated complaint.

Engineering equivalent:

Luis’s “Three Questions” rule is brilliant because it forces questions, not just compliance.

Keisha’s AI reviewer should probably do the same—not “Edge case coverage: LOW” but “Question: What happens to this code if the API returns null? Have you tested that?”
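That suggestion can be sketched as a thin layer over whatever checks the reviewer already runs: instead of printing a verdict, map each failed check to a question for the author. The check IDs and wording below are hypothetical.

```javascript
// Hypothetical check IDs mapped to open questions instead of verdicts.
const QUESTION_FOR = new Map([
  ["null-input", (fn) => `What happens to ${fn} if the API returns null? Have you tested that?`],
  ["few-edge-cases", (fn) => `Which inputs could make ${fn} misbehave, and which of those have you actually tried?`],
  ["tests-only-validation", (fn) => `Besides "tests pass", what convinced you that ${fn} is correct?`],
]);

// Turn failed automated checks into questions for the author.
// Unknown check IDs still become a question, never a bare FAIL.
function toReviewQuestions(failedCheckIds, fn) {
  return failedCheckIds.map((id) =>
    QUESTION_FOR.has(id)
      ? QUESTION_FOR.get(id)(fn)
      : `The "${id}" check flagged ${fn}. What do you think it noticed?`
  );
}
```

Using a `Map` rather than a plain object keeps an ID like "toString" from accidentally matching an inherited property.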

The Cultural Shift: Celebrating “I Don’t Know”

Here’s the hardest part of shifting from delegation mode to mentor mode: you have to make it safe to say “I don’t know.”

Right now, if an engineer says “I used AI and I’m not sure how it works,” that’s perceived as a weakness. So engineers hide it. They ship code they don’t understand and hope it works.

But if the culture celebrates “I don’t know yet, but I’m going to find out” as intellectual honesty, engineers will ask more questions before shipping.

In design, we explicitly celebrate “I tried three approaches, and I’m still not sure which is best—here’s why I’m uncertain.” That uncertainty drives deeper critique and better outcomes.

Can engineering do the same? Can we make “I don’t understand this AI-generated code yet, so I’m going back to learn more” a positive signal, not a negative one?

Michelle’s Strategic Question: Industry-Wide Resilience

Michelle (CTO) asked about industry-wide resilience if AI creates a category of mistakes AI can’t catch.

This already happened in design.

In 2023, a bunch of AI-generated design systems hit the market. They looked great in demos. But they broke in production because they didn’t account for real-world constraints (performance, internationalization, accessibility for edge cases).

Designers who relied entirely on AI tools couldn’t debug why their designs broke in production because they’d never learned the fundamentals of how browsers render layouts under stress.

The ones who survived weren’t the ones who avoided AI. They were the ones who used AI as a learning tool while maintaining deep foundational knowledge.

That’s the path for engineering too. Not “avoid AI” but “use AI in mentor mode while preserving debugging fundamentals.”

Long comment, but this thread is maybe the most important technical leadership discussion I’ve seen this year.