Satya Nadella recently revealed that 20-30% of code in Microsoft’s repositories is now “written by software.” Sundar Pichai shared similar numbers for Google: roughly 25-30% of new code is AI-generated. These aren’t small experiments. This is production code at two of the world’s largest tech companies.
As a CTO, this raises questions I don’t have good answers to yet.
The Headline Numbers
Microsoft (April 2025):
- 20-30% of code in repositories is AI-generated
- Nadella says “some projects may have all of its code written by AI”
Google (2025):
- 25%+ of code is AI-assisted
- Pichai emphasizes velocity gains (+10% speed), not replacement
Industry-wide:
- 41% of all code is now AI-generated
- 76% of developers are using (62%) or planning to use (14%) AI coding tools
- 75% still manually review every AI-generated snippet before merging
The Ownership Questions I’m Wrestling With
1. Who owns AI-generated code?
The legal framework is murky. Generally:
- If a human provides “sufficient creative input” (iterative prompting, editing, refining), copyright may attach to the human author
- Through employment agreements, that ownership typically transfers to the employer
- But what if the “creative input” is just “write a function that does X”?
Microsoft offers IP indemnity for Copilot outputs (if guardrails are enabled). That’s a meaningful commitment. But not all AI tools offer this protection.
2. What about license contamination?
Research suggests ~35% of AI-generated code samples contain licensing irregularities. This is a real liability risk. If an AI tool was trained on GPL code and reproduces patterns from that code, are you now obligated to open-source your project?
Microsoft and Google use sophisticated license detection tools. Most companies don’t have that infrastructure.
3. The “review” problem
75% of developers say they manually review AI-generated code. But do they? Really?
I’ve watched developers accept AI suggestions with barely a glance. The review is increasingly cursory as AI output quality improves. This creates a gap between stated practice and actual practice.
4. Attribution and credit
If 30% of your codebase is AI-generated, how do you:
- Evaluate developer performance?
- Attribute bugs to authors?
- Assess code quality ownership?
- Handle code reviews?
What I’m Doing at My Company
- Explicit AI code policies - We require annotation when AI generates substantial portions of code
- License scanning - We run automated tools to detect potential license contamination
- Enterprise tiers only - We use tools with IP indemnification clauses
- Review standards - AI-generated code gets the same review standards as human code
Questions for the Community
- How are you handling attribution for AI-generated code?
- Has anyone actually experienced a license contamination issue?
- Do your code review practices change for AI-generated PRs?
This feels like we’re building on uncertain legal foundations. The technology has moved faster than the governance.
Michelle, this is exactly the kind of discussion we need to be having. The IP and license issues are real, but I want to add the security dimension to this conversation.
The audit trail problem:
When 30% of your code is AI-generated, your security audit process breaks down. Traditional security reviews assume:
- Developers understand the code they write
- There’s institutional knowledge about why code exists
- You can interview the author about edge cases
With AI-generated code:
- The “author” is an LLM that can’t be interviewed
- The prompt history may not be preserved
- The developer who accepted the code may not fully understand it
Vulnerability introduction at scale:
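Research shows 45% of AI-generated code contains vulnerabilities. If 30% of your codebase is AI-generated, you’re looking at roughly 13.5% of your entire codebase (0.30 × 0.45) being AI-generated code with potential security issues, and that is before counting any flaws in the human-written 70%.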
That’s not a rounding error. That’s a significant attack surface.
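To make that arithmetic explicit, here is a minimal sketch. It assumes the 45% rate applies uniformly to whatever AI-generated code is in your repository, which is a simplification rather than something the research guarantees. The function name `flagged_share` is hypothetical and exists only for illustration.

```python
def flagged_share(ai_share: float, vuln_rate: float) -> float:
    """Fraction of the whole codebase that is both AI-generated and
    potentially vulnerable, assuming the rate applies uniformly."""
    if not (0.0 <= ai_share <= 1.0 and 0.0 <= vuln_rate <= 1.0):
        raise ValueError("both inputs must be fractions between 0 and 1")
    return ai_share * vuln_rate


# Figures quoted in this thread: 30% AI-generated, 45% with vulnerabilities.
share = flagged_share(ai_share=0.30, vuln_rate=0.45)
print(f"{share:.1%} of the codebase")  # prints "13.5% of the codebase"
```

Plugging in your own measured AI share and your own scanner's hit rate will give a more honest number than the industry averages.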
What I’m seeing in security assessments:
- Pattern reproduction - AI tools reproduce common security anti-patterns from their training data (SQL string concatenation, eval() usage, etc.)
- Context blindness - AI doesn’t understand your threat model. It might generate perfectly functional code that’s completely inappropriate for your security requirements.
- Review fatigue - Security reviewers are overwhelmed. When AI generates more code faster, security can’t keep up.
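As an illustration of the first pattern, here is a small sketch of SQL string concatenation next to the parameterized alternative, using Python's standard `sqlite3` module. The `users` table and both function names are made up for the example.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Anti-pattern: user input is concatenated straight into the SQL text.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized query: the value is bound by the driver, never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


# Hypothetical in-memory table, just to show the difference in behavior.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "x' OR '1'='1"
unsafe_rows = find_user_unsafe(conn, payload)  # injection matches every row
safe_rows = find_user_safe(conn, payload)      # payload treated as a literal name
print(len(unsafe_rows), len(safe_rows))
```

Both versions pass a happy-path test with a normal name, which is why this kind of suggestion is easy to accept without noticing the problem.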
My recommendations:
- Static analysis is mandatory - Run SAST tools on all AI-generated code before merge
- Security-focused prompting - Train developers to include security requirements in prompts
- Separate review tracks - AI-generated code should get security review, not just functional review
- Preserve context - Save prompts and AI responses for later audit
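For the "preserve context" item, one possible shape is an append-only JSON Lines log. This is only a sketch: the field names, the file name `ai_audit.jsonl`, and the helper functions are all assumptions for illustration, not a feature of any particular AI tool.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def make_audit_record(prompt: str, response: str, author: str, tool: str) -> dict:
    """Build one audit entry; the hash lets you match it to merged code later."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "author": author,
        "tool": tool,
        "prompt": prompt,
        "response": response,
        "response_sha256": hashlib.sha256(response.encode("utf-8")).hexdigest(),
    }


def append_audit_record(log_path: Path, record: dict) -> None:
    # Append-only: one JSON object per line, nothing is rewritten.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def load_audit_records(log_path: Path) -> list:
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Hypothetical usage with a temporary directory standing in for real storage.
log_path = Path(tempfile.mkdtemp()) / "ai_audit.jsonl"
generated = "def add(a, b):\n    return a + b\n"
append_audit_record(
    log_path,
    make_audit_record("write an add function", generated, "dev@example.com", "example-assistant"),
)
records = load_audit_records(log_path)
print(records[0]["response_sha256"][:12])
```

Prompts can contain secrets or customer data, so wherever a log like this lives, it probably needs the same access controls as the code itself.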
The legal ownership question is important, but I’m more worried about the security ownership question. If a vulnerability in AI-generated code leads to a breach, who’s accountable?
I want to push back gently on the framing here. As someone who uses AI tools daily, I think we’re overcomplicating the ownership question.
The practical reality:
When I use AI to generate code, I’m:
- Defining the requirements
- Choosing which output to use
- Modifying it to fit my context
- Testing and validating it works
- Taking responsibility for the result
This is fundamentally the same as using Stack Overflow, except faster. We didn’t have existential crises about “who owns code copied from Stack Overflow.”
The “30% AI-generated” stat is misleading:
That 30% probably includes:
- Boilerplate and scaffolding
- Test case generation
- Documentation strings
- Simple utility functions
The novel, creative, business-logic code is still largely human-written. The AI is handling the boring stuff we would have copy-pasted anyway.
On the review question:
@cto_michelle asked if developers actually review AI code. In my experience:
- For autocomplete suggestions (single lines): No, not really
- For larger code blocks: Yes, pretty thoroughly
- For architecture suggestions: Definitely
The review depth matches the risk level. That’s… rational?
What I think matters more:
- Code quality, not code origin - Does it work? Is it maintainable? Is it secure? Those questions matter regardless of who/what wrote it.
- Developer responsibility - If I merge it, I own it. Full stop. This has always been true for any code that enters the codebase.
- Tooling over policy - Better static analysis, better CI/CD gates, better test coverage. These protect you better than tracking AI attribution.
My honest answer to your questions:
- Attribution: I don’t track it. The PR author is responsible for everything in their PR.
- License contamination: Haven’t experienced it. Running license scanners would catch obvious issues.
- Review practices: Same standards. Code is code.
Maybe I’m being naive, but I think we’re creating process overhead for a problem that existing engineering practices already solve.
Both Michelle and Alex make valid points. Let me offer the middle-management perspective — I’m caught between policy and practicality daily.
The attribution problem is real, but not for the reasons you think:
I don’t care about AI attribution for legal reasons. I care about it for performance management reasons.
When I look at commit history to understand who contributed what:
- Did this engineer solve the problem or did they prompt well?
- Is this person growing technically or just becoming a better AI whisperer?
- When this code breaks, who actually understands it well enough to fix it?
These questions matter for career development, team composition, and incident response.
The “code is code” argument has limits:
@alex_dev makes a fair point that the developer takes responsibility. But there’s a spectrum:
High confidence code:
- Engineer wrote it from scratch
- Engineer modified AI output significantly
- Engineer can explain every line
Low confidence code:
- AI generated with minimal review
- Engineer accepted because “it works”
- Engineer might not understand edge cases
The codebase doesn’t distinguish between these. That’s a problem when you need to modify it later.
What I’m experimenting with:
- Commit message conventions - We’re testing a convention where commits containing AI-generated code include “[AI-assisted]” in the commit message. Not enforcement, just visibility.
- Design review separation - We now require human-written design docs before AI implementation. The human does the thinking; AI does the typing.
- Ownership rotation - Engineers rotate through code areas they didn’t write (AI or human). This forces knowledge transfer.
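If the “[AI-assisted]” tag catches on, measuring it is cheap. The sketch below tallies tagged commits from a list of commit messages. The helper names are hypothetical, and the sample messages are invented. In practice the messages could come from something like `git log --format=%s`.

```python
AI_TAG = "[AI-assisted]"


def is_ai_assisted(message: str) -> bool:
    # Case-insensitive so "[ai-assisted]" still counts under a voluntary convention.
    return AI_TAG.lower() in message.lower()


def ai_assisted_share(messages: list) -> float:
    """Fraction of commits whose message carries the tag (0.0 for no commits)."""
    if not messages:
        return 0.0
    tagged = sum(1 for message in messages if is_ai_assisted(message))
    return tagged / len(messages)


# Invented sample; replace with real commit subjects from your repository.
sample_messages = [
    "[AI-assisted] Add pagination to the orders endpoint",
    "Fix off-by-one in retry loop",
    "Refactor billing module [ai-assisted]",
    "Update README",
]
tagged_share = ai_assisted_share(sample_messages)
print(f"{tagged_share:.0%} of commits tagged")  # prints "50% of commits tagged"
```

Because the convention is not enforced, a number like this is a lower bound on AI involvement, not a measurement of it.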
My honest assessment:
I expect the 30% number to reach 50% in a year and 70% in three years. We need to figure this out now, while the percentage is still manageable. Waiting until the majority of code is AI-generated to establish norms will be too late.
@cto_michelle - Your annotation policy is exactly right. The cost of tracking is low; the cost of not tracking could be significant later.