Last week, our CFO asked me a simple question: “What’s our AI code percentage?”
I realized I had no idea what that number meant, let alone how to calculate it. The more I dug in, the clearer it became that this is a measurement methodology problem, and one that obscures what’s actually happening in engineering organizations.
The Numbers Don’t Add Up—Because We’re Measuring Different Things
Look at the range of reported AI code percentages:
- 41% - Industry surveys of all developers
- 25% - Google’s public figure (Sundar Pichai)
- 46% - Self-reported by individual active developers
- 70-90% - Reported by Anthropic, company-wide
- 26.9% - Measured in production code (Nov 2025 - Feb 2026 study)
These can’t all be describing the same reality. They’re measuring different things:
Google’s 25% likely means: “AI-generated code that passes review and ships to production”
Industry 41% likely means: “Code where AI provided assistance at any stage”
Anthropic’s 70-90% likely means: “Initial code drafts that used AI tools” (for an AI company where eating your own dogfood is cultural)
Individual 46% likely means: “Self-reported perception of AI contribution”
The Attribution Problem
Here’s where it gets messy. If an AI suggests a function and I:
- Accept it unchanged → 100% AI code?
- Modify 20% → 80% AI code?
- Use it as inspiration and rewrite → 30% AI code? 0%?
- Reject it and write manually → 0% AI code?
There’s no standard. Some tools measure keystrokes accepted vs. rejected. Some measure “AI suggestions used.” Some rely on developer self-reporting (notoriously unreliable).
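To make the ambiguity concrete, here is a minimal Python sketch that scores the same edited suggestion under three attribution rules. The rule names and the function `ai_share_by_rule` are my own illustration, not how any particular tool measures this. It uses only the standard library's `difflib`.

```python
from difflib import SequenceMatcher


def ai_share_by_rule(suggestion: str, shipped: str) -> dict[str, float]:
    """Score how much of `shipped` counts as AI code under three illustrative rules."""
    # Compare non-empty lines with surrounding whitespace removed.
    sug_lines = [line.strip() for line in suggestion.splitlines() if line.strip()]
    ship_lines = [line.strip() for line in shipped.splitlines() if line.strip()]
    if not ship_lines:
        return {"binary": 0.0, "exact_lines": 0.0, "similarity": 0.0}

    sug_set = set(sug_lines)
    matching = sum(1 for line in ship_lines if line in sug_set)

    return {
        # Rule 1: any surviving suggested line makes the whole change "AI code".
        "binary": 1.0 if matching else 0.0,
        # Rule 2: fraction of shipped lines identical to a suggested line.
        "exact_lines": matching / len(ship_lines),
        # Rule 3: character-level similarity between suggestion and shipped code.
        "similarity": SequenceMatcher(
            None, "\n".join(sug_lines), "\n".join(ship_lines)
        ).ratio(),
    }


# Made-up example: a five-line suggestion where the developer changed one line.
suggestion = """def fee(amount):
    rate = 0.02
    fee = amount * rate
    fee = round(fee, 2)
    return fee
"""
shipped = suggestion.replace("rate = 0.02", "rate = 0.03")

for rule, share in ai_share_by_rule(suggestion, shipped).items():
    print(f"{rule}: {share:.0%}")
```

For this one small edit, the binary rule reports 100%, the exact-line rule reports 80%, and the similarity rule reports something just under 100%. Same code, three defensible answers.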
At my fintech startup, we tried three measurement approaches and got three wildly different numbers:
- Developer survey: “~40% of my code uses AI”
- GitHub Copilot acceptance metrics: 28% of suggestions accepted (an acceptance rate, not a share of the codebase)
- Manual code review sampling: ~15% of shipped code traced to AI origin
Which number should I report to the board?
Why This Matters for Business Outcomes
As a product leader, I need metrics that connect to outcomes, not just activity. The problem with “AI code percentage” is that it’s an input metric pretending to be an outcome metric.
What we actually care about:
- Are we delivering customer value faster?
- Is product quality improving?
- Are we building the right things more efficiently?
- Is our engineering investment producing ROI?
None of those questions are answered by “X% of code is AI-generated.”
A Framework Proposal
I’m proposing we categorize metrics into three layers:
Layer 1: Input Metrics (Activity)
- AI tool adoption rate
- AI code percentage (however measured)
- Developer self-reported productivity
Useful for: Understanding tool usage and team practices
Not useful for: Measuring business impact
Layer 2: Throughput Metrics (Velocity)
- Feature cycle time (idea → deployed)
- Code review time
- Deployment frequency
- Lead time for changes
Useful for: Understanding engineering process efficiency
Sometimes useful for: Correlating to business outcomes
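As a rough illustration of how two of the Layer 2 metrics could be computed, here is a small Python sketch. The `changes` records are made-up sample data and the function names are hypothetical. In practice the timestamps would come from your version control and deployment tooling, and you would need to decide what counts as the "start" of a change.

```python
from datetime import datetime
from statistics import median

# Made-up sample data: (first commit time, production deploy time) per change.
changes = [
    (datetime(2026, 1, 5, 9), datetime(2026, 1, 7, 9)),
    (datetime(2026, 1, 8, 9), datetime(2026, 1, 9, 9)),
    (datetime(2026, 1, 17, 9), datetime(2026, 1, 20, 9)),
]


def median_lead_time_hours(changes):
    """Median hours from first commit to production deploy."""
    return median(
        (deployed - committed).total_seconds() / 3600
        for committed, deployed in changes
    )


def deployments_per_week(changes):
    """Deploy count divided by the observed span in weeks (first to last deploy, inclusive)."""
    deploys = sorted(deployed for _, deployed in changes)
    span_days = (deploys[-1] - deploys[0]).days + 1
    return len(deploys) / (span_days / 7)


print(f"Median lead time: {median_lead_time_hours(changes):.1f} hours")
print(f"Deployment frequency: {deployments_per_week(changes):.2f} per week")
```

I use the median rather than the mean here because a single long-running change can otherwise dominate the number.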
Layer 3: Outcome Metrics (Business Value)
- Customer feature adoption rate
- Revenue impact per feature
- Customer satisfaction / NPS
- Bug rate / security incidents
- Technical quality (reliability, performance)
Useful for: Actually measuring whether AI is helping the business
What executives care about: This layer only
What We’re Tracking Instead
At our Series B fintech, we stopped tracking AI code percentage entirely. Instead, we track:
Throughput:
- Time from product brief → feature in production
- Number of customer-facing releases per month
- Engineering cycle time by feature type
Outcomes:
- Feature adoption (% of customers using new features within 30 days)
- Customer-reported quality issues (bugs, performance, UX)
- Revenue attributed to new features (when measurable)
- Engineering team retention and satisfaction
Interestingly, none of these metrics improved proportionally to our AI tool adoption. Some improved marginally (8-12%), some stayed flat, and some got worse (bug rate increased).
That tells me something very different from what “40% AI code adoption” would suggest.
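For anyone who wants to replicate the 30-day feature adoption metric from the list above, here is a minimal Python sketch of one way to compute it. The function name, the shape of the input (a mapping from customer ID to the date of first use), and the sample dates are all assumptions for illustration, not our actual pipeline.

```python
from datetime import date, timedelta


def adoption_rate(release_date, first_use_by_customer, total_customers, window_days=30):
    """Share of all customers whose first use falls within the window after release."""
    if total_customers == 0:
        return 0.0
    cutoff = release_date + timedelta(days=window_days)
    adopters = sum(
        1
        for first_use in first_use_by_customer.values()
        if release_date <= first_use <= cutoff  # cutoff day counts as inside the window
    )
    return adopters / total_customers


# Made-up sample data: three customers tried the feature, out of ten in total.
first_use = {
    "cust_a": date(2026, 1, 10),
    "cust_b": date(2026, 1, 31),
    "cust_c": date(2026, 2, 15),  # outside the 30-day window
}
rate = adoption_rate(date(2026, 1, 1), first_use, total_customers=10)
print(f"30-day adoption: {rate:.0%}")
```

The denominator matters as much as the numerator here: dividing by all customers gives a very different number from dividing by only those eligible for the feature, so whichever you pick should be stated alongside the metric.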
The Question for This Community
If you’re measuring AI code percentage, what methodology are you using?
And more importantly: What outcome metrics are you tracking to understand whether AI is actually helping your business?
I suspect we’re all measuring different things, calling them the same name, and drawing incomparable conclusions. We need a shared framework for what success actually looks like.
What are you tracking, and why did you choose those metrics?