I spent the last 3 months researching how companies actually measure AI impact on their engineering teams.
Talked to engineering leaders at 18 companies (3 FAANG, 8 high-growth startups, 7 mid-size tech companies). Analyzed their approaches, their wins, their failures.
Here’s what actually works.
The Three-Layer Framework
Every company that successfully measures AI uses some version of this:
Layer 1 - Adoption Metrics
Are people actually using the tools?
Layer 2 - Impact Metrics
Is usage translating to speed, quality, or satisfaction improvements?
Layer 3 - Outcome Metrics
Are engineering improvements driving business value?
Most companies only measure Layer 1. The ones seeing ROI measure all three.
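One way to make the layers concrete is to write them down as an explicit metric catalog. Here's a minimal sketch in Python; the metric names are illustrative, not a standard, and every team's list will differ:

```python
# Illustrative metric catalog for the three layers (names are examples, not a standard).
MEASUREMENT_LAYERS = {
    "adoption": [      # Layer 1: are people actually using the tools?
        "daily_active_users",
        "feature_adoption_rate",
        "licenses_assigned_vs_used",
    ],
    "impact": [        # Layer 2: is usage changing speed, quality, or satisfaction?
        "cycle_time_days",
        "defect_escape_rate",
        "developer_satisfaction",
    ],
    "outcome": [       # Layer 3: are engineering improvements driving business value?
        "roi_percent",
        "attrition_rate",
        "cost_per_active_user",
    ],
}
```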
Case Studies (Anonymized)
Company A: E-commerce unicorn, 200 engineers
What they tracked:
- AI-assisted PR ratio: 65% of PRs touch AI-generated code
- Developer NPS: 8.2/10 for AI tools
- Cycle time: Flat (no improvement)
The surprise: High usage, good satisfaction, but zero velocity impact.
Root cause analysis found:
- Engineers writing code 40% faster
- But spending 60% more time in code review (scrutinizing AI output)
- Net effect: Neutral
Decision: Kept tools (retention/morale value) but adjusted expectations.
Company B: Fintech startup, 80 engineers
What they did differently:
- A/B tested teams: 50% with AI, 50% control group
- Tracked both groups for 6 months
Results:
- AI group: 12% faster cycle time
- AI group: 8% higher defect escape rate
- AI group: 23% higher reported satisfaction
Decision: Rolled out to all teams but implemented stricter code review process for AI-generated code.
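If you want to run a Company B-style comparison yourself, the core analysis is small. Here's a sketch, assuming you already have per-PR cycle times for each group; the Welch t-test is one reasonable choice, not necessarily what Company B used, and the numbers are placeholders:

```python
import numpy as np
from scipy import stats

# Placeholder per-PR cycle times (hours) for each group; swap in your real exports.
ai_group = np.array([30.0, 26.5, 41.0, 22.0, 35.5, 28.0, 24.5])
control_group = np.array([36.0, 31.0, 44.5, 29.0, 40.0, 33.5, 30.5])

# Welch's t-test: is the difference in mean cycle time likely to be real?
t_stat, p_value = stats.ttest_ind(ai_group, control_group, equal_var=False)

speedup = 1 - ai_group.mean() / control_group.mean()
print(f"AI group cycle time is {speedup:.1%} lower (p = {p_value:.3f})")
```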
Company C: SaaS company, 120 engineers
Focused purely on retention:
- Before AI tools: 18% annual attrition
- After AI tools: 11% annual attrition
- Avoided ~8 engineer replacements
ROI calculation:
- Tool cost: $360K/year
- Replacement cost avoided: ~$1.8M (8 × $225K)
- ROI: 400%
Decision: Justified entirely on retention, not productivity.
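The retention math fits in a few lines. A sketch using the figures from this case; the replacement-cost assumption dominates the result, so be conservative with it:

```python
# Retention-based ROI, using the Company C figures above.
tool_cost_per_year = 360_000     # total annual spend on AI tools
avoided_replacements = 8         # engineers retained who would otherwise have left
replacement_cost = 225_000       # assumed fully loaded cost to replace one engineer

benefit = avoided_replacements * replacement_cost            # ~$1.8M avoided
roi = (benefit - tool_cost_per_year) / tool_cost_per_year    # (1.8M - 360K) / 360K
print(f"ROI: {roi:.0%}")                                     # -> ROI: 400%
```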
Common Mistakes I Saw
1. Measuring Too Early
Several companies rolled out AI tools and started measuring immediately. Problem: No baseline data.
You need 3-6 months of pre-AI metrics to establish baselines. Otherwise, you can’t separate AI impact from other changes.
2. Metric Overload
One company tracked 23 different metrics. Nobody could make sense of the data.
Better: 3-5 key metrics you actually review regularly.
3. Ignoring Confounding Variables
Example: One company saw a 25% velocity increase after its AI rollout.
But they also:
- Hired 4 senior engineers
- Simplified deployment pipeline
- Moved from complex new features to maintenance work
Which drove the improvement? Impossible to say without controls.
4. Vanity Metrics
Lots of companies track:
- Lines of AI-generated code
- AI tool usage hours
- PR counts
These are activities, not outcomes. They look impressive but don’t indicate value.
What Actually Works
Start Simple
Months 1-3:
- Utilization rate (% of engineers using tools daily)
- Developer NPS (one question: “How valuable are AI tools?”)
That’s it. Two metrics.
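Both can be tracked with almost no infrastructure. Here's a sketch of the utilization calculation, assuming you can export a log of which engineers used the tool each day (the log format and headcount are made up):

```python
import pandas as pd

# Hypothetical usage export: one row per engineer per day they used the tool.
usage = pd.DataFrame({
    "date": ["2024-06-03", "2024-06-03", "2024-06-04", "2024-06-04"],
    "engineer_id": ["e1", "e2", "e1", "e3"],
})
total_engineers = 200  # headcount from your HR system (placeholder)

daily_active = usage.groupby("date")["engineer_id"].nunique()
utilization_rate = daily_active / total_engineers
print(f"Average daily utilization: {utilization_rate.mean():.0%}")
```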
Add Complexity Slowly
Months 4-6:
- Add cycle time tracking
- Add quality metrics (defect rate, incident frequency)
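Cycle time is usually the easiest of these to automate. A minimal sketch, assuming you can pull PR opened/merged timestamps from your Git host (column names are illustrative):

```python
import pandas as pd

# Hypothetical PR export with open and merge timestamps.
prs = pd.DataFrame({
    "opened_at": pd.to_datetime(["2024-06-01 09:00", "2024-06-02 14:00", "2024-06-03 11:00"]),
    "merged_at": pd.to_datetime(["2024-06-02 17:00", "2024-06-04 10:00", "2024-06-03 16:30"]),
})

prs["cycle_time_hours"] = (prs["merged_at"] - prs["opened_at"]).dt.total_seconds() / 3600
print(f"Median cycle time: {prs['cycle_time_hours'].median():.1f} hours")  # median resists outlier PRs
```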
Control for Confounders
Years 1-2:
- Cohort analysis (early adopters vs late adopters)
- Regression models controlling for team size, seniority, project complexity
- Quasi-experimental designs if randomization isn’t feasible
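To make the regression idea concrete, here's a sketch with statsmodels. The variables and data are invented, and with real data you'd want far more observations and more careful model selection; the point is only that confounders go in as covariates:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented per-team observations: outcome, AI adoption, and likely confounders.
teams = pd.DataFrame({
    "cycle_time_days": [4.2, 3.1, 5.0, 2.8, 3.9, 4.5, 3.3, 2.9],
    "ai_adoption":     [0.2, 0.8, 0.1, 0.9, 0.4, 0.3, 0.7, 0.8],  # share of engineers using tools daily
    "team_size":       [6,   5,   9,   4,   7,   8,   5,   6],
    "avg_seniority":   [3.0, 4.5, 2.5, 5.0, 3.5, 3.0, 4.0, 4.5],  # years of experience
})

# OLS with confounders as covariates; the ai_adoption coefficient is the
# adjusted association with cycle time, not proof of causation.
model = smf.ols("cycle_time_days ~ ai_adoption + team_size + avg_seniority", data=teams).fit()
print(model.summary())
```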
Combine Quant + Qual
Numbers tell you what happened. Qualitative feedback tells you why.
Best practice:
- Monthly surveys with closed-ended questions (quantitative)
- Quarterly team interviews (qualitative)
- Annual retrospective (strategic)
Key Insight: The 20% Who Measure Win
Of the 18 companies I studied:
- 4 measured rigorously (all three layers)
- 7 measured basics (Layer 1 + some Layer 2)
- 7 barely measured (usage tracking only)
Results:
Rigorous measurers:
- All reported positive ROI (range: 80-500%)
- All plan to expand AI investment
- All can defend budgets to board
Basic measurers:
- Mixed results
- Some positive, some “we think it helps”
- Vulnerable to budget cuts
Minimal measurers:
- Can’t prove value
- Several considering cutting tools
- Retention risk if they do
The DX AI Measurement Framework
One framework I saw multiple companies use successfully:
Utilization:
- Daily active users
- Feature adoption rates
- Cost per active user
Impact:
- Time savings (self-reported)
- Satisfaction scores
- Productivity proxies (cycle time, throughput)
Cost:
- Total spend (tools + implementation + support)
- ROI calculation
- Comparison to alternatives
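Cost per active user is the line people most often forget to normalize; paying for 500 seats and having 150 active users is a very different story from 450. A minimal sketch (numbers are placeholders):

```python
# Placeholder monthly figures; include implementation and support, not just licenses.
total_monthly_spend = 30_000    # tools + implementation + support
monthly_active_users = 150      # engineers who used the tool at least once this month

cost_per_active_user = total_monthly_spend / monthly_active_users
print(f"${cost_per_active_user:,.0f} per active user per month")  # -> $200
```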
My Recommendations
Week 1: Pick 1-2 metrics and start tracking
- Start simple: Utilization + NPS
- Don’t overthink it
Month 3: Review data and add 1-2 more metrics
- Add cycle time or quality metrics
- Look for trends
Month 6: First ROI estimate
- Rough calculation with conservative assumptions
- Sensitivity analysis (what if impact is half what we think?)
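The sensitivity check can be as simple as re-running the ROI math with the benefit cut in half and seeing whether it still clears zero. A sketch with placeholder numbers:

```python
# Rough ROI with a sensitivity check; every input here is a placeholder assumption.
engineers = 100
fully_loaded_cost = 180_000     # per engineer per year
assumed_time_saved = 0.05       # 5% of engineering time, self-reported
tool_cost = 60_000              # annual spend on tools

def roi(time_saved: float) -> float:
    benefit = engineers * fully_loaded_cost * time_saved
    return (benefit - tool_cost) / tool_cost

print(f"Base case: {roi(assumed_time_saved):.0%}")        # 5% saved   -> 1400%
print(f"Half case: {roi(assumed_time_saved / 2):.0%}")    # 2.5% saved -> 650%
```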
Year 1: Comprehensive review
- All three layers
- Go/no-go decision based on data
- Adjust strategy based on learnings
The Bottom Line
Organizations that measure AI impact see better results. Not because measurement magically improves AI tools, but because:
- Measurement forces clear thinking about what success looks like
- Data enables optimization (cut what doesn’t work, double down on what does)
- Accountability drives better adoption practices
- Evidence makes budgets defensible
Only about 20% of teams measure AI impact rigorously. Those teams are winning.
Join them.
What I’d Love to Hear
For those already measuring:
- What’s working for you?
- What metrics matter most?
- How do you handle attribution challenges?
For those not measuring yet:
- What’s blocking you?
- What would “good enough” measurement look like?
- How can we help you get started?