Redefining Developer Productivity for the AI Era: From Activity Tracking to Value Creation

This has been an incredibly valuable discussion across multiple threads. I want to synthesize what we’ve learned and propose a path forward for measuring and achieving real AI productivity.

The Core Insight

Old productivity paradigm:

  • Individual developer velocity = organizational productivity
  • Measure: Lines of code, commits, story points
  • Assumption: Faster coding → faster delivery

AI era reality:

  • Individual velocity ≠ organizational productivity
  • Measure: Value delivered to customers per unit of time/investment
  • Truth: Faster coding only helps if the rest of the system can absorb it

We need a new framework for the AI era.

The Multi-Dimensional Productivity Framework

Based on this discussion, I’m proposing we measure AI productivity across five dimensions:

Dimension 1: Adoption (Are we using it?)

Metrics:

  • % developers actively using AI tools
  • AI-assisted commits as % of total commits
  • Developer satisfaction with AI tools
  • Feature usage rates across different AI capabilities

Purpose: Understand whether tools are adopted and identify adoption blockers

Warning: High adoption doesn’t mean high value

Dimension 2: Process Impact (Is work moving differently?)

Metrics:

  • End-to-end cycle time (idea → production) by work type
  • DORA metrics (deployment frequency, lead time, change failure rate, MTTR)
  • Bottleneck identification (where does work queue: review, testing, deployment?)
  • Rework rates (how often does AI code need significant revision?)

Purpose: Understand how AI changes workflow and where new bottlenecks appear

Key insight from discussion: AI speeds up coding, which is only 5-10% of the delivery pipeline, and exposes bottlenecks in the other 90-95%

Dimension 3: Quality & Sustainability (Are we building capability or debt?)

Metrics:

  • Defect rates (production bugs, security vulnerabilities, accessibility failures)
  • Technical debt accumulation vs reduction
  • Code quality trends (complexity, maintainability)
  • Developer learning and growth
  • Team retention and satisfaction

Purpose: Ensure we’re not trading long-term sustainability for short-term gains

Key insight from discussion: AI can accelerate technical debt accumulation; governance is mandatory

Dimension 4: Business Outcomes (Are we delivering more value?)

Metrics:

  • Customer-facing features delivered per quarter (not story points)
  • Time from customer request to production solution
  • Business KPI movement (revenue, activation, retention) per engineering sprint
  • Customer satisfaction and NPS
  • Customer-impacting defects per release

Purpose: Connect engineering activity to business results

Key insight from discussion: This is hardest to measure but most important for executive buy-in

Dimension 5: Cost Efficiency (Is the ROI positive?)

Metrics:

  • AI tooling costs per engineer per month
  • Productivity gain (measured in Dimensions 2-4) per dollar invested
  • Process improvement costs required to capitalize on AI
  • Support and incident costs related to AI-generated code

Purpose: Ensure investment generates positive return

Key insight from discussion: AI tools are cheap; process changes needed to capitalize on them are expensive

The Implementation Reality

Luis shared his three-layer framework. Keisha added organizational health. David emphasized business outcomes.

Combining these: You can’t measure just one dimension and claim productivity.

Bad measurement:

  • Track only Dimension 1 (adoption) → “93% of developers use AI! Success!”
  • Track only commits/PRs → “Velocity up 34%! Success!”

Neither tells you if you’re creating business value.

Good measurement:

  • Track all 5 dimensions
  • Look for correlation: Does adoption → process improvement → quality maintenance → business outcomes?
  • Accept that not all dimensions will improve simultaneously

The Change Management Challenge

From our discussions, it’s clear: AI productivity requires organizational change, not just tool adoption.

What needs to change:

1. Processes (Maya’s bottleneck point)

  • Code review for AI era (different patterns, higher volume)
  • Testing infrastructure scaling
  • CI/CD optimization for increased throughput
  • Quality gates adapted for AI-generated code

2. Governance (Keisha’s quality framework)

  • Automated quality gates (security, accessibility, design systems)
  • AI-specific review practices
  • Risk-based AI usage policies
  • Developer accountability culture

3. Skills (Luis’s training emphasis)

  • How to write effective AI prompts
  • How to review AI-generated code
  • When to use vs avoid AI
  • How to maintain quality at AI velocity

4. Measurement (my original question)

  • Shift from activity metrics to outcome metrics
  • Multi-dimensional productivity tracking
  • Connect engineering work to business KPIs

Technology is 20% of the solution. Process, governance, skills, and measurement are 80%.

The Pragmatic Path Forward

Phase 1: Establish baselines (Month 1)

  • Start measuring all 5 dimensions now
  • Don’t wait for perfect measurement
  • Use cohort comparison (AI-heavy vs. AI-light users) if no historical baseline exists
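The cohort-comparison fallback above can be sketched in a few lines. The cycle times are invented, and a real analysis would control for team, work type, and seniority; this only shows the shape of the comparison.

```python
from statistics import mean

# Hypothetical per-feature cycle times (days), grouped by cohort.
cycle_times = {
    "ai_heavy": [8.5, 9.1, 10.2, 7.8],
    "ai_light": [12.0, 11.4, 13.1, 12.6],
}

# With no before/after baseline, compare cohorts in the same period instead.
for cohort, days in cycle_times.items():
    print(f"{cohort}: mean cycle time {mean(days):.1f} days")
</```
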

Phase 2: Implement governance (Months 2-3)

  • Automated quality gates (security, accessibility, design compliance)
  • AI-specific review checklists
  • Risk-based AI usage policies (like Luis’s red/yellow/green zones)

Phase 3: Optimize processes (Months 3-6)

  • Address bottlenecks AI exposes (review, testing, deployment)
  • Scale infrastructure to match increased throughput
  • Shift testing and quality left

Phase 4: Cultural shift (Months 4-9)

  • Training on AI-native workflows
  • Redefine what “productivity” means
  • Celebrate outcomes, not activity

Phase 5: Iterate (Ongoing)

  • Review metrics quarterly
  • Adjust governance based on what’s working
  • Continuously optimize

The Hard Truth About Sustainable Productivity

David asked if anyone has achieved high velocity AND high quality with AI.

My hypothesis: Not yet—because most organizations are still in the “tool adoption” phase.

They’re at Phase 0:

  • Buy AI tools
  • Give to developers
  • Measure adoption
  • Declare success

They haven’t done the hard work:

  • Process redesign
  • Governance implementation
  • Cultural transformation
  • Outcome-based measurement

Real productivity requires all of it.

The Vision: AI Enabling Higher-Value Work

The goal isn’t “write more code faster.”

The goal is: AI handles routine work, developers focus on high-value work.

  • AI writes boilerplate → developers design architecture
  • AI generates tests → developers design test strategies
  • AI refactors code → developers solve customer problems
  • AI handles easy work → developers tackle hard, innovative work

Productivity means solving harder problems, not solving easy problems faster.

If we’re using AI to cram more tickets into the feature factory, we’re optimizing for the wrong thing.

If we’re using AI to create space for innovation, strategic thinking, and customer understanding, we’re on the right path.

Your Experiences

This framework is a synthesis of our discussions, but it’s still theoretical.

I want to hear:

  1. What parts of this framework resonate with your reality?
  2. What’s missing or wrong?
  3. Has anyone progressed beyond Phase 2 (governance) to Phase 3+ (process optimization and cultural shift)?
  4. What metrics have you found that actually predict business success?

The goal: Create a community-validated playbook for AI productivity that actually works.

Not productivity theater. Not vanity metrics. Real, sustainable, value-creating productivity.

Let’s figure this out together.

Michelle, this synthesis is exactly what I needed to see. Your five-dimensional framework captures everything we’ve been discussing.

The Phase Reality Check

Your phase model is spot-on—and it highlights where most organizations are stuck.

Where I see most orgs:

  • Phase 0: 70% of companies (tool adoption only)
  • Phase 1: 20% (starting to measure)
  • Phase 2: 8% (implementing governance)
  • Phase 3+: 2% (process optimization and cultural shift)

We’re one of the 8% in Phase 2, trying to get to Phase 3.

What We’re Learning in Phase 2

Governance implementation is HARD culturally.

The technical parts are straightforward:

  • Set up automated security scanning :white_check_mark:
  • Configure accessibility linters :white_check_mark:
  • Create AI-specific review checklists :white_check_mark:

The cultural parts are brutal:

  1. Developer resistance

    • “AI makes me faster, but governance slows me down”
    • “These quality gates are blocking my ‘productivity’”
    • “We’re adding process overhead that defeats the point of AI”
  2. Management confusion

    • “We invested in AI for speed, why are we slower now?”
    • “Other companies are shipping faster with AI, why aren’t we?”
    • “Is all this governance really necessary?”
  3. Measurement challenges

    • “How do we know if governance is working vs just creating friction?”
    • “What’s the right balance between speed and safety?”

The answer to all of these: You can’t have sustainable productivity without governance.

But convincing people of this when they’re excited about AI velocity is HARD.

The Organizational Health Dimension Is Critical

Michelle, I want to emphasize Dimension 3 (Quality & Sustainability) because I think this is where most orgs will fail.

What we’re tracking:

Team health metrics:

  • Developer satisfaction (quarterly surveys)
  • Learning and growth (are people developing new skills, or just relying on AI?)
  • Retention rates by team
  • Promotion velocity (are people still growing in capability?)

Knowledge health metrics:

  • Documentation quality and coverage
  • Architectural decision records
  • Knowledge sharing (tech talks, design reviews)
  • Mentorship engagement

Innovation health metrics:

  • Time spent on exploratory work vs feature factory
  • Technical proposals submitted
  • R&D initiatives launched
  • Cross-team collaboration

Early signals I’m seeing:

:cross_mark: Junior developers are not learning as fast

  • They’re using AI as a crutch instead of developing fundamentals
  • When AI produces broken code, they can’t debug it
  • Career development is slowing because they’re not building core skills

:cross_mark: Senior developers are frustrated

  • Code review feels like “checking AI’s homework” instead of mentoring
  • Less time for architectural work and strategic thinking
  • Feeling like “AI babysitters” instead of technical leaders

:warning: Team collaboration is changing

  • Less pair programming (AI is the “pair”)
  • Fewer design discussions (AI generates a solution, why discuss?)
  • Knowledge silos forming (everyone’s AI journey is individual)

These are Dimension 3 problems that won’t show up in Dimension 2 (process) or Dimension 4 (outcomes) until it’s too late.

The Vision Question

Your point about “AI enabling higher-value work” is THE critical question.

What we’re trying to optimize for:

Instead of measuring “can developers code faster?” we’re asking:

  1. Are developers solving harder problems?

    • More complex features
    • Better architectural solutions
    • More innovative approaches
  2. Are developers spending time on high-value activities?

    • Customer research and user understanding
    • System design and architecture
    • Mentoring and knowledge sharing
    • Technical innovation and experimentation
  3. Are developers growing in capability?

    • New skills developed
    • Deeper expertise
    • Leadership opportunities

If AI just makes us ship more tickets faster without changing WHAT we work on, we’ve failed.

The Balanced Scorecard Implementation

I’m proposing we implement David’s “Productivity = Velocity × Quality” idea with your five dimensions:

Quarterly Productivity Scorecard:

Dimension 1 (Adoption): 85% (good)
Dimension 2 (Process): 68% (needs work - bottlenecks in testing)
Dimension 3 (Quality/Sustainability): 71% (concerns about junior dev learning)
Dimension 4 (Outcomes): 64% (modest business impact so far)
Dimension 5 (Cost Efficiency): 78% (positive ROI, but not as high as expected)

Overall Productivity Score: 73% (weighted average)

This tells a much more honest story than “AI productivity up 45%!”
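The overall score above can be reproduced with a trivial weighted average. A sketch, assuming equal weights (which is what yields the ~73% figure); a real scorecard would likely weight outcomes and quality more heavily.

```python
# Dimension scores copied from the quarterly scorecard above (0-100 scale).
scores = {"adoption": 85, "process": 68, "quality": 71, "outcomes": 64, "cost": 78}

# Equal weighting as a starting point; tune weights to your priorities.
weights = {dim: 1 / len(scores) for dim in scores}

overall = sum(scores[dim] * weights[dim] for dim in scores)
print(f"Overall productivity score: {overall:.0f}%")  # prints "Overall productivity score: 73%"
```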

The Cultural Shift Playbook

For Phase 4 (cultural shift), here’s what we’re trying:

1. Redefine what we celebrate

  • :cross_mark: Stop celebrating: “I merged 15 PRs this week!”
  • :white_check_mark: Start celebrating: “I shipped a feature that improved activation by 8%”

2. Change standup questions

  • :cross_mark: Old: “What did you ship yesterday?”
  • :white_check_mark: New: “What value did you create? What did you learn?”

3. Adjust performance reviews

  • :cross_mark: Old criteria: Story points completed, commits merged
  • :white_check_mark: New criteria: Business impact, code quality, team collaboration, learning

4. Create AI usage norms

  • Green/yellow/red zones (like Luis suggested)
  • Public examples of “good AI usage” vs “problematic AI usage”
  • Regular retrospectives on AI effectiveness

This is HARD because it requires changing habits and incentives.

The Community Playbook Idea

Michelle’s call for a “community-validated playbook” is exactly what’s needed.

What I’d love to see:

  1. Phase-specific guidance

    • Detailed playbooks for each phase (0-5)
    • Common pitfalls and how to avoid them
    • Success criteria for moving to next phase
  2. Industry-specific considerations

    • Regulated industries (fintech, healthcare) vs consumer products
    • B2B vs B2C priorities
    • Different risk tolerances
  3. Measurement templates

    • Example dashboards for 5-dimension framework
    • Sample metrics and targets
    • How to connect engineering metrics to business KPIs
  4. Case studies

    • What worked, what didn’t
    • Actual data (not just anecdotes)
    • Lessons learned

Who’s interested in collaborating on this?

Michelle’s framework is the foundation. Let’s build the playbook together.

Michelle’s framework plus Keisha’s phase model is becoming a real implementation guide.

Let me share what we’re learning as one of the few orgs in Phase 2-3 transition.

Our Journey Through the Phases

Phase 0 → 1 (Tool adoption + baseline measurement): 2 months
Phase 1 → 2 (Implementing governance): 4 months (we’re here now)
Phase 2 → 3 (Process optimization): In progress (2 months in, expecting 4-6 more months)

Total time from AI adoption to mature implementation: ~12 months minimum

Most organizations expect results in 3 months. That’s unrealistic.

What We Got Wrong Initially

Mistake 1: Started measuring too late

We rolled out AI tools, then 3 months later tried to establish baselines. This was backwards.

Should have been:

  1. Measure current state for 1 month
  2. Introduce AI tools
  3. Compare to baseline

We can’t prove our productivity claims because we don’t have clean before/after data.

Mistake 2: Underestimated governance implementation time

Setting up automated gates: 2 weeks
Training developers on new practices: 3 months (ongoing)
Building cultural acceptance: 6+ months (still working on it)

We thought governance was a technical problem. It’s mostly a people problem.

Mistake 3: Didn’t budget for process changes

AI tool costs: ~$20K/year
Process improvement costs to capitalize on AI: ~$210K/year (CI/CD scaling, training, process redesign)

The tools are cheap. Making them productive is expensive.

Dimension 2 (Process Impact): What’s Working

Michelle’s cycle time focus is right. Here’s our detailed breakdown:

Before AI (baseline):

  • Requirements → Code complete: 5.2 days
  • Code complete → Review approved: 2.8 days
  • Review → Tests passing: 1.6 days
  • Tests → Production: 3.1 days
  • Total cycle time: 12.7 days

After AI (no process changes):

  • Requirements → Code complete: 3.1 days (40% faster :white_check_mark:)
  • Code complete → Review approved: 4.2 days (50% slower :cross_mark:)
  • Review → Tests passing: 2.3 days (44% slower :cross_mark:)
  • Tests → Production: 3.1 days (unchanged)
  • Total cycle time: 12.7 days (no improvement)

After AI + governance + process optimization (current state):

  • Requirements → Code complete: 3.3 days (AI + quality expectations)
  • Code complete → Review approved: 2.6 days (AI-assisted review, better checklists)
  • Review → Tests passing: 1.8 days (improved test infrastructure)
  • Tests → Production: 2.1 days (deployment automation)
  • Total cycle time: 9.8 days (23% improvement)

Key insight: Process optimization was more important than AI for overall cycle time.
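The stage-level arithmetic above checks out, and is easy to reproduce. All figures are copied from the breakdown; nothing new is introduced.

```python
# Stage durations in days, copied from the cycle-time breakdown above.
baseline = {"code": 5.2, "review": 2.8, "test": 1.6, "deploy": 3.1}
current = {"code": 3.3, "review": 2.6, "test": 1.8, "deploy": 2.1}

total_before = sum(baseline.values())  # 12.7 days
total_after = sum(current.values())    # 9.8 days
improvement = (total_before - total_after) / total_before
print(f"{total_before:.1f} -> {total_after:.1f} days ({improvement:.0%} faster)")
```

Note that the "after AI, no process changes" stages also sum to 12.7 days: the coding gain was entirely absorbed by slower review and testing.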

Dimension 3 (Quality/Sustainability): The Data

Keisha’s concerns about junior developers are REAL. We’re tracking this:

Developer skill growth (measured via technical assessments):

Junior developers (0-2 years):

  • Pre-AI: 35% skill growth year-over-year
  • Post-AI: 18% skill growth year-over-year

They’re not learning fundamentals because AI does it for them.

Mid-level developers (3-5 years):

  • Pre-AI: 22% skill growth year-over-year
  • Post-AI: 28% skill growth year-over-year

They’re using AI to explore new areas and grow faster.

Senior developers (6+ years):

  • Pre-AI: 12% skill growth year-over-year
  • Post-AI: 8% skill growth year-over-year

They’re spending more time reviewing AI code, less time learning new things.

AI is creating a divergence: Mid-level developers benefit most, juniors and seniors benefit least.

This is a long-term sustainability risk we’re actively addressing.

Dimension 5 (Cost Efficiency): The ROI Calculation

Michelle asked about cost efficiency. Here’s our actual math:

AI Tool Costs:

  • GitHub Copilot: $19/user/month × 42 engineers = $9,576/year
  • ChatGPT Plus: $20/user/month × 42 engineers = $10,080/year
  • Total tool costs: ~$20K/year

Process Improvement Costs:

  • CI/CD infrastructure scaling: $85K/year
  • Security scanning tools: $40K/year
  • Training and workshops: $25K/year
  • Process consulting and design: $60K/year
  • Total process costs: $210K/year

Total AI Initiative Investment: $230K/year

Productivity Gains (conservative estimate):

  • 23% cycle time reduction = ~2.9 days saved per feature
  • Average feature value: $50K (revenue impact)
  • Shipping ~15% more features per year = 6 additional features
  • Value created: ~$300K/year

ROI: 30% positive ($300K value - $230K cost = $70K net benefit)

But it took 9 months to get here. Early months were ROI negative.
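The ROI math above can be verified in a few lines. All dollar figures are copied from the post; variable names are illustrative.

```python
# Costs, copied from the breakdown above.
tool_costs = 20_000          # Copilot + ChatGPT, ~$20K/year
process_costs = 210_000      # CI/CD, security tools, training, consulting
investment = tool_costs + process_costs  # $230K/year

# Value, using the conservative estimate above.
extra_features = 6           # ~15% more features shipped per year
value_per_feature = 50_000   # average revenue impact per feature
value_created = extra_features * value_per_feature  # $300K/year

roi = (value_created - investment) / investment
print(f"Net benefit: ${value_created - investment:,}; ROI: {roi:.0%}")  # prints "Net benefit: $70,000; ROI: 30%"
```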

The Risk-Based Governance Model

Michelle’s five-dimensional framework needs to be applied differently for different types of work.

Low-risk work:

  • Dimension 1-2 focus (adoption, process)
  • Light governance
  • Fast feedback loops

High-risk work:

  • Dimension 3-4 focus (quality, outcomes)
  • Heavy governance
  • Slower, more careful

Our green/yellow/red zones (refined):

Green Zone (AI encouraged, minimal governance):

  • Test generation and test data creation
  • Documentation and code comments
  • Boilerplate and scaffolding
  • Internal tooling and scripts
  • Refactoring with >80% test coverage

Yellow Zone (AI allowed, standard governance):

  • New features using established patterns
  • UI components (with design review)
  • API endpoints (with security review)
  • Database queries (with SQL review)

Red Zone (AI prohibited or requires special approval):

  • Authentication and authorization
  • Payment processing
  • Cryptographic implementations
  • PII/PHI data handling
  • Novel architectural decisions
  • Database schema migrations

Each zone has different metrics, different review processes, different quality gates.
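A zone policy like this lends itself to a simple lookup that CI or a bot can consult. A minimal sketch: the zone assignments come from the lists above, but the work-type keys and gate names are hypothetical, and a real policy would cover many more work types.

```python
# Zone assignments copied from the red/yellow/green lists above (abridged).
ZONES = {
    "test_generation": "green",
    "documentation": "green",
    "new_feature_established_pattern": "yellow",
    "api_endpoint": "yellow",
    "payment_processing": "red",
    "auth": "red",
}

# Hypothetical gate sets per zone.
GATES = {
    "green": ["automated checks"],
    "yellow": ["automated checks", "standard review"],
    "red": ["AI prohibited or special approval required"],
}

def gates_for(work_type: str) -> list[str]:
    """Return the required gates; unknown work defaults to standard governance."""
    zone = ZONES.get(work_type, "yellow")
    return GATES[zone]

print(gates_for("payment_processing"))
```

Defaulting unknown work types to the yellow zone is a deliberately conservative choice: new categories get standard governance until someone explicitly classifies them.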

The Pragmatic Implementation Steps

For orgs starting this journey, here’s what I’d recommend:

Month 1-2:

  • Establish baseline metrics (cycle time, quality, satisfaction)
  • Don’t change anything yet, just measure
  • Build executive alignment on long-term investment

Month 3:

  • Roll out AI tools to pilot team (10-20% of engineers)
  • Implement basic automated gates (security, accessibility)
  • Start tracking 5-dimensional metrics

Month 4-6:

  • Expand to broader team based on pilot learnings
  • Implement AI-specific review checklists
  • Begin addressing bottlenecks (review, testing, deployment)
  • Training on AI-native workflows

Month 7-9:

  • Full rollout to all engineers
  • Continuous process optimization
  • Cultural shift initiatives (celebrate outcomes, not activity)
  • Refine metrics based on what’s predictive

Month 10-12:

  • Measure ROI and business impact
  • Iterate on governance and processes
  • Share learnings and case studies

Expect 12 months to see sustainable productivity gains.

The Community Playbook Contribution

I’ll contribute to Michelle and Keisha’s playbook idea:

What I can share:

  • Detailed metrics and dashboards (5-dimensional framework)
  • Governance templates (checklists, review guides, approval workflows)
  • Training materials (how to use AI well, how to review AI code)
  • ROI calculation templates
  • Risk-based AI usage policy

What I need from others:

  • How to measure Dimension 4 (business outcomes) better
  • How to address junior developer skill development with AI
  • How companies in different industries adapt this framework

Let’s build this together. The AI productivity conversation needs to move beyond hype to real implementation guidance.

This whole discussion has been eye-opening. As someone working at the intersection of design and engineering, I want to add the design perspective to this framework.

Dimension 3 Needs a “Craft” Sub-Metric

Michelle’s Dimension 3 (Quality & Sustainability) mentions code quality, but I think we need to explicitly include craft quality—the things that make software feel polished, thoughtful, and user-centered.

What AI gets wrong about craft:

1. Consistency

  • AI generates solutions that work in isolation
  • Doesn’t understand system-wide patterns
  • Creates one-off solutions instead of reusable patterns

2. User experience nuance

  • Implements functional requirements
  • Misses usability details (loading states, error messages, edge case UX)
  • No understanding of user mental models

3. Aesthetic judgment

  • “Works” doesn’t mean “feels good to use”
  • No sense of visual hierarchy, rhythm, breathing room
  • Accessibility as checkbox, not as design principle

These matter to customers but don’t show up in traditional engineering metrics.

Design Quality Metrics for the Framework

To track craft quality in Dimension 3, we measure:

Design System Health:

  • % of components using design tokens (vs hard-coded values)
  • Design system compliance rate
  • Component reuse rate (new components vs reusing existing)

UX Quality:

  • User satisfaction scores by feature (CSAT, task success rate)
  • Usability testing pass rates
  • Accessibility audit scores (WCAG compliance)

Design Debt:

  • Inconsistencies flagged in design QA
  • Time spent retrofitting UX to AI-generated features
  • Visual/UX bugs vs functional bugs ratio

These complement code quality metrics and tell a fuller picture of Dimension 3.

The Collaboration Impact

Keisha mentioned that team collaboration is changing with AI. I’m seeing this acutely in design-engineering collaboration.

Pre-AI workflow:

  1. Designer creates mockups in Figma
  2. Designer and engineer discuss implementation
  3. Engineer builds, designer reviews
  4. Iteration until it matches design intent

Post-AI workflow:

  1. Designer creates mockups in Figma
  2. Engineer uses AI to generate implementation quickly
  3. Engineer ships without design review (they’re “productive”!)
  4. Designer finds out after it’s in production
  5. Design QA finds 6 design-system violations and 3 accessibility failures
  6. Retrofit work begins

AI made engineering faster but broke the collaboration loop.

What’s Working: Design-AI Integration

To fix this, we’re trying:

1. Design system context in AI prompts

Instead of: “Create a modal dialog”

We teach developers to prompt: “Create a modal dialog using our Acme Design System. Reference design tokens from tokens.json, use ModalBase component, follow WCAG 2.1 AA standards for keyboard navigation and focus management”

Better prompts → better AI output → less cleanup

2. Automated design QA gates

Before merge:

  • Design token usage validation (ESLint rules)
  • Accessibility linting (axe-core)
  • Visual regression tests (Percy)
  • Responsive breakpoint tests

This catches about 60% of design violations in AI-generated code automatically.
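To make one of these gates concrete, here is a toy sketch of a design-token check: flag hard-coded hex colors that should come from tokens instead. The real setup described above uses ESLint rules, axe-core, and Percy; this only illustrates the idea.

```python
import re

# Hex color literals like #ff6600 or #fff; values referencing design tokens
# (e.g. var(--color-primary)) contain no "#" literal and pass through.
HARDCODED_COLOR = re.compile(r"#[0-9a-fA-F]{3,8}\b")

def find_violations(source: str) -> list[str]:
    """Return hard-coded color literals found in a style/source string."""
    return HARDCODED_COLOR.findall(source)

css = ".btn { color: #ff6600; background: var(--color-primary); }"
print(find_violations(css))  # prints "['#ff6600']"
```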

3. “Design approved” required for customer-facing changes

Yellow/red zones from Luis’s framework should include:

  • Customer-facing UI changes require design review
  • New patterns (not reusing components) require design approval
  • Accessibility-critical features require accessibility specialist review

Ensures collaboration happens before ship, not after.

The Craft Productivity Paradox

Here’s what worries me about AI productivity:

Coding faster doesn’t mean designing better products.

If we optimize for “features shipped per quarter” (Dimension 4), we might be incentivizing:

  • More features, not better features
  • Functional completeness, not delightful experiences
  • Fast solutions, not thoughtful solutions

The risk: AI helps us ship mediocre experiences faster.

What Dimension 4 Should Include

Michelle’s Dimension 4 (Business Outcomes) has good metrics, but I’d add:

User experience outcomes:

  • Feature adoption rates (do people actually use what we ship?)
  • Task completion rates (can people accomplish their goals?)
  • User satisfaction by feature (not just overall NPS)
  • Accessibility compliance rates
  • Design debt vs design investment ratio

Innovation outcomes:

  • Novel solutions developed (not just reusing patterns)
  • Exploratory work and experimentation
  • Design excellence awards, patents, unique approaches

If we only measure “features delivered,” we’ll get quantity over quality.

The Vision: AI as Design Tool, Not Replacement

Michelle’s point about “AI enabling higher-value work” applies to design too.

What I want:

  • AI handles implementation details → designers focus on user research
  • AI generates variations → designers focus on strategic choices
  • AI builds components → designers focus on system thinking
  • AI does execution → designers do discovery

Currently, AI is mostly used to bypass design, not to augment it.

My Contribution to the Playbook

For the community playbook Keisha and Michelle proposed:

Design-specific guidance:

  • How to integrate design system requirements into AI workflows
  • Design QA checklists for AI-generated UI
  • Accessibility review process for AI code
  • Collaboration patterns that work with AI velocity

Templates and examples:

  • Effective AI prompts for design system compliance
  • Automated design quality gates
  • Design debt tracking dashboard
  • UX outcome metrics

I’m in for collaborating on this.

The framework is great, but it needs to explicitly include craft quality and user experience—not just functional correctness and business metrics.

Otherwise we’ll optimize for shipping more mediocre experiences, not shipping better ones.

This has evolved into something really valuable. Let me add the product/business perspective to close the loop.

Dimension 4 (Business Outcomes) Is Where Executives Care

Michelle’s framework is comprehensive, but here’s the reality: Executives only care about Dimension 4.

They don’t care about:

  • What % of developers use AI (Dimension 1)
  • Whether cycle time improved by 23% (Dimension 2)
  • Code quality scores (Dimension 3)
  • Tool costs and ROI calculations (Dimension 5)

They care about:

  • Are we shipping features customers want faster?
  • Is revenue growing?
  • Is customer satisfaction improving?
  • Are we beating competitors to market?

Dimension 4 is the only one that matters to the business.

But as we’ve discussed, Dimension 4 is hardest to measure and hardest to attribute to AI specifically.

The Attribution Problem (Revisited)

Luis shared impressive data: 23% cycle time improvement, positive ROI.

But can you prove that translated to business outcomes?

Here’s the attribution challenge at our company:

Q3 Results:

  • Revenue: +12% vs Q2
  • Customer activation: +8%
  • Feature adoption: +15%
  • Engineering shipped 18% more features

Was this because of:

  • AI productivity gains in engineering?
  • New marketing campaigns?
  • Seasonal business patterns?
  • Product-market fit improvements?
  • Sales team expansion?
  • Macro economic conditions?

We can’t isolate the AI variable.

The Proxy Metric Approach

Since direct attribution is impossible, I’m proposing leading indicators that correlate with business success:

Product velocity (leading indicator):

  • Features delivered per quarter
  • Time from idea to production
  • % of roadmap completed

Customer feedback velocity (leading indicator):

  • Time from customer request to delivered solution
  • % of customer requests addressed
  • Feature request → production cycle time

Quality of execution (leading indicator):

  • Feature adoption rates (do customers use what we ship?)
  • Customer satisfaction by feature
  • Post-launch bug rates

Learning velocity (leading indicator):

  • Experiment velocity (how fast can we test hypotheses?)
  • Iteration speed (how fast can we improve based on feedback?)

These aren’t perfect, but they’re measurable and they predict business outcomes better than activity metrics.

The Balanced Scorecard in Practice

Here’s how I’m thinking about this for our next board presentation:

AI Productivity Scorecard:

| Dimension | Metric | Target | Actual | Status |
| --- | --- | --- | --- | --- |
| Adoption | Active AI users | >80% | 87% | :white_check_mark: |
| Process | Cycle time reduction | >15% | 23% | :white_check_mark: |
| Quality | Defect rate | <5% | 7% | :warning: |
| Outcomes | Features delivered | +20% | +18% | :white_check_mark: |
| Outcomes | Customer satisfaction | +5pts | +2pts | :warning: |
| Outcomes | Revenue impact | +10% | +12% | :white_check_mark: |
| Cost | ROI | >25% | 30% | :white_check_mark: |

Overall Assessment: Positive with quality concerns to address

This tells a much more nuanced story than “AI made us 45% more productive!”
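The pass/warn logic behind a scorecard like this is mechanical for most rows. A sketch, assuming a simple rule ("lower is better" metrics pass when actual ≤ target, everything else when actual ≥ target); note the board scorecard above also applies judgment, e.g. +18% against a +20% target still marked on-track.

```python
def status(target: float, actual: float, lower_is_better: bool = False) -> str:
    """Return 'OK' when the actual value meets its target, else 'WARN'."""
    ok = actual <= target if lower_is_better else actual >= target
    return "OK" if ok else "WARN"

print(status(80, 87))                      # active AI users: prints "OK"
print(status(5, 7, lower_is_better=True))  # defect rate: prints "WARN"
```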

The Product Strategy Implications

Maya’s point about craft quality is critical from a product perspective.

The risk I’m worried about:

If we optimize purely for “features shipped per quarter,” we might:

  • Ship features customers don’t want (quantity over quality)
  • Ship features that don’t move business KPIs
  • Ship poorly-designed features that hurt brand perception
  • Create feature bloat that makes product harder to use

Productivity without product strategy is just busy work.

What Product Needs in the Framework

Michelle’s framework is engineering-centric (which makes sense given the audience). But to get executive buy-in, we need to connect it to product/business strategy:

Dimension 4 should include:

Business metrics:

  • Revenue and growth rate
  • Customer acquisition cost (CAC)
  • Customer lifetime value (LTV)
  • Market share and competitive positioning

Product metrics:

  • Feature adoption and usage rates
  • User engagement and retention
  • Customer satisfaction and NPS
  • Product-market fit indicators

Strategic metrics:

  • Time-to-market vs competitors
  • Innovation rate (new capabilities, not just feature count)
  • Market responsiveness (speed of adapting to customer needs)

ROI metrics:

  • Engineering cost per feature
  • Cost per customer acquisition (engineering contribution)
  • Revenue per engineer
  • Margin improvement from efficiency gains

These connect engineering productivity to business strategy.

The 12-Month Timeline Reality

Luis mentioned expecting 12 months to see sustainable gains. This is the message executives need to hear.

Current narrative (wrong):
“We’ll buy AI tools and productivity will increase 45% immediately!”

Reality narrative (correct):
“We’ll invest in AI tools + process changes over 12 months. We expect modest gains in months 1-6, accelerating gains in months 7-12, and sustainable productivity improvements by month 12+. ROI will be positive but not immediate.”

Setting realistic expectations is critical for getting sustained investment.

The Community Playbook: Business Case Section

For the playbook, I can contribute:

Executive communication:

  • How to pitch AI productivity investment (realistic expectations)
  • Board presentation templates (5-dimensional scorecard)
  • Business case templates (cost, timeline, ROI projections)

Product-engineering alignment:

  • How to connect engineering metrics to product metrics to business metrics
  • Prioritization frameworks that balance velocity with strategic value
  • Feature impact measurement templates

Success metrics:

  • What “good” looks like at different stages (months 1-3, 4-6, 7-9, 10-12)
  • Red flags that indicate AI isn’t working
  • When to double down vs pivot

The Final Synthesis

Michelle asked: “Has anyone achieved high velocity AND high quality with AI?”

My answer after this discussion:

Not yet, because most organizations are optimizing for the wrong thing.

They’re optimizing for:

  • Maximum individual developer velocity
  • Maximum feature output
  • Minimum tool costs

They should be optimizing for:

  • Sustainable organizational productivity
  • Maximum customer value per engineering investment
  • Balanced improvement across all 5 dimensions

The orgs that figure out the latter will have a massive competitive advantage.

The 5-dimensional framework + phase-based implementation + community playbook could be the guide that gets us there.

I’m absolutely in for contributing. Let’s build this.