Here’s the paradox of developer productivity measurement in 2026: companies are investing $600K-$780K annually in developer productivity platforms — LinearB, Jellyfish, Cortex, DX, Sleuth, and a growing ecosystem of tools promising to quantify engineering output — yet two-thirds of developers don’t believe the metrics these platforms produce actually reflect their work. Meanwhile, 30% of platform engineering teams, the teams directly responsible for developer experience, measure nothing at all. The measurement crisis runs both ways: bad metrics and no metrics.
I’ve been managing engineering teams for 18 years, and I’ve never seen a wider gap between what leadership wants to measure and what engineers believe is measurable. Let me break down why.
Why Developers Mistrust Productivity Metrics
1. Output metrics miss the point. Lines of code, PRs merged, commits per day, story points completed — these metrics reward quantity over quality. A developer who spends a week carefully designing an elegant 50-line solution that handles every edge case and is maintainable for years scores lower than someone who cranks out 500 lines of spaghetti code that will generate bugs for the next six months. We all know this intuitively, yet these remain the most commonly tracked metrics because they’re easy to collect.
2. Metric gaming is inevitable. When metrics are tied to performance reviews — and increasingly, they are — developers optimize for the metric, not the outcome. PR count targets lead to smaller, less meaningful pull requests. A developer splits a coherent feature into 8 tiny PRs instead of 2 well-structured ones because their review mentions “PRs per sprint.” Cycle time targets lead to skipping thorough review — why spend an extra day on a careful code review when it hurts your numbers? Goodhart’s Law (“when a measure becomes a target, it ceases to be a good measure”) isn’t theoretical in engineering management — it’s Tuesday.
3. Context is invisible to dashboards. Metrics don’t capture the enormous value of debugging a critical production issue, mentoring a junior engineer through their first architecture decision, or writing a design document that saves three months of rework. The developer who averted a $100K outage by spending 3 days on meticulous root cause analysis shows zero productivity on commit-based metrics during that period. In fact, they look less productive than their peers who were shipping features while the production system burned.
4. Surveillance anxiety is real. Many developers perceive productivity metrics as surveillance, not support. And honestly? Sometimes they’re right. DX Research found that teams where metrics feel punitive have 25% lower satisfaction than teams where metrics feel supportive. The framing isn’t just a communication issue — it reflects a genuine philosophical difference in how leadership views engineering work. Are engineers trusted professionals whose output is complex and contextual, or are they resources whose utilization should be maximized?
Why 30% of Platform Teams Measure Nothing
On the other end of the spectrum, nearly a third of platform engineering teams — the teams who should be most invested in understanding developer experience — measure nothing at all. The reason is straightforward: they can’t agree on what to measure.
The industry offers a bewildering array of framework options. DORA gives you 4 metrics (deployment frequency, lead time for changes, change failure rate, time to restore service). SPACE offers 5 dimensions (satisfaction and well-being, performance, activity, communication and collaboration, efficiency and flow). DX Core 4 proposes yet another framework. DevEx, from Abi Noda’s research, focuses on 3 dimensions (flow state, feedback loops, cognitive load). And every vendor has their own proprietary framework that, conveniently, their platform measures best.
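To make the comparison concrete, here’s a minimal sketch of what the four DORA metrics look like when computed from raw deployment and incident records. The record shapes and field names (`Deployment`, `caused_incident`, and so on) are illustrative assumptions, not any platform’s actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean

@dataclass
class Deployment:
    deployed_at: datetime
    committed_at: datetime   # first commit in the change set
    caused_incident: bool

@dataclass
class Incident:
    opened_at: datetime
    resolved_at: datetime

def dora_metrics(deployments: list[Deployment], incidents: list[Incident], days: int = 30) -> dict:
    """Compute the four DORA keys over a trailing window (illustrative only)."""
    if not deployments:
        return {}
    return {
        # Deployment frequency: deploys per day over the window
        "deployment_frequency_per_day": len(deployments) / days,
        # Lead time for changes: commit to production, averaged, in hours
        "lead_time_hours": mean(
            (d.deployed_at - d.committed_at) / timedelta(hours=1) for d in deployments
        ),
        # Change failure rate: share of deploys that caused an incident
        "change_failure_rate": sum(d.caused_incident for d in deployments) / len(deployments),
        # Time to restore service: incident opened to resolved, averaged, in hours
        "time_to_restore_hours": mean(
            (i.resolved_at - i.opened_at) / timedelta(hours=1) for i in incidents
        ) if incidents else 0.0,
    }
```

Twenty-odd lines of arithmetic; the hard part was never the math, it’s agreeing on which records count and what the numbers mean.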
Each framework has legitimate trade-offs. DORA is deployment-focused and misses the experience of writing code. SPACE is comprehensive but complex to implement. DevEx captures sentiment but is harder to quantify. The debate over which framework to adopt often paralyzes teams indefinitely. I’ve watched platform teams spend 6 months debating measurement frameworks while measuring nothing — the analysis paralysis is real.
The result: organizations invest $600K+ in tooling but never instrument it properly, or instrument it but never act on the data, or act on the data but can’t explain to engineers what it means.
My Approach: Metric Triads
After years of failed single-metric experiments, I abandoned dashboards built around individual metrics. Instead, I use what I call “metric triads” — for each engineering goal, I track one outcome metric, one activity metric, and one perception metric.
Example: Deployment Reliability
- Outcome metric: Change failure rate (what percentage of deployments cause incidents?)
- Activity metric: Deployment frequency (how often are we deploying?)
- Perception metric: Developer confidence in deploying on Fridays (do engineers feel safe deploying?)
The triad prevents gaming because optimizing one metric at the expense of the others becomes immediately visible. If deployment frequency goes up but change failure rate also rises, you’re deploying faster but less safely. If change failure rate looks great but Friday deployment confidence is low, there’s a disconnect between the numbers and the lived experience — maybe the metrics aren’t capturing near-misses or stressful deployments that didn’t technically fail.
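To show what a triad looks like in practice, here’s a minimal sketch; the structure, thresholds, and flag wording are my own illustrative choices, not output from any of the platforms mentioned above.

```python
from dataclasses import dataclass

@dataclass
class MetricTriad:
    """One engineering goal, three lenses: outcome, activity, perception."""
    goal: str
    outcome: float      # e.g. change failure rate (lower is better)
    activity: float     # e.g. deploys per week (higher is better)
    perception: float   # e.g. "I feel safe deploying on a Friday", 1-5 survey mean

def triad_flags(current: MetricTriad, previous: MetricTriad) -> list[str]:
    """Flag combinations that suggest one leg is being optimized at another's expense."""
    flags = []
    if current.activity > previous.activity and current.outcome > previous.outcome:
        flags.append("Deploying more often but failing more often: speed is outrunning safety.")
    if current.outcome <= previous.outcome and current.perception < 3.0:
        flags.append("Outcome looks fine but engineers don't trust it: look for near-misses.")
    return flags

# Example: the deployment-reliability triad, last quarter vs. this quarter
last_q = MetricTriad("deployment reliability", outcome=0.04, activity=12, perception=3.8)
this_q = MetricTriad("deployment reliability", outcome=0.09, activity=19, perception=3.1)
print(triad_flags(this_q, last_q))   # the speed-vs-safety flag fires
```

The specific thresholds don’t matter much; what matters is that the dashboard only raises a flag when the three legs disagree, which is exactly the signal a single metric hides.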
Perception metrics — collected through brief, anonymous, periodic surveys — catch issues that quantitative metrics systematically miss. Engineers can tell you when the deployment process feels fragile long before it shows up in failure rates. They know when the CI pipeline is flaky even when the dashboard says uptime is 99.5%.
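For what it’s worth, here’s a minimal sketch of how that perception leg can be aggregated; the question wording, 1-5 scale, and minimum-response threshold are illustrative assumptions, not a standard survey instrument.

```python
from statistics import mean, pstdev

# Anonymous 1-5 responses to "I feel confident deploying on a Friday afternoon."
responses = [4, 2, 5, 3, 2, 4, 3, 2, 4, 3]

MIN_RESPONSES = 5  # never report below this; small samples de-anonymize people

def pulse_summary(scores: list[int]) -> dict | None:
    if len(scores) < MIN_RESPONSES:
        return None  # not enough cover to report safely
    return {
        "mean": round(mean(scores), 2),
        "spread": round(pstdev(scores), 2),         # high spread means a split experience, worth a conversation
        "detractors": sum(s <= 2 for s in scores),  # count of 1s and 2s, the early-warning signal
        "n": len(scores),
    }

print(pulse_summary(responses))
```

The anonymity threshold is the design decision that matters most: the moment engineers suspect a survey can be traced back to them, the perception metric stops measuring anything.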
The $600K Question
Is the investment in productivity platforms worth it? My honest answer: it depends entirely on how you use it.
The platforms provide useful data for organizational decision-making — capacity planning, bottleneck identification, trend analysis across quarters. When used at the team level for identifying systemic issues, they’re genuinely valuable. When I can see that Team A’s cycle time spiked 40% last quarter and investigate whether it’s a tooling issue, a team composition change, or a particularly complex project, that’s actionable insight.
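For that kind of team-level use, most of what I need from the data is an answer to “did something change, and when?” Here’s a minimal sketch of that trend check; the weekly numbers, window, and 40% threshold are illustrative assumptions.

```python
from statistics import median

# Weekly median cycle times (hours) for one team, oldest first; numbers are illustrative.
weekly_cycle_time_hours = [28, 31, 26, 29, 30, 27, 41, 44, 46, 43]

def spike_weeks(series: list[float], window: int = 4, threshold: float = 0.40) -> list[int]:
    """Return indices of weeks where cycle time exceeds the trailing-median baseline by `threshold`."""
    spikes = []
    for i in range(window, len(series)):
        baseline = median(series[i - window:i])
        if series[i] > baseline * (1 + threshold):
            spikes.append(i)
    return spikes

print(spike_weeks(weekly_cycle_time_hours))  # -> [6, 7]: the weeks where the jump first shows up
```

A flagged week is a prompt to go ask the team what changed, not a verdict on the team.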
But these platforms are poor tools for individual performance assessment. When used to evaluate individual developers — comparing engineer A’s commit frequency against engineer B’s, or ranking developers by PR throughput — they’re actively destructive. They reward visible output over invisible impact, penalize the engineers who do the hardest work (debugging, architecture, mentoring), and create a culture where appearing productive matters more than being productive.
The framing matters more than the tooling. The same $600K platform can be a force multiplier for engineering excellence or a morale-destroying surveillance system. The difference isn’t the technology — it’s the leadership decisions about what to measure, how to communicate it, and what actions to take based on the data.
So I’m genuinely curious: what developer productivity metrics does your org track, and — critically — do your developers trust them? In my experience, the gap between what leadership measures and what engineers believe is meaningful is one of the biggest unaddressed tensions in modern engineering organizations.