Prompt Edits Without PRs: The Velocity Metric Your AI Team Is Failing
A head of engineering opens the velocity dashboard on a Monday morning. PRs merged per week, flat. Story points completed, flat. Lines changed, suspiciously low. The AI team is having a quiet quarter, the chart says. Two floors away, that team has rewritten the system prompt seven times in three weeks, swapped a tool description that doubled tool-call accuracy, added six new few-shot examples, and tuned the rerank instruction until the product feels like a different application. None of that work shows up in the PR graph. None of it is invisible to users.
The asymmetry between what AI teams change and what engineering dashboards measure has become the load-bearing misdiagnosis of 2026. Behavior change in an AI-heavy product is increasingly decoupled from code change, and the metrics that have governed software organizations for fifteen years — PR throughput, commit volume, lines touched — measure code change. A team can be reshaping production response distributions weekly and look idle on every chart leadership trusts.
