Skip to main content

One post tagged with "performance-reviews"

View all tags

The Perf Review Template That Cannot See AI Work

· 11 min read
Tian Pan
Software Engineer

Your strongest AI engineer spent the cycle curating an eval set, calibrating a judge prompt, and killing two features that turned out to be task-shape mismatched. None of that fits a single line on the review template. So the calibration meeting either inflates the artifacts the engineer cares least about — PR count, design docs, on-call shifts — or invents prose to justify a high rating the framework cannot defend. Either way, the rubric and the reality are pulling in different directions, and the engineer can tell.

The template was written for deterministic software. It rewards what you can count: lines of code shipped, services owned, incidents resolved, hours spent on-call. The AI roadmap is moved by a different shape of work: curating a representative eval slice, defending a behavioral envelope under model drift, refusing to ship a feature whose task shape doesn't fit the model, and patiently shrinking the gap between a judge prompt and human intent. Almost none of that produces the artifacts the rubric was built to count.