I’ve sat in hundreds of performance calibration sessions across Google, Slack, and now my current company. And I need to say something that most engineering leaders won’t say out loud: the system is fundamentally broken, and it’s costing us our best people.
What Calibration Is Supposed to Do
For those unfamiliar, performance calibration is the process where engineering managers meet — usually quarterly or twice a year — to “calibrate” performance ratings across teams. The intent is noble: prevent rating inflation, ensure consistency across managers with different standards, and create fairness through peer review of individual ratings. It’s standard practice at most tech companies with 100+ engineers, and on paper, it sounds like a reasonable check on managerial bias.
The idea is that if Manager A rates everyone “exceeds expectations” and Manager B is a tough grader, calibration forces them both to a shared standard. A committee reviews the ratings, discusses the evidence, and adjusts where needed. Fair, right?
The Reality I’ve Observed
Here’s what actually happens in most calibration rooms: promotions are determined by your manager’s advocacy skills, not your engineering skills.
The engineer whose manager argues persuasively, knows the right people in the room, and can tell a compelling narrative? That engineer gets promoted. The engineer whose manager is quiet, non-confrontational, or simply new to the calibration process? That engineer stays at “meets expectations” regardless of their actual output.
I’ve watched this play out again and again. A manager walks in with a carefully prepared pitch, name-drops the right executives, ties their report’s work to a high-visibility initiative, and walks out with the rating they wanted. Meanwhile, the next manager mumbles through the case for an engineer who rebuilt the entire CI/CD pipeline — work that saved the company millions but doesn’t make for a flashy story.
The Biases That Calibration Was Supposed to Fix (But Actually Amplifies)
Recency bias. The last 8 weeks of a 26-week cycle disproportionately shape the discussion. An engineer who delivered a massive project in month 1 but had a quiet month 6 gets rated lower than someone who shipped a small feature last week.
Visibility bias. Engineers who present at all-hands meetings, write company-wide emails, and attend leadership offsites get noticed. Engineers who do deep, critical infrastructure work in silence don’t. Calibration rewards the visible, not the valuable.
Similarity bias. Research consistently shows that managers advocate harder for people who remind them of themselves — same background, same communication style, same approach to work. In a room full of extroverted managers, introverts lose.
The “too expensive” excuse. I’ve literally heard calibrators say “promoting them would be expensive because they’re in San Francisco.” An engineer’s location should never determine their rating, but in practice, the cost implications of a promotion quietly influence the conversation.
The Quiet Performer Problem
This is the one that keeps me up at night. Introverted engineers who do excellent deep work — the ones who fix the gnarliest bugs, mentor junior engineers behind the scenes, and maintain systems that everyone depends on — are systematically disadvantaged in calibration. Their managers often lack concrete, quotable examples because the work doesn’t generate visible artifacts. There’s no launch announcement, no demo, no all-hands shout-out. Just reliable, essential engineering that nobody notices until it breaks.
The Stack Ranking Ghost
Most companies will tell you they abolished stack ranking years ago. But calibration enforces a de facto distribution curve. If the guidance says 20% of engineers can be rated “exceeds expectations,” and your organization has 50 engineers, that’s 10 slots. Someone has to be moved down to make room. The conversation shifts from “does this person deserve it?” to “who do we bump?”
That’s stack ranking with extra steps.
What I Think We Should Do Instead
- Written evidence packages. Every rating should come with a structured document — not a verbal pitch. This reduces the reliance on real-time advocacy and lets the work speak for itself.
- Anonymous calibration. Remove manager names from the initial rating review. The committee evaluates the evidence without knowing who’s advocating. This reduces the “popular manager” effect.
- Structured rubrics weighted toward outcomes. Define what “exceeds expectations” means in measurable terms: impact delivered, systems improved, people developed. Not “they seemed impressive in the meeting.”
The DEI Dimension We Can’t Ignore
Research from McKinsey and MIT shows that calibration disproportionately disadvantages women and underrepresented minorities, who are less likely to have vocal advocates in the room and more likely to have their contributions attributed to the team rather than the individual. When I look at who gets stuck at “meets expectations” cycle after cycle, the pattern is clear, and it’s not about performance.
How does your company handle performance calibration? And — honestly — is it actually fair? I’d love to hear from both managers and ICs about what you’ve seen.