Performance Calibration Is Where Promotions Go to Die: Big Tech's "Stack Rank by Committee" Creates Fairness Theater

I’ve sat in hundreds of performance calibration sessions across Google, Slack, and now my current company. And I need to say something that most engineering leaders won’t say out loud: the system is fundamentally broken, and it’s costing us our best people.

What Calibration Is Supposed to Do

For those unfamiliar, performance calibration is the process where engineering managers meet — usually quarterly or semiannually — to “calibrate” performance ratings across teams. The intent is noble: prevent rating inflation, ensure consistency across managers with different standards, and create fairness through peer review of individual ratings. It’s standard practice at most tech companies with 100+ engineers, and on paper, it sounds like a reasonable check on managerial bias.

The idea is that if Manager A rates everyone “exceeds expectations” and Manager B is a tough grader, calibration forces them both to a shared standard. A committee reviews the ratings, discusses the evidence, and adjusts where needed. Fair, right?

The Reality I’ve Observed

Here’s what actually happens in most calibration rooms: promotions are determined by your manager’s advocacy skills, not your engineering skills.

The engineer whose manager argues persuasively, knows the right people in the room, and can tell a compelling narrative? That engineer gets promoted. The engineer whose manager is quiet, non-confrontational, or simply new to the calibration process? That engineer stays at “meets expectations” regardless of their actual output.

I’ve watched this play out hundreds of times. A manager walks in with a carefully prepared pitch, name-drops the right executives, ties their report’s work to a high-visibility initiative, and walks out with the rating they wanted. Meanwhile, the manager at the next table mumbles through their case for an engineer who rebuilt the entire CI/CD pipeline — work that saved the company millions but doesn’t make for a flashy story.

The Biases That Calibration Was Supposed to Fix (But Actually Amplifies)

Recency bias. The last 8 weeks of a 26-week cycle disproportionately shape the discussion. An engineer who delivered a massive project in month 1 but had a quiet month 6 gets rated lower than someone who shipped a small feature last week.

Visibility bias. Engineers who present at all-hands meetings, write company-wide emails, and attend leadership offsites get noticed. Engineers who do deep, critical infrastructure work in silence don’t. Calibration rewards the visible, not the valuable.

Similarity bias. Research consistently shows that managers advocate harder for people who remind them of themselves — same background, same communication style, same approach to work. In a room full of extroverted managers, introverts lose.

The “too expensive” excuse. I’ve literally heard calibrators say “promoting them would be expensive because they’re in San Francisco.” An engineer’s location should never determine their rating, but in practice, the cost implications of a promotion quietly influence the conversation.

The Quiet Performer Problem

This is the one that keeps me up at night. Introverted engineers who do excellent deep work — the ones who fix the gnarliest bugs, mentor junior engineers behind the scenes, and maintain systems that everyone depends on — are systematically disadvantaged in calibration. Their managers often lack concrete, quotable examples because the work doesn’t generate visible artifacts. There’s no launch announcement, no demo, no all-hands shout-out. Just reliable, essential engineering that nobody notices until it breaks.

The Stack Ranking Ghost

Most companies will tell you they abolished stack ranking years ago. But calibration enforces a de facto distribution curve. If the guidance says 20% of engineers can be rated “exceeds expectations,” and your organization has 50 engineers, that’s 10 slots. Someone has to be moved down to make room. The conversation shifts from “does this person deserve it?” to “who do we bump?”

That’s stack ranking with extra steps.
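If you want to see how blunt the math is, here’s a trivial sketch (the numbers are hypothetical, mirroring the 50-engineer example above):

```python
# Toy illustration of forced distribution (hypothetical numbers,
# matching the 50-engineer / 20% guidance example above).
org_size = 50
exceeds_quota = 0.20

slots = int(org_size * exceeds_quota)   # 10 slots, fixed before anyone's work is read
nominated = 14                          # managers' honest "exceeds" nominations
bumped = max(0, nominated - slots)

print(f"{slots} slots, {nominated} nominations -> {bumped} engineers bumped down")
```

The quota is set before a single piece of evidence is discussed. The discussion can only decide *who* loses, not *whether* anyone has to.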

What I Think We Should Do Instead

  1. Written evidence packages. Every rating should come with a structured document — not a verbal pitch. Reduce the reliance on real-time advocacy and let the work speak for itself.
  2. Anonymous calibration. Remove manager names from the initial rating review. The committee evaluates the evidence without knowing who’s advocating. This reduces the “popular manager” effect.
  3. Structured rubrics weighted toward outcomes. Define what “exceeds expectations” means in measurable terms: impact delivered, systems improved, people developed. Not “they seemed impressive in the meeting.”
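To make the third point concrete, here’s a rough sketch of what an outcome-weighted rubric could look like. The criteria names and weights are mine, purely illustrative — not any company’s actual rubric:

```python
from dataclasses import dataclass

# Hypothetical outcome-weighted rubric: every criterion is scored against
# written evidence, and the weights deliberately favor delivered impact
# over meeting-room impressions.
@dataclass
class Criterion:
    name: str
    weight: float         # fraction of the total score
    evidence_prompt: str  # what the written package must document

RUBRIC = [
    Criterion("impact_delivered", 0.40, "Quantified business or technical outcome"),
    Criterion("systems_improved", 0.30, "Measured reliability or velocity gain"),
    Criterion("people_developed", 0.30, "Named mentees and their documented growth"),
]

def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores (0-5 scale) into one weighted rating."""
    return sum(c.weight * scores[c.name] for c in RUBRIC)

# Example: strong delivery, solid infrastructure work, some mentoring.
print(weighted_score({"impact_delivered": 5, "systems_improved": 4, "people_developed": 3}))  # ~4.1
```

The exact weights matter less than the fact that they exist in writing before the meeting starts.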

The DEI Dimension We Can’t Ignore

Research from McKinsey and MIT shows that calibration disproportionately disadvantages women and underrepresented minorities, who are less likely to have vocal advocates in the room and more likely to have their contributions attributed to the team rather than the individual. When I look at who gets stuck at “meets expectations” cycle after cycle, the pattern is clear and it’s not about performance.

How does your company handle performance calibration? And — honestly — is it actually fair? I’d love to hear from both managers and ICs about what you’ve seen.

@vp_eng_keisha, this post articulates what I’ve been struggling to put into words for years. I participate in calibration both as an advocate for my reports and as a calibrator reviewing other teams’ ratings. I see both sides, and honestly? Neither side feels great.

The System IS Biased — But What’s the Alternative?

You’re absolutely right that calibration rewards advocacy over performance. I’ve seen it firsthand. But here’s the uncomfortable truth I keep coming back to: without calibration, the system is worse. Every single manager — myself included — suffers from the Lake Wobegon effect. We all think our teams are above average. In my first year as a director, before I understood calibration deeply, I rated 80% of my team as “exceeds expectations.” Was I right? No. I was a new manager who wanted to reward effort, and I conflated effort with impact.

Without calibration, you get wildly inconsistent ratings. Team A has an easy grader and everyone gets promoted. Team B has a tough grader and nobody does. Engineers figure this out fast, and the ones who can transfer go to Team A. That’s not fairness either.

My Middle Ground: Structured Written Calibration

I’ve been piloting something different with my organization: structured calibration with a required evidence document for every rating. Here’s how it works:

  • Every manager writes a 1-2 page evidence brief for each engineer being reviewed. The brief follows a strict template: business impact (quantified), technical complexity, scope of influence, and growth trajectory. (A sketch of the template follows this list.)
  • No oral arguments in the first round. The committee reads the briefs independently and assigns preliminary ratings without discussion.
  • We only discuss cases where there’s disagreement between the written brief and the committee’s assessment.
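To give a feel for the template, each brief is essentially structured data. The field names below are simplified for illustration — not our exact form:

```python
from dataclasses import dataclass, field

# Illustrative shape of an evidence brief (simplified field names).
# Every field must point at verifiable artifacts: dashboards, design docs,
# incident reviews, peer feedback -- not adjectives.
@dataclass
class EvidenceBrief:
    engineer: str
    business_impact: str        # quantified outcome, with the number
    technical_complexity: str   # what was hard, and why it was hard here
    scope_of_influence: str     # teams/systems affected beyond their own
    growth_trajectory: str      # change since the last cycle
    artifacts: list[str] = field(default_factory=list)  # links the committee can verify

brief = EvidenceBrief(
    engineer="example-engineer",
    business_impact="Cut build times 45 min -> 9 min across ~120 daily builds",
    technical_complexity="Replaced hand-rolled caching with a content-addressed store",
    scope_of_influence="Adopted by 5 teams outside the platform org",
    growth_trajectory="Went from executing specs to authoring accepted RFCs",
    artifacts=["<design doc link>", "<build-time dashboard link>"],
)
```

The forcing function is the artifacts list: if a claim can’t be linked to something the committee can verify, it doesn’t go in the brief.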

It’s slower — calibration takes about 40% longer. But the results are meaningfully different. The “best speaker wins” problem drops significantly because the evidence has to stand on its own. I’ve also noticed that quieter managers, who are often the ones advocating for quiet performers, do much better in this format because they tend to be thorough writers.

The Piece That’s Still Missing

Where I don’t have a good answer is the distribution curve issue you raised. Even with better evidence, we still have a fixed budget for promotions and top ratings. Until companies are willing to decouple performance evaluation from compensation decisions — rate everyone on an absolute scale, then figure out the budget separately — calibration will always have a zero-sum element. And zero-sum creates the politics you’re describing.

For now, I’d advocate for written evidence as the single highest-leverage change most organizations can make tomorrow. It doesn’t solve everything, but it shifts the conversation from “who has the best manager” to “who has the best evidence.”

I want to share a story that illustrates exactly what @vp_eng_keisha is describing, because I lived it.

Three Cycles of “Meets Expectations” Despite $500K in Savings

At my previous company, I spent 18 months redesigning our deployment infrastructure. I migrated us from a fragile, manually-orchestrated system to a fully automated pipeline that reduced deployment time from 4 hours to 12 minutes and eliminated a class of production incidents that had been costing us roughly $500K per year in engineering time and lost revenue. I documented everything, wrote the RFCs, got buy-in from stakeholders, and delivered on time.

For three consecutive performance cycles, I was rated “meets expectations.”

I didn’t understand it. My skip-level told me the work was exceptional. Peer feedback was glowing. The infrastructure team literally nominated me for an internal award. But every calibration, my rating came back the same.

What I Learned — Off the Record

It wasn’t until a peer’s manager told me, confidentially, what was happening in calibration that the picture came into focus. My manager at the time was a genuinely good person — technically strong, cared about the team — but he was terrible at calibration politics. He’d walk in with a one-paragraph summary of my work, get steamrolled by managers who came with polished narratives and executive-visible metrics, and walk out without fighting for the rating I deserved. He didn’t know how to play the game.

Meanwhile, an engineer on another team who had shipped a customer-facing feature with modest impact but high visibility — a feature the VP of Product had mentioned in an all-hands — got “exceeds expectations” three cycles in a row. Their manager was a seasoned calibration veteran who knew exactly how to frame the narrative.

The Transfer Experiment

I transferred to a team with a manager known for being a strong advocate. Within 6 months, doing the same quality of work, I was promoted. Nothing about my engineering changed. My output didn’t improve. The only variable was my manager’s ability to navigate calibration.

That experience permanently changed how I think about career growth in big tech. Your manager is your most important career decision — not because of what they teach you, but because of how they represent you in rooms you’re not in. And that’s a deeply broken incentive structure.

I’m now extremely deliberate about evaluating a manager’s calibration track record before joining a team. I ask pointed questions in interviews: “How many of your reports were promoted last cycle? How do you prepare for calibration?” If they can’t answer concretely, that tells me everything I need to know.

@eng_director_luis, the written evidence model sounds promising. At minimum, it would have given my old manager a structured way to present my work without needing to out-argue the room.

@vp_eng_keisha, every word of this resonates. I participated in calibration at both Microsoft and Twilio, and I observed the exact same patterns – advocacy skills trumping actual performance, distribution curves creating artificial scarcity, and underrepresented engineers consistently getting the short end.

We Abolished Traditional Calibration

When I became CTO, one of my first major decisions was to dismantle our calibration process entirely. It was controversial. My VP of HR pushed back hard – “calibration is industry best practice.” My response: “Best practice for whom?”

Here is what we replaced it with:

Peer Review Panels. For every engineer being considered for promotion or a top rating, we assemble a panel of 3 engineers who worked closely with the candidate during the review period. These are not random engineers – they are people who reviewed the candidate’s code, collaborated on projects, or depended on their systems. They evaluate performance using structured rubrics.

The Manager Writes, But Does Not Decide. The manager writes the recommendation and provides context, but the peer panel has the final say on the technical evaluation. The manager’s role shifts from “advocate in a room” to “evidence collector and context provider.” This is a fundamental change – it removes the incentive for managers to be great politicians and replaces it with an incentive to be great documenters.

Anonymous and Diverse Panels. The panel members do not know who else is on the panel, and we ensure demographic diversity. The evaluations are written independently, then compiled by HR. This prevents groupthink and reduces the similarity bias you described.
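For illustration, here is a simplified sketch of the panel-selection idea, with the inputs reduced to toy data. A real pipeline would derive the collaborator pool from code reviews, shared projects, and system dependencies rather than a hand-written list:

```python
import random

# Simplified sketch of panel selection (illustrative only): pick reviewers
# who actually collaborated with the candidate, preferring demographic
# diversity across the panel.
def assemble_panel(collaborators, panel_size=3, seed=0):
    """collaborators: dicts with 'name' and 'group' (demographic group)."""
    rng = random.Random(seed)
    pool = list(collaborators)
    rng.shuffle(pool)
    panel, seen_groups = [], set()
    # First pass: prefer one reviewer per demographic group.
    for person in pool:
        if len(panel) < panel_size and person["group"] not in seen_groups:
            panel.append(person)
            seen_groups.add(person["group"])
    # Second pass: fill any remaining seats from whoever is left.
    for person in pool:
        if len(panel) < panel_size and person not in panel:
            panel.append(person)
    return panel

panel = assemble_panel([
    {"name": "ana", "group": "A"}, {"name": "bo", "group": "A"},
    {"name": "chen", "group": "B"}, {"name": "dee", "group": "C"},
])
print(sorted(p["name"] for p in panel))  # three reviewers spanning groups A/B/C
```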

The Key Innovation: Published, Specific Criteria

The single most impactful change was publishing our evaluation criteria with extreme specificity. We eliminated subjective language like “exceeds expectations” entirely. Instead, our rubrics define concrete thresholds:

  • Impact: “Delivered a project that measurably improved [metric] by [X%] or equivalent business value”
  • Technical complexity: “Solved a problem requiring [specific technical depth] that was novel to the organization”
  • Scope of influence: “Work was adopted by or benefited [N] teams beyond the engineer’s immediate team”
  • Growth: “Mentored [N] engineers, with evidence of their measurable skill development”

When the criteria are published and specific, engineers know exactly what they need to demonstrate. There is no mystery, no backroom negotiation, no reliance on having the right manager.
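To illustrate the point (criteria text abbreviated from the bullets above, thresholds invented), “published” means the rubric lives as shared data that an engineer can check themselves against before the cycle even starts:

```python
# Illustrative only: the published rubric as shared data. Criteria text is
# abbreviated from the bullets above; real thresholds would be fully specified.
CRITERIA = {
    "impact": "Measurably improved a tracked metric or equivalent business value",
    "technical_complexity": "Solved a problem novel to the organization",
    "scope_of_influence": "Work adopted beyond the engineer's immediate team",
    "growth": "Mentored engineers with evidence of their skill development",
}

def unmet_criteria(evidence: dict[str, bool]) -> list[str]:
    """Return the criteria an evidence package does not yet satisfy."""
    return [name for name in CRITERIA if not evidence.get(name)]

# Self-check before the cycle: no mystery about what is missing.
evidence = {"impact": True, "technical_complexity": True,
            "scope_of_influence": False, "growth": True}
print(unmet_criteria(evidence))  # -> ['scope_of_influence']
```

The same data drives the panel evaluations, so the engineer and the evaluators are reading from an identical bar.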

Results After 2 Years

Promotion rates for women engineers increased 34%. Promotion rates for underrepresented minorities increased 28%. And critically, the engineers who were promoted reported higher confidence that their promotion was earned – not politically maneuvered. Manager satisfaction with the process also went up, because they no longer felt the pressure to be “the best arguer in the room.”

It is more work. Each review cycle takes about 2x the administrative effort of traditional calibration. But the promotions feel legitimate, and we have seen a measurable reduction in attrition among high performers who previously felt the system was rigged against them.

@eng_director_luis, your written evidence approach is a great intermediate step. For organizations not ready to fully restructure, moving from oral advocacy to written evidence is the single highest-ROI change available.