7 posts tagged with "hiring"

The Eval Bottleneck: Your Eval Engineer Is Now the Roadmap

· 11 min read
Tian Pan
Software Engineer

The constraint on your AI roadmap isn't GPU capacity, model availability, or prompt-engineering taste. It's the calendar of one or two engineers who actually know how to build an eval that catches a regression. Every PM with a feature is in their queue. Every model upgrade is in their queue. Every cohort drift, every prompt revision, every "is this judge still calibrated" question lands in the same inbox. And the engineer in question said "no, this isn't ready" three times this quarter, got overruled twice, watched the regression compound in production, and is now updating their LinkedIn.

This is the eval bottleneck, and most orgs don't see it until it bites. Through 2025 the visible scaling story was AI engineers — hire AI engineers, ship AI features, iterate on prompts, swap models. By Q1 2026 the throughput problem moved one layer down. The team that doubled its AI headcount discovered that adding more feature engineers didn't make features ship faster, because every feature still needed an eval, and the eval engineer was the same person.

The AI Engineer Interview Is Broken: Stop Testing Implementation, Start Probing Eval-Design

· 10 min read
Tian Pan
Software Engineer

A team I worked with last quarter rejected three candidates in a row from their AI engineer pipeline. All three failed the coding screen — the kind of problem where you implement a sliding-window deduplicator under a 35-minute timer. The team then hired the candidate who passed it. Four months later that engineer was the one who shipped the feature where the eval scored 92% in CI and the support queue lit up the day after launch. The eval was measuring exact-match against a curated test set. Production users phrased their queries differently. Nobody on the hiring panel had asked the candidate how they would have caught that gap.
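The failure mode above can be made concrete with a toy sketch. Everything here is hypothetical — the queries, the `model` stand-in, and the scoring function are illustrations, not the team's actual code — but the mechanism is the same: an exact-match eval scores perfectly on the phrasings its authors wrote, and scores nothing on the phrasings real users type.

```python
# Hypothetical sketch of the gap: exact-match eval vs. production phrasing.
CURATED_SET = [
    # (query, expected answer) — phrased the way the team wrote them
    ("reset my password", "Use Settings > Security > Reset Password."),
    ("cancel my subscription", "Go to Billing > Plan > Cancel."),
]

PRODUCTION_QUERIES = [
    # Same intents, phrased the way real users type them
    ("i forgot my pw how do i get back in",
     "Use Settings > Security > Reset Password."),
    ("stop charging my card", "Go to Billing > Plan > Cancel."),
]

def model(query: str) -> str:
    """Stand-in for the LLM feature: only recognizes curated phrasings."""
    for curated_query, answer in CURATED_SET:
        if curated_query in query:
            return answer
    return "Sorry, I don't understand."

def exact_match_score(cases) -> float:
    """Fraction of cases where the output string equals the expected string."""
    hits = sum(1 for query, expected in cases if model(query) == expected)
    return hits / len(cases)

print(exact_match_score(CURATED_SET))         # 1.0 — CI is green
print(exact_match_score(PRODUCTION_QUERIES))  # 0.0 — support queue lights up
```

The eval-design question a hiring panel could ask is exactly the one this sketch begs: what would you add — paraphrase sets, semantic scoring, sampled production traffic — so the second number shows up before launch instead of after?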

That's the shape of the bug. The interview pipeline was screening for the skill that mattered least to the job and was blind to the skill that mattered most. The team did not have a "judgment" round. They had a coding round, a system-design round, and a behavioral round, and they were running the same loop they had run in 2021 — the one calibrated for engineers who were going to write deterministic code against stable libraries.

The AI Interview Has No Signal: Why Your Loop Doesn't Identify People Who Ship LLM Products

· 10 min read
Tian Pan
Software Engineer

A team I know spent six months running their standard senior-engineer loop with an "AI round" bolted on. They interviewed seventy candidates. They hired three. None of the three shipped an agent that survived a production weekend. The team blamed the talent market. The talent market was fine. The loop was the problem.

The standard engineering interview was calibrated for a stack where correctness is verifiable, performance is measurable on a benchmark, and a good engineer is someone who can decompose a problem into deterministic components and reason about edge cases against a known specification. That stack still exists, and those skills still matter, but the cluster of skills that predicts shipping LLM products is largely orthogonal to it. Your loop is asking the right questions about the wrong job.

This is a structural problem, not a calibration nudge. Adding a forty-five-minute "AI round" to a loop calibrated for deterministic systems doesn't surface AI builders — it surfaces the intersection of classical-systems-strong and LLM-fluent candidates, which is a vanishingly small set, and produces six months of failed loops while everyone wonders where all the AI engineers went.

The Three Tastes of an AI Engineer: Why Prompts, Evals, and Guardrails Don't Live in the Same Head

· 11 min read
Tian Pan
Software Engineer

The three best AI engineers I have hired this year would all fail each other's interviews. The one who writes prompts that survive a model upgrade has never written a useful eval case in her life. The one who designs eval sets that catch the failures that matter writes prompts that other engineers refuse to extend. The one who designs guardrails that fail closed without choking the happy path has opinions about the other two that I cannot print here.
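"Fails closed without choking the happy path" is a specific design discipline, and a minimal sketch makes it legible. All names here are invented for illustration; the point is the two branches: a policy hit blocks the output, a crashed checker also blocks it (closed, not open), and a clean response passes through untouched.

```python
import re

BLOCKED_FALLBACK = "I can't help with that request."

def pii_checker(text: str) -> bool:
    """Toy policy check: flag anything that looks like a US SSN."""
    return re.search(r"\b\d{3}-\d{2}-\d{4}\b", text) is not None

def guarded(response: str, checkers) -> str:
    """Run every checker; block on a hit OR a checker error (fail closed)."""
    for check in checkers:
        try:
            if check(response):
                return BLOCKED_FALLBACK   # policy hit: block the output
        except Exception:
            return BLOCKED_FALLBACK       # checker crashed: fail CLOSED
    return response                       # happy path: untouched

print(guarded("Your order ships Tuesday.", [pii_checker]))
# → Your order ships Tuesday.
print(guarded("SSN on file: 123-45-6789", [pii_checker]))
# → I can't help with that request.
```

The taste being described is in the tradeoff this sketch hides: every checker you add is another way to swallow a legitimate answer, so the guardrail engineer's whole job is keeping the blocked branch rare without ever letting the crashed branch fail open.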

The job ladder calls all three of them "AI engineer." The calibration committee compares their promo packets as if they had been doing the same job. They have not.

The AI Interview Collapse: Engineering Hiring Has Lost Its Signal

· 11 min read
Tian Pan
Software Engineer

The signal is gone. In a recent audit of 19,368 technical interviews, 38.5% of candidates were flagged for AI-assisted cheating, with technical roles hitting 48% and junior candidates cheating at nearly double the rate of senior ones. More damning: 61% of detected cheaters scored above the passing threshold. Without the detection layer, they would have advanced. The interview, as an instrument, is no longer measuring what it was designed to measure.

This is not a moral panic about kids these days. It is a mechanical failure of the instrument. The technical interview was calibrated for a world in which a candidate, under time pressure, in an unfamiliar environment, had to produce correct code from memory and first principles. That constraint — the thing that made the signal legible — has been dissolved by a free-tier chat window running on a second device. Every company that still runs a LeetCode-style screen is now paying to sort candidates on a test the test-taker can trivially outsource.

Hiring for LLM Engineering: What the Interview Actually Needs to Test

· 10 min read
Tian Pan
Software Engineer

Most engineering teams that hire for LLM roles run roughly the same interview: two rounds of LeetCode, a system design question, maybe a quiz on transformer internals. They're assessing the wrong things — and they know it. The candidates who ace those screens often struggle to ship working AI features, while the ones who stumble on binary search can build an eval suite from scratch and debug a hallucinating pipeline in an afternoon.

The skills that predict success in LLM engineering have almost no overlap with what traditional ML or software interviews test. Hiring managers who haven't updated their process are generating false negatives at a high rate — rejecting engineers who would succeed — while false positives walk in with solid LeetCode scores and no intuition for when a model is confidently wrong.

The AI Hiring Rubric Problem: Why Your Interview Loop Selects the Wrong Engineer

· 8 min read
Tian Pan
Software Engineer

Most teams hiring AI engineers today are running an interview process optimized for a job that doesn't exist. They're screening for LeetCode fluency, quizzing candidates on transformer internals, and rewarding anyone who can confidently sketch a distributed system on a whiteboard. Then those same candidates join the team, struggle to debug a hallucinating retrieval pipeline, and ship a model integration that works beautifully in staging and silently degrades in production.

This isn't a talent problem. It's a measurement problem. The skills that predict success in AI engineering are largely invisible to traditional interview loops—and the skills interviews do measure correlate poorly with what the job actually requires.