2 posts tagged with "interviews"

The AI Engineer Interview Is Broken: Stop Testing Implementation, Start Probing Eval-Design

10 min read
Tian Pan
Software Engineer

A team I worked with last quarter rejected three candidates in a row from their AI engineer pipeline. All three failed the coding screen: the kind of problem where you implement a sliding-window deduplicator under a 35-minute timer. The team then hired the candidate who passed it. Four months later, that engineer was the one who shipped the feature whose eval scored 92% in CI while the support queue lit up the day after launch. The eval measured exact match against a curated test set; production users phrased their queries differently. Nobody on the hiring panel had asked the candidate how they would have caught that gap.
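To make the failure mode concrete, here is a minimal sketch of how an exact-match eval flatters a system that has effectively memorized its curated test set. Everything in it is hypothetical: the intents, the queries, and the classify() stand-in are invented for illustration, not taken from the team's actual pipeline.

```python
# Hypothetical sketch -- dataset, intents, and classify() are invented
# for illustration, not the team's real system.

CURATED_SET = [
    ("how do i reset my password", "reset_password"),
    ("cancel my subscription", "cancel_subscription"),
]

# In a real pipeline this would be an LLM call; a dict lookup is enough
# to show why the metric flatters it.
def classify(query: str) -> str:
    memorized = {q: label for q, label in CURATED_SET}
    return memorized.get(query, "unknown")

def exact_match_score(dataset: list[tuple[str, str]]) -> float:
    hits = sum(1 for query, label in dataset if classify(query) == label)
    return hits / len(dataset)

print(exact_match_score(CURATED_SET))   # 1.0 -- the high-score-in-CI illusion

# Production traffic: same intents, different phrasing.
PARAPHRASES = [
    ("i forgot my password, what now?", "reset_password"),
    ("please stop billing me", "cancel_subscription"),
]
print(exact_match_score(PARAPHRASES))   # 0.0 -- the gap nobody probed
```

A candidate who designs evals for a living looks at a harness like this and asks where the paraphrases and out-of-distribution queries are before asking anything about the model. That question is the signal the coding screen never surfaces.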

That's the shape of the bug. The interview pipeline was screening for the skill that mattered least to the job and was blind to the skill that mattered most. The team did not have a "judgment" round. They had a coding round, a system-design round, and a behavioral round, and they were running the same loop they had run in 2021 — the one calibrated for engineers who were going to write deterministic code against stable libraries.

The AI Interview Has No Signal: Why Your Loop Doesn't Identify People Who Ship LLM Products

10 min read
Tian Pan
Software Engineer

A team I know spent six months running their standard senior-engineer loop with an "AI round" bolted on. They interviewed seventy candidates. They hired three. None of the three shipped an agent that survived a production weekend. The team blamed the talent market. The talent market was fine. The loop was the problem.

The standard engineering interview was calibrated for a stack where correctness is verifiable, performance is measurable on a benchmark, and a good engineer is someone who can decompose a problem into deterministic components and reason about edge cases against a known specification. That stack still exists, and those skills still matter, but the cluster of skills that predicts shipping LLM products is largely orthogonal to it. Your loop is asking the right questions about the wrong job.

This is a structural problem, not a calibration nudge. Adding a forty-five-minute "AI round" to a loop calibrated for deterministic systems doesn't surface AI builders. It surfaces the intersection of classical-systems-strong and LLM-fluent candidates, a vanishingly small set, and it produces six months of failed loops while everyone wonders where all the AI engineers went.