What a Coding Interview Measures When the Candidate Has an Agent

· 9 min read
Tian Pan
Software Engineer

The coding interview was built to isolate a single variable. Put a person in a room, give them a problem, take away their references, and watch whether they can turn the problem into working code by themselves. Everything about the format (the whiteboard, the blank editor, the prohibition on looking things up) exists to strip away collaborators and tools so that one question remains: can this person, alone, write correct code under pressure?

That skill is no longer the one the job exercises. Day-to-day engineering in 2026 is a collaboration between an engineer and an agent. The engineer decides what to build, the agent drafts most of the code, and the engineer's real work is reviewing, correcting, and deciding when the agent is confidently wrong. The interview measures solo code production. The job rewards directing a tireless, fast, occasionally hallucinating collaborator. The proxy and the target have come apart, and most hiring pipelines haven't noticed.

This is not a complaint about cheating, though cheating is the symptom everyone fixates on. It's a measurement problem. When you can no longer observe the variable your test was designed to isolate, the test stops producing signal — and a test that produces no signal while everyone still trusts it is worse than no test at all.

Banning the agent tests a job nobody does

The first instinct, when invisible AI assistants started passing interviews, was to lock the room down harder. Tab-switching detection. Browser lockdowns. Eye-tracking. Keystroke analysis to flag the rhythm of pasted code. An entire counter-industry now sells "interview integrity" as a service, scoring twenty-some behavioral signals to catch the candidate who alt-tabs to a hidden overlay.

The arms race is unwinnable, and that's the less interesting reason to abandon it. Cheating tools advertise pass rates north of 90% against standard algorithm problems, and they render solutions in under two seconds while staying invisible to screen-share. Detection tightens, tools adapt, and the cost of the whole exercise keeps rising. But suppose you won the arms race tomorrow — suppose you could perfectly guarantee a candidate solved a graph-traversal problem with no assistance whatsoever. You would have measured, with great precision, a counterfactual job. Nobody writes graph traversals from memory anymore. You'd have certified a skill the candidate will never use again, in an environment that resembles no workplace.

A locked-down interview in 2026 is a flight simulator that has carefully disabled the autopilot — because the last generation of pilots flew without one. It tests competence at a task the role has retired. Banning the agent doesn't restore signal. It just makes the interview measure the wrong thing more reliably.

Allowing the agent without structure tells you nothing either

The opposite move — "use whatever tools you want, we don't care" — feels modern and fails for a subtler reason. When a candidate and an agent jointly produce a correct solution, you cannot see who authored the judgment. The candidate might have decomposed the problem, spotted a flawed first draft, and steered the agent to a better design. Or the candidate might have pasted the prompt, accepted the first output, and gotten lucky because the problem was a common one the model had memorized. Both produce the same green checkmark.

The interview's entire value was that it forced the candidate's reasoning into the open. An unstructured "bring your agent" interview hides that reasoning behind the agent's output. You watch a screen fill with correct code and learn nothing about whether the human in the chair could have caught it if it were wrong. The signal didn't improve; it moved somewhere you can't observe it.

So the two obvious responses — ban the tool, or allow it freely — both destroy signal, just from opposite directions. One measures a skill that's obsolete. The other measures a collaboration without being able to attribute any of it. The way out is not a policy on tools. It's redesigning what the interview asks the candidate to do.

The signals that actually predict performance now

If solo code production no longer predicts on-the-job performance, what does? The companies that have rebuilt their interviews — Sierra, DoorDash, Canva, and others piloting "audit" or "AI-native" formats — converge on a short list of signals that an agent cannot fake on the candidate's behalf.
