Technical Interviews Are Finally Changing: Companies Now Let Candidates Use AI in Live Coding — But What Are You Actually Evaluating?

The hiring landscape just shifted again, and I don’t think most companies have caught up.

Karat’s 2026 Engineering Interview Trends report confirms what many of us have been sensing: a growing number of companies now allow candidates to use AI tools — Copilot, ChatGPT, Cursor — during technical interviews. This isn’t a fringe experiment anymore. It’s becoming policy at companies that collectively hire tens of thousands of engineers per year.

The Rationale Makes Sense (On Paper)

The logic is straightforward. Developers use AI tools daily at work. GitHub’s data shows that over 70% of professional developers use some form of AI coding assistant regularly. Banning these tools in interviews creates an artificial environment that doesn’t reflect how people actually build software. It’s like testing a carpenter’s skill but telling them they can’t use a power drill — sure, you’ll learn something, but is it the right something?

But Here’s the Problem

If a candidate uses Copilot to solve a LeetCode-style problem, what exactly did you learn about their ability? You tested their prompting skill and their ability to accept or reject suggestions. You didn’t test their problem-solving, their ability to reason about edge cases, or their understanding of algorithmic tradeoffs. The interview became a test of a different skill — one that matters, but maybe not the one you thought you were evaluating.

What We Tried

At my company, we ran an experiment for three months. We allowed candidates to use any AI tool during take-home assignments. The theory was that it would mirror real work and surface candidates who knew how to leverage AI effectively.

The result? Every single submission looked polished: the code was clean, well-commented, and had test coverage. We couldn’t differentiate between a strong senior engineer and a junior developer who spent extra time prompting. The signal-to-noise ratio went to near zero.

So we pivoted. We switched to live pairing sessions where we watch how candidates use AI, not just the output. This was revelatory. We saw candidates who:

  • Used Copilot as a starting point but then restructured the code based on their own mental model
  • Blindly accepted AI suggestions without reviewing them (red flag)
  • Caught subtle bugs in AI-generated code that would have caused production issues
  • Knew when to turn AI off and reason from first principles

The delta between these behaviors was massive and immediately visible.

The Emerging Interview Patterns

Across my network, I’m seeing three new interview formats gaining traction:

  1. “Explain this AI-generated code” — Give the candidate a block of AI-generated code with subtle issues. Can they identify the problems? Do they understand what it does?
  2. “Debug this AI mistake” — Present code that an LLM produced with a known flaw. Watch how they diagnose and fix it.
  3. “Architect a system (whiteboard, no AI)” — Strip away tools entirely and test raw system design thinking. This is where genuine understanding shows through.
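Format 1 is easy to stock with material. Here is a hypothetical Python snippet of the kind that works well in that round: it reads plausibly and passes a casual glance, but carries a subtle off-by-one that a strong candidate should catch when asked to explain it.

```python
# Hypothetical "explain this AI-generated code" exercise.
# Prompt to the candidate: what does this do, and is it correct?

def moving_average(values, window=3):
    """Return the moving average of `values` over a sliding window."""
    averages = []
    # Subtle bug: range should be len(values) - window + 1,
    # so the final window is silently dropped.
    for i in range(len(values) - window):
        chunk = values[i:i + window]
        averages.append(sum(chunk) / window)
    return averages
```

A candidate who walks through the loop bounds by hand will notice that for six inputs and a window of three, only three of the four possible windows are produced.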

The Speed vs. Quality Tradeoff

Here’s the tension nobody talks about enough: project-based and pairing-based interviews are dramatically better at evaluating candidates. They’re also 3x slower. A take-home assignment plus a pairing session plus a system design whiteboard takes a week of the candidate’s time and 8-10 hours of interviewer time.

Top candidates ghost slow processes. I’ve lost three excellent candidates this quarter to companies with 2-day hiring loops. When the market is competitive, speed is a feature.

My Current Framework

After a lot of iteration, here’s where we’ve landed:

  1. Phone screen (30 min, no AI) — Basic technical conversation, culture fit signal
  2. Live pairing WITH AI tools (90 min) — Real problem, real tools, we observe their process
  3. System design (60 min, whiteboard only, no AI) — Architecture thinking from first principles
  4. Culture and values conversation (45 min) — Cross-functional, not just engineering

Total candidate time commitment: ~4 hours. Total interviewer time: ~6 hours across four people. It’s not perfect, but it’s workable.

The Open Question

I’m still not satisfied with our process, and I don’t know anyone who is. The AI-enabled interviews give us better signal than the old LeetCode grind, but we’re still figuring out what “good” looks like.

Has anyone here found a hiring process that actually works in the AI era? Specifically:

  • How do you calibrate evaluation when candidates have wildly different comfort levels with AI tools?
  • How do you keep the process fast enough to not lose candidates?
  • How do you evaluate AI-native junior developers who’ve never written code without AI assistance?

Genuinely curious what others are seeing. This feels like one of those problems that every company is solving independently when we should be sharing notes.

David, this resonates deeply. I supervise hiring across 8 engineering teams at our company, and the inconsistency you described is the single biggest problem I’m dealing with right now.

The Calibration Nightmare

Here’s what’s happening in practice: some of my interviewers allow AI tools, some don’t. Some candidates ask permission to use them, some just open Copilot without asking, and some deliberately avoid AI to “prove” they can code from scratch. The result? Our interview scores are incomparable across candidates. We’re trying to rank people on the same rubric, but they took fundamentally different tests.

Last quarter, I had two candidates for the same senior role. One solved the coding problem in 20 minutes using Cursor — clean solution, well-tested. The other took 55 minutes writing everything by hand, had a few syntax errors, but demonstrated extraordinary reasoning about edge cases and performance tradeoffs during the process. My interviewers split 50/50 on who was stronger. Both were right, and both were wrong. We were measuring different things.

What I Standardized

I spent three weeks building a structured interview framework with an explicit rubric that evaluates reasoning, not code quality. Here’s the core principle: AI-generated code is fine. I don’t care if Copilot wrote the for loop. I care about:

  1. Can the candidate explain why the solution works? Not just what it does, but why this approach over alternatives. If they can’t articulate the tradeoffs, they didn’t understand it — regardless of who (or what) wrote the code.

  2. Can they identify limitations? Every solution has edge cases, performance boundaries, and assumptions. Strong candidates probe these proactively. Weak candidates say “it works” and move on.

  3. How do they respond when challenged? I train my interviewers to push back on solutions — “What if the input size is 10x larger?” or “What happens when this service is unavailable?” The candidate’s response tells you everything about their depth of understanding.

  4. Do they know when AI is wrong? I’ve started including a segment where we deliberately feed the candidate’s AI tool a misleading prompt and watch what happens. The ones who catch the mistake and explain why it’s wrong are the ones I want on my team.

The Rubric Matters More Than the Tools

My biggest takeaway after 18 years of hiring: the specific tools allowed matter far less than having a consistent, well-calibrated rubric that every interviewer applies the same way. Before AI, we had the same inconsistency problem — some interviewers cared about code style, some about speed, some about communication. AI just made the existing calibration problem more visible.

I’ve also started requiring all interviewers to go through a 2-hour calibration session quarterly. We review recordings of past interviews together and practice scoring them independently, then discuss discrepancies. It’s time-intensive, but it’s the only thing that has actually moved the needle on consistency.

Your framework of phone screen → live pairing → system design → culture fit is solid. I’d add one thing: make sure the rubric for the pairing session explicitly separates “AI usage skill” from “fundamental engineering reasoning.” Both matter, but they need to be scored independently.

Great thread. I want to add a perspective from both sides of the table — I’ve been interviewing at other companies recently while also running interviews at my current company. The contrast is stark.

As a Candidate

When I interview at companies that allow AI tools, the experience is dramatically more realistic and less stressful. I’m not sitting there trying to remember the exact syntax for a binary tree traversal I’d normally just look up. Instead, I’m doing what I actually do at work: thinking about the problem, sketching an approach, using tools to accelerate the implementation, and then reviewing and refining the output.

One company I interviewed at last month had me pair with an engineer on a real feature from their backlog. I had full access to Copilot and could Google anything. The interviewer watched my screen and asked questions throughout. At the end, I felt like they actually understood how I work — and I felt like I actually understood what their codebase looked like. It was the most informative interview I’ve ever done, on both sides.

Compare that to the companies still running timed LeetCode rounds. I’m a senior full-stack engineer with 7 years of experience building production systems, and I’m sitting there trying to implement Dijkstra’s algorithm from memory in 25 minutes. When has that ever been my job?

As an Interviewer

When I run interviews, the candidates who use AI tools well are exactly the ones I want to hire. The meta-skill here isn’t coding from scratch — it’s AI literacy. It’s knowing:

  • When to use AI (boilerplate, syntax, standard patterns) vs. when to think independently (architecture, edge cases, domain-specific logic)
  • How to evaluate AI output critically rather than accepting it blindly
  • When to override or discard AI suggestions because your mental model of the problem tells you the AI is heading in the wrong direction
  • How to debug AI-generated code when it fails subtly

These are the skills that differentiate productive engineers in 2026. A developer who can write a perfect merge sort from memory but can’t effectively use AI tools is going to be slower than a developer who understands the problem deeply and leverages AI for implementation.

My Biggest Red Flag

The single most telling moment in an AI-enabled interview: when the AI gives a wrong answer and the candidate doesn’t notice. I’ve seen candidates accept code that had an off-by-one error, a race condition, or used a deprecated API — all because the code “looked right” and they didn’t actually understand what it was doing.

If you can’t function when your AI tool misleads you, you’re not ready for production engineering. Full stop.
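To make that failure mode concrete, here is a hypothetical Python sketch of code that "looks right" but hides a check-then-act race on a shared cache, next to a corrected variant. Both function names and the cache structure are illustrative, not from any real codebase.

```python
import threading

cache = {}
lock = threading.Lock()

def get_or_compute(key, compute):
    # Looks correct at a glance, but the membership check and the
    # insert are not atomic: two threads can both miss, both call
    # compute(), and clobber each other's result.
    if key not in cache:
        cache[key] = compute(key)
    return cache[key]

def get_or_compute_safe(key, compute):
    # Hold the lock across the check AND the insert, so only one
    # thread ever computes a given key.
    with lock:
        if key not in cache:
            cache[key] = compute(key)
    return cache[key]
```

A candidate who accepts the first version without comment is exactly the case described above: the code reads fine, and the flaw only shows up under concurrency.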

I agree with David’s framework. The combination of AI-enabled pairing plus AI-free system design captures both dimensions well. The key insight is that you need both — AI-augmented execution skills AND foundational reasoning that exists independent of tools.

I appreciate this discussion, but I want to zoom out from the AI question and talk about something that’s been bothering me: the signal-to-noise ratio of the entire hiring process, regardless of whether AI is involved.

The Real Problem Isn’t AI — It’s ROI

My engineering organization spends approximately 200 engineer-hours per month on interviews. That’s roughly 1.2 full-time engineers’ worth of productivity dedicated to hiring. In a good month, we extend 5-6 offers and hire 3 people.

Let me do the math that nobody wants to do. At a fully-loaded cost of $200/hour for senior engineering time, we’re spending $40,000 per month — roughly $13,000 per hire — just on interviewer time. That doesn’t include recruiter costs, ATS tools, sourcing, or the opportunity cost of building features those engineers could have shipped instead.
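The arithmetic above is easy to sanity-check with the figures already stated (200 hours, $200/hour, 3 hires):

```python
# Back-of-envelope hiring cost, using the figures quoted above.
hours_per_month = 200        # interviewer hours spent on hiring
hourly_rate = 200            # fully-loaded cost of senior engineering time, $/hour
hires_per_month = 3

monthly_cost = hours_per_month * hourly_rate        # $40,000/month
cost_per_hire = monthly_cost / hires_per_month      # ~$13,333 per hire
```

And again, that $13K figure excludes recruiter costs, tooling, sourcing, and opportunity cost.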

And here’s the kicker: our interview-to-success correlation is mediocre at best. When I tracked the performance reviews of engineers we hired over the past two years against their interview scores, the correlation was weak. Some of our top performers barely passed the interview bar. Some of our underperformers aced their interviews. The process is expensive and not particularly predictive.

AI tools in interviews are a rounding error compared to this structural problem.

What I’m Experimenting With

Instead of optimizing the interview process, I’m trying to replace parts of it entirely. Here’s what we’re piloting:

Paid trial projects. Candidates work on a real, scoped problem from our backlog for 2 paid days (we pay $1,500-$2,000 for the trial, depending on level). They have full access to their normal tools — AI assistants, Stack Overflow, documentation, whatever they’d use at work. They push code to a branch, and at the end we do a 1-hour review session where they walk through their decisions.

The results have been transformative:

  • Signal quality is dramatically higher. You see how someone actually works, not how they perform under artificial pressure. You see their commit patterns, how they read existing code, how they handle ambiguity in requirements, whether they ask good questions.
  • AI usage is a non-issue. Of course they use AI — it’s a real work environment. What matters is the quality of the final output and their ability to explain it. This makes the whole “should we allow AI in interviews” debate irrelevant.
  • Candidates love it. Every candidate who’s done the trial — even the ones we didn’t hire — said it was the best interview experience they’ve had. They felt respected because we valued their time enough to pay for it.

The downside: it’s more expensive per candidate ($2K vs. ~$600 in interviewer time for a traditional loop). But we run fewer total candidates through the process because the signal is so much better — we can screen more effectively upfront and only invite strong matches to the trial.

The Speed Question

David, your concern about speed is valid. Our trial process takes about 10 days from first contact to offer. That’s longer than a 2-day sprint loop. We’ve lost a couple candidates to faster processes.

My counter-argument: the candidates we lose to faster processes aren’t necessarily the ones we want. The engineers who value thoughtful evaluation over speed tend to be the ones who care about doing good work — and those are exactly the people I want on my team. Selection bias, used intentionally, can be a feature.

That said, I’m working on compressing the timeline. The trial doesn’t have to be consecutive days, and we can overlap the review session with the culture conversation. I think we can get to 7 days total.

I’m less worried about the AI question in hiring and more worried about whether we’re spending our engineering hours wisely. The best interview process is the one that maximizes signal per hour invested.