If take-homes are unreliable, should I run live coding interviews for every role?

Not necessarily. The take-home plus a mandatory walkthrough is still workable, especially for senior roles where a longer project makes sense. The key change is adding a 30-minute follow-up conversation: what would you change with more time, where did you make a trade-off, what assumption might be wrong? Engineers who relied heavily on AI for the code rarely answer these at any depth. For junior roles, a 45-minute live pair session often produces more signal in less time than a 72-hour assignment.

How do I evaluate AI fluency without just watching someone use a chatbot?

Look for what you would look for in any senior engineer: do they validate the output, catch the wrong assumption in the generated code, and know when the model is right versus when it has hallucinated? A live session with a codebase the candidate has not seen makes this visible within the first ten minutes. The candidate who accepts AI output without checking tells you something different from the one who catches the bug before running it.

What about candidates who say they do not use AI tools at all?

Worth exploring. It does not require interrogating candidates on their workflows, but a claim of never using AI tools in 2026 says something about adaptability. The exception is specific domains — embedded systems, certain safety-critical fields — where the tooling legitimately has not arrived. For most product engineering roles, comfort with AI-assisted development is now a baseline expectation, not a nice-to-have.

Our take-homes have detailed rubrics. Should we discard them?

Keep the rubrics; change what they score. The criteria for good code — readability, error handling, test coverage, documentation choices — do not disappear. What to remove is correctness of implementation as a standalone pass/fail criterion. Replace it with depth of explanation in the follow-up walkthrough. Output without understanding should not clear the bar, regardless of how the output was produced.

Hiring & Culture | May 22, 2026 | 5 min read | Reviewed May 22, 2026

The take-home assignment was always broken. AI just made it obvious.

Reconsidering what technical interviews actually test — and what to do about it in 2026

By FlowVerify Editorial Team

When Anthropic ran their engineering take-home assignments through Claude, the model passed. Not barely. It passed well. The team made the test harder. Claude passed that too. They kept iterating, pushing the starting point deeper into the problem, until engineer Tristan Hume described the outcome in a post: "realism may be a luxury we no longer have."

That sentence is worth reading twice. One of the most technically rigorous engineering organisations in the world concluded that the more their interview resembled actual work, the more their own AI could complete it. The only way to preserve signal was to make the test increasingly unlike the job it was supposed to predict.

If that is the bind, the test was not measuring what it claimed to measure. It probably was not before AI either.

The test seemed scientific. It was not.

Take-home assignments became common at engineering-forward companies in the early 2010s partly as a reaction against whiteboard coding. The complaint about whiteboard interviews was legitimate: writing algorithms on a glass board with an interviewer watching tests performance under observation pressure, not engineering judgment. The take-home felt like a genuine improvement. Candidates worked in their own environment, used their own tools, had real time to think. The output was runnable code you could actually assess: structure, error handling, test coverage, documentation choices.

But the take-home was measuring a narrower thing than it appeared. The actual list:

Familiarity with a language's standard library, recalled under time pressure without reference materials
Speed at translating an ambiguous spec into something functional within a fixed window
Willingness to spend six to eight hours on a task for a company that has not made an offer yet
The particular aesthetic of clean code that happened to match the interviewer's preferences

None of these are worthless. But they are not the same as the judgment, debugging instinct, and system-level thinking that distinguish a good senior engineer from a mediocre one in practice. The take-home was a proxy, rough and somewhat correlated, and it worked well enough that nobody looked carefully at what it was actually correlating with.

The numbers from 2026

A 2026 survey of 400 engineering leaders by Karat found that 71% say AI is making technical skills harder to assess. The same survey found that 62% of organisations still prohibit AI use in interviews, and it estimated that over half of candidates use it anyway.

That gap is the whole story. Most companies are running interviews designed on the assumption that candidates will not use AI, in a world where most candidates do. The interview is assessing something other than what the rubric says, and most hiring teams have not updated the rubric.

Google announced in early 2026 that they are allowing candidates to use Gemini during a new code comprehension round: reading, debugging, and optimising existing code with AI assistance. Interviewers explicitly score prompt engineering and output validation as part of the assessment. Sundar Pichai disclosed in April 2026 that 75% of new code at Google is now AI-generated and approved by engineers; the interview should reflect that job. Canva redesigned their engineering interviews in June 2025 to require AI tool use and made the questions more complex, ambiguous, and realistic: problems that cannot be solved with a single prompt and that require iterative thinking, requirement clarification, and trade-off reasoning.

Both companies landed on the same conclusion: the question is not whether AI can write the code. The question is whether the candidate makes good decisions when AI can write the code.

What an interview should be measuring

The useful shift is from "what can candidates do without AI" to "what do good engineers do when AI writes the first draft?"

Three skills are genuinely hard to fake in that context:

System-level judgment. Can the candidate read a proposed architecture and identify the failure mode that will matter in production? This requires understanding the specific system (its throughput requirements, operational burden, and failure modes), which AI cannot supply from a generic prompt.
Debugging instinct. When something is wrong in a non-obvious way, can the candidate identify the layer where the problem sits? This surfaces clearly in a live session and is essentially untestable in asynchronous output.
Trade-off reasoning. Given two valid approaches, can the candidate articulate why one is better for this context, and what you would have to believe to prefer the other? Live conversation outperforms any asynchronous format here.

These were never better tested by a take-home than by a conversation. The AI era raised the cost of pretending otherwise.

What to watch in a live session

When Canva redesigned their interviews, they did not just change the format. They changed what interviewers score. The assessment now asks: does the candidate validate AI output, or accept it? Do they catch the mistake the model made, or build on top of it?

In practice, this becomes visible within the first ten minutes of a live session with a codebase the candidate has not seen. Four behaviours carry most of the signal:

Do they read before they write? An engineer who opens the file and starts editing before understanding the context tells you something about how they will work on production systems generally.
Do they ask clarifying questions about constraints, or assume? A candidate who builds toward a full solution without checking requirements has a specific failure mode that will follow them to your codebase.
When the generated code has a bug (and you can ensure one is present), do they notice before running it, or only after?
When you introduce a new constraint midway through, do they reconsider the architecture, or patch the symptom?

None of these questions can be answered from take-home output. All of them are answerable in 45 minutes of watching someone work.

Five formats, ranked honestly

Format	Signal claimed	Signal actually produced	2026 durability
72-hour solo take-home	Real-world independent problem-solving	Implementation speed; library familiarity; time-box tolerance	✗ Broken without a walkthrough
LeetCode / whiteboard	Algorithmic thinking under conditions	Pattern recall under observation pressure	✗ Was always broken
Take-home + mandatory walkthrough	End-to-end capability	Output depth and conceptual understanding	~ Workable for senior roles
Live pair — code comprehension	Practical coding skill	Thinking process; debugging instinct; AI fluency	✓ Strong
Architecture discussion	System design ability	Judgment under introduced constraints	✓ Strong

Engineering interview formats in the AI era

The live formats hold up because the signal is in the process, not the output. AI can produce the output; it cannot show whether the candidate read before writing, asked good questions, or noticed the thing that would break in production.

The take-home with a mandatory walkthrough is a reasonable middle ground for senior roles. The walkthrough is not a quiz. It is a 30-minute conversation about choices: what would you change with more time? Where did you make a trade-off you are not confident about? What assumption might be wrong? Engineers who leaned heavily on AI for the code rarely answer at any depth. Engineers who understand what they built can answer easily, regardless of how the first draft was produced.

What has not changed

The interviews that survive the AI era are the ones that were always measuring the right thing: how someone thinks, not what they can produce in isolation. Judgment under constraint, debugging instinct, and the ability to reason about trade-offs were always better assessed in conversation than in asynchronous output.

The engineers worth hiring in 2026 are the same type of engineers who were worth hiring in 2019: people who understand the system they are building, ask good questions, know where their assumptions are, and debug at the right layer of abstraction. None of that changed. The cost of using a format that does not test any of those things has just gone up enough to notice.

Frequently asked questions


