Engineering take-homes were already broken. AI just made it obvious.
The signal in a take-home has never been the code. It's time the rubric reflected that.
Most engineering take-home assignments end the same way. The candidate submits clean code, the reviewer skims it for red flags, and the team advances or rejects based on whether the output looks like "good engineering." The code is treated as the signal.
It was always a proxy. A decent engineer with three free evenings produces a different submission than a strong engineer with two hours on a Friday night. Add AI tools and the proxy collapses: a mediocre candidate with good prompting habits now produces code that looks indistinguishable from what a senior engineer would write in an afternoon.
This is not a new problem. It is an old problem that became visible.
What the engineering take-home was actually measuring
The case for take-homes was never really about code quality. It was about three things: does the candidate understand the problem domain well enough to scope it correctly? Do they think about edge cases and error states without being prompted? Can they make reasonable tradeoffs and explain why?
Code quality was a proxy for the second two. And for the most part, it worked. A submission with no error handling and no tests usually indicated a candidate who had not thought carefully. A submission with thoughtful abstractions usually indicated someone who had wrestled with real systems before.
The rubric that emerged: correctness, test coverage, code structure, and some version of "does the reviewer feel good about this?" This held up well enough when four hours of honest work was hard to fake.
The signal collapsed — but not because of AI
When AI tools became widespread, teams started noticing that the bar for "output a senior reviewer feels good about" had become trivially easy to clear. A candidate using Claude or Cursor can produce working code with tests, reasonable abstractions, and a decent README in under an hour. That same candidate, two years earlier, would have needed four hours.
Two failure modes appeared almost immediately. First, candidates with strong AI-tool skills and weak engineering judgment submit polished code that stalls on the first follow-up question. Second, strong candidates with high standards refuse take-homes outright: they know an AI-assisted submission would feel dishonest, and a genuine unassisted attempt takes a full weekend. Neither failure mode shows up in your rejection statistics, and the second one is invisible.
The underlying cause is that output quality was never a stable signal. It was confounded by time invested, familiarity with the specific framework, and whether the candidate had solved a similar problem before. AI made all three confounders trivial to manage.
The wrong fixes
The next instinct is to return to live whiteboards or timed online assessments. This addresses the AI problem by creating different ones. Whiteboard performance does not predict job performance particularly well. Timed assessments filter out candidates who are not good at performing under artificial time pressure, which is not a job requirement. Neither format reflects what engineering work actually looks like.
The problem is not the format. It is the rubric.
The fix: separate the output from the thinking
The take-home format is sound. The post-submission conversation is the assessment.
The change: stop grading the code, and start grading the candidate's ability to reason about the code. After submission, run a 30-minute code review session where the candidate walks through their own work. The questions that produce real signal:
- Walk me through the most important design decision you made.
- Where would you start if you had another two hours?
- What would break first under production load?
- Which edge cases did you handle, and which did you defer?
- Can you add a test for this edge case right now?
These questions have a property the async code review does not: they cannot be answered by AI on behalf of the candidate. Either the candidate understands the code or they do not.
The code becomes a shared artefact to reason about together. Whether the candidate wrote it from scratch, assembled it with AI, or fell somewhere in between is now irrelevant. What matters is whether they can think through it.
The new rubric
| Criterion | Old rubric | New rubric |
|---|---|---|
| Code correctness | Primary signal | Necessary floor, not primary |
| Test coverage | Scored on submission | Discussed: "what did you choose not to test and why?" |
| Code structure | Scored on submission | Discussed: "what would you refactor first?" |
| README | Evaluated for quality | Starting point for scope conversation |
| Error handling | Checked on submission | Discussed: "what failure modes did you consider?" |
| AI tool use | Banned or unclear | Assumed; judgment about AI output is the signal |
| Live change request | Not present | Required: the candidate adds a test or fixes an edge case live |
What the session reveals
"If you were onboarding a new engineer to this codebase, where would you start?" shows whether the candidate can see their own work from the outside. Candidates who have genuinely thought through their design answer in thirty seconds. Candidates who produced working code without a coherent mental model either point at their README or start describing the file structure.
"What would you do differently if this were going into production tomorrow?" is an even sharper divide. Strong candidates often answer before you finish the question. They have been thinking about it since they submitted.
The live change request ("can you add a test for this edge case right now?") is the clearest filter. Five minutes. It separates the candidate who understands what they built from the one who can only talk about code in the abstract. An AI-assisted submission the candidate did not read carefully falls apart here almost immediately.
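To make the scale of that request concrete, here is a sketch of what a passing answer might look like, assuming (purely for illustration) that the take-home implemented a small `shorten_url` function expected to reject empty input. The module name, function name, and behaviour are hypothetical, not taken from any real submission; the point is the size of the ask.

```python
# Hypothetical live change request: the reviewer asks for a test covering
# the empty-input edge case. Names and behaviour here are illustrative.
import pytest

from shortener import shorten_url  # assumed to raise ValueError on bad input


def test_rejects_empty_url():
    # The edge case raised in the review session: no URL at all.
    with pytest.raises(ValueError):
        shorten_url("")


def test_rejects_whitespace_only_url():
    # A natural follow-up a candidate who knows their own code adds unprompted.
    with pytest.raises(ValueError):
        shorten_url("   ")
```

A candidate who wrote or genuinely reviewed the submission knows where this file lives, what the function is supposed to do at that boundary, and whether the test will pass before running it. A candidate who shipped unread AI output has to rediscover all three on the spot.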
This format addresses both failure modes. The AI-polished-but-hollow candidate is exposed by the first "why" question. The strong candidate who would have refused a traditional take-home no longer has a reason to refuse: there is no pretence of an AI-free code sample, and the assessment is their reasoning, not their output.
One caveat: candidates who are strong verbal communicators and weaker engineers can game the first few questions. The live change request closes most of that gap. Some teams add a second short live session — a 30-minute pairing exercise on a separate small problem. The combination of take-home plus two short live sessions produces more signal per hour than any single-format process.
As AI tools improve, the ceiling on "output that looks senior-level" will keep rising. The ability to fake "can you reason about your own engineering decisions in a real conversation" is nowhere near that ceiling yet.