The engineering take-home assignment in 2026: what AI broke, and the one format that still works
Most engineering take-home formats now test AI prompting skill, not engineering judgment. One format is different.
The engineering take-home assignment has been the default hiring filter for the last decade. You give candidates a problem, a few days, and a clean environment. They submit code. You read it. It works or it does not; it is tested or it is not; it reveals judgment or it does not.
This still works for one specific format. Most of the others are broken. Here is the mechanism.
Why engineering take-homes worked
The premise was sensible: remove the whiteboard. Let candidates work in their own environment, on their own schedule, with their actual tools. The submission would show what they would actually ship, not what they could write under observation in 40 minutes.
For most of the 2010s, the submission was the signal. Working code meant the candidate could write working code. Structure and test coverage revealed how they thought about software. Edge-case handling distinguished a careful engineer from a careless one. You could not fake a polished submission without doing most of the actual work.
The format was not perfect. Strong communicators who coded slowly were penalised. Candidates with long working hours found multi-day take-homes inaccessible. But the core signal was real: you got to see what the person built.
What shifted in 2023, and accelerated in 2025
In late 2022 and into 2023, LLMs became capable enough to complete most take-home assignments when prompted well. By 2025, with agentic coding tools that run tests and iterate until they pass, a well-specified take-home became a 30-minute task for anyone who knew how to direct the agent.
The submission that took a senior engineer three hours in 2019 now takes an hour of prompting, iteration, and cleanup. The output looks considered. It is tested. It handles edge cases. The README explains trade-offs. The commit history looks believable.
You cannot distinguish a strong submission from a well-prompted one by reading the artifact. The artifact has stopped being a reliable signal.
The formats that lost their signal
Most take-home formats fall into categories that no longer tell you what you need to know.
Build-a-feature tasks. "Implement a simple in-memory key-value store with TTL" or "Build a REST API with authentication and pagination." Any capable LLM can complete these at the level of a competent mid-level engineer. Submission quality now reflects whether the candidate knew how to prompt well, not whether they can reason about code.
Algorithm problems. Anything that can be specified in natural language is solvable by an LLM. Coding platforms that run take-home assessments have been flagging AI-assisted submissions for two years; unmonitored versions have no such check.
Integration tasks. "Wire up our sandbox API and build a small application on top of it." These take more prompting because the candidate needs to share docs and iterate, but a patient candidate with decent tool use can produce something impressive without owning what they submitted.
The format that still provides signal: code review
Code review asks something different. Instead of "write something that works," it asks: "read this, and tell me what is wrong with it."
Give the candidate a pull request of 200 to 350 lines. Ask them to review it as they would a real PR from a colleague: flag what they would block on, what they would approve, what they would ask about. Run it asynchronously, then spend 20 minutes on a live follow-up.
This format holds for a specific reason: expert code review is precise, and LLM code review is generic.
An LLM reviewing a PR typically produces something like: "Consider adding input validation to prevent unexpected inputs." Technically accurate. Practically useless. An engineer who has debugged this class of bug in production says: "Line 47: user_id is not validated before being passed to the query builder. This ORM version interpolates null as an unquoted string, which causes the query to return all rows for that account. This is a data exposure bug, not a missing validation check."
The specificity is the signal. It comes from pattern recognition built over years of debugging real systems. You cannot generate that level of precision by prompting; you can only produce it if you understand why the code is wrong.
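To make the contrast concrete, here is a minimal sketch of the class of bug that precise review comment describes. The function and field names (`fetch_events`, `user_id`) are hypothetical, for illustration only, and do not reference any real ORM or codebase.

```python
# Hypothetical sketch of the data-exposure pattern described above.
# Names (fetch_events, user_id) are illustrative assumptions only.

def fetch_events(rows, user_id=None):
    # Bug: when user_id is None (e.g. an unvalidated request parameter),
    # the filter silently becomes a no-op and every account's rows are
    # returned. A generic "consider adding input validation" comment misses
    # that this is a data exposure bug, not a style nit.
    if user_id:
        return [r for r in rows if r["user_id"] == user_id]
    return rows

# A precise reviewer names the concrete failure mode:
# fetch_events(all_rows, user_id=None) returns all_rows unfiltered.
```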
| Element | Target | Why it matters |
|---|---|---|
| PR size | 200 to 350 lines | Long enough to plant real issues; short enough to review in under an hour |
| Planted issues | 4 to 6 intentional problems | Enough signal across categories without turning it into a bug-finding marathon |
| Issue types | Bug, security, design flaw, performance, subtle correctness error | Tests pattern recognition across distinct dimensions |
| False positives | At least one thing that looks suspicious but is correct | Surfaces over-flagging; a good reviewer explains why the suspicious code is actually fine |
| Domain | Close to the role's actual stack | Reduces false negatives from unfamiliarity with a contrived problem domain |
The reverse take-home
One variant worth building: generate the code with an AI assistant, then plant intentional problems such as a data exposure bug, an off-by-one error, a design decision that will cause pain at scale, and a missed edge case.
Give the candidate both the code and the information that it was AI-generated. Ask them to review it as they would any production PR. This is not a trick. It is a realistic simulation of a growing part of the job.
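As a sketch of what a planted issue can look like, here is a hypothetical off-by-one error of the kind listed above. The function name and the 1-indexed page convention are assumptions for illustration, not a prescribed implementation.

```python
# Hypothetical planted issue for a reverse take-home: an off-by-one in pagination.
# The name paginate and the 1-indexed page convention are illustrative assumptions.

def paginate(items, page, page_size):
    # Planted bug: callers treat `page` as 1-indexed, but the slice below
    # assumes 0-indexing, so page 1 silently skips the first page_size items
    # and the final page comes back empty one page too early.
    start = page * page_size
    return items[start:start + page_size]

# Correct version for comparison: start = (page - 1) * page_size
```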
In 2026, reviewing AI-generated code is core engineering work. Engineers who have been using AI as a force multiplier for two years are practised at spotting where it falls short. Engineers who have been using it as a shortcut are not. The reverse take-home separates the two directly.
“Expert code review is precise. LLM code review is generic. That gap is the signal worth measuring.”
The follow-up that matters more than the submission
For any format, run a 20- to 30-minute live follow-up. Not to quiz; to calibrate.
For a code review take-home: ask the candidate to walk through the most significant issue they flagged. Ask why it matters in production, not in theory. Ask what they would write in the PR comment. Ask if there are related failure modes they would check for in adjacent code.
An engineer who owns the review they submitted answers in specific terms. An engineer who submitted AI-generated review commentary will give reasons that sound plausible but do not connect to the specific code they were given.
The most useful follow-up question for any take-home format: "Walk me through the last time AI gave you code you did not end up using. What was wrong with it, and how did you figure that out?" The answer tells you whether this person treats AI as a tool they direct or as an oracle they trust.
What you are actually measuring
The framing that makes this cleaner: the skill you are hiring for has shifted.
Before 2023, the core signal you needed was "can this person write working code." The take-home tested that directly. The artifact was the output; the output was the evidence.
Today, writing working code is table stakes. AI handles most of the generation work. The skill that is genuinely scarce is reasoning about code: debugging a system you did not build, understanding why something is wrong, spotting what an AI got wrong in the code it generated, making trade-offs that require understanding the full context.
Code review tests this directly. It also happens to be closer to what senior engineers spend most of their time doing.
The take-home assignment is not dead. The version that tested generation is. The version that tests judgment is not.