The engineering take-home assignment in 2026: what AI broke, and the one format that still works
Most engineering take-home formats now test AI prompting skill, not engineering judgment. One format is different.
The engineering take-home assignment has been the default hiring filter for the last decade. You give candidates a problem, a few days, and a clean environment. They submit code. You read it. It works or it does not; it is tested or it is not; it reveals judgment or it does not.
This still works for one specific format. Most of the others are broken. Here is the mechanism.
Why engineering take-homes worked
The premise was sensible: remove the whiteboard. Let candidates work in their own environment, on their own schedule, with their actual tools. The submission would show what they would actually ship, not what they could write under observation in 40 minutes.
For most of the 2010s, the submission was the signal. Working code meant the candidate could write working code. Structure and test coverage revealed how they thought about software. Edge-case handling distinguished a careful engineer from a careless one. You could not fake a polished submission without doing most of the actual work.
The format was not perfect. Strong communicators who coded slowly were penalised. Candidates with long working hours found multi-day take-homes inaccessible. But the core signal was real: you got to see what the person built.
What shifted in 2023, and accelerated in 2025
In late 2022 and into 2023, LLMs became capable enough to complete most take-home assignments when prompted well. By 2025, with agentic coding tools that run tests and iterate until they pass, a well-specified take-home became a 30-minute task for anyone who knew how to direct the agent.
The submission that took a senior engineer three hours in 2019 now takes an hour of prompting, iteration, and cleanup. The output looks considered. It is tested. It handles edge cases. The README explains trade-offs. The commit history looks believable.
You cannot distinguish a strong submission from a well-prompted one by reading the artifact. The artifact has stopped being a reliable signal.
The formats that lost their signal
Most take-home formats fall into categories that no longer tell you what you need to know.
Build-a-feature tasks. "Implement a simple in-memory key-value store with TTL" or "Build a REST API with authentication and pagination." Any capable LLM can complete these at the level of a competent mid-level engineer. Submission quality now reflects whether the candidate knew how to prompt well, not whether they can reason about code.
Algorithm problems. Anything that can be specified in natural language is solvable by an LLM. Coding platforms that run take-home assessments have been flagging AI-assisted submissions for two years; unmonitored versions have no such check.
Integration tasks. "Wire up our sandbox API and build a small application on top of it." These take more prompting because the candidate needs to share docs and iterate, but a patient candidate with decent tool use can produce something impressive without owning what they submitted.
The format that still provides signal: code review
Code review asks something different. Instead of "write something that works," it asks: "read this, and tell me what is wrong with it."
Give the candidate a pull request of 200 to 350 lines. Ask them to review it as they would a real PR from a colleague: flag what they would block on, what they would approve, what they would ask about. Run it asynchronously, then spend 20 minutes on a live follow-up.
This format holds for a specific reason: expert code review is precise, and LLM code review is generic.
An LLM reviewing a PR typically produces something like: "Consider adding input validation to prevent unexpected inputs." Technically accurate. Practically useless. An engineer who has debugged this class of bug in production says: "Line 47: user_id is not validated before being passed to the query builder. This ORM version interpolates null as an unquoted string, which causes the query to return all rows for that account. This is a data exposure bug, not a missing validation check."
The specificity is the signal. It comes from pattern recognition built over years of debugging real systems. You cannot generate that level of precision by prompting; you can only produce it if you understand why the code is wrong.
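To make the contrast concrete, here is a minimal sketch of the class of bug that precise review comment describes. The function and field names (`fetch_events`, `user_id`) are hypothetical, for illustration only, and do not reference any real ORM or codebase.

```python
# Hypothetical sketch of the data-exposure pattern described above.
# Names (fetch_events, user_id) are illustrative assumptions only.

def fetch_events(rows, user_id=None):
    # Bug: when user_id is None (e.g. an unvalidated request parameter),
    # the filter silently becomes a no-op and every account's rows are
    # returned. A generic "consider adding input validation" comment misses
    # that this is a data exposure bug, not a style nit.
    if user_id:
        return [r for r in rows if r["user_id"] == user_id]
    return rows

# A precise reviewer names the concrete failure mode:
# fetch_events(all_rows, user_id=None) returns all_rows unfiltered.
```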
| Element | Target | Why it matters |
|---|---|---|
| PR size | 200 to 350 lines | Long enough to plant real issues; short enough to review in under an hour |
| Planted issues | 4 to 6 intentional problems | Enough signal across categories without turning it into a bug-finding marathon |
| Issue types | Bug, security, design flaw, performance, subtle correctness error | Tests pattern recognition across distinct dimensions |
| False positives | At least one thing that looks suspicious but is correct | Surfaces over-flagging; a good reviewer explains why the suspicious code is actually fine |
| Domain | Close to the role's actual stack | Reduces false negatives from unfamiliarity with a contrived problem domain |
The reverse take-home
One variant worth building: generate the code with an AI assistant, then plant intentional problems such as a data exposure bug, an off-by-one error, a design decision that will cause pain at scale, and a missed edge case.
Give the candidate both the code and the information that it was AI-generated. Ask them to review it as they would any production PR. This is not a trick. It is a realistic simulation of a growing part of the job.
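As a sketch of what a planted issue can look like, here is a hypothetical off-by-one error of the kind listed above. The function name and the 1-indexed page convention are assumptions for illustration, not a prescribed implementation.

```python
# Hypothetical planted issue for a reverse take-home: an off-by-one in pagination.
# The name paginate and the 1-indexed page convention are illustrative assumptions.

def paginate(items, page, page_size):
    # Planted bug: callers treat `page` as 1-indexed, but the slice below
    # assumes 0-indexing, so page 1 silently skips the first page_size items
    # and the final page comes back empty one page too early.
    start = page * page_size
    return items[start:start + page_size]

# Correct version for comparison: start = (page - 1) * page_size
```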
In 2026, reviewing AI-generated code is core engineering work. Engineers who have been using AI as a force multiplier for two years are practised at spotting where it falls short. Engineers who have been using it as a shortcut are not. The reverse take-home separates the two directly.
“Expert code review is precise. LLM code review is generic. That gap is the signal worth measuring.”
The follow-up that matters more than the submission
For any format, run a 20- to 30-minute live follow-up. Not to quiz; to calibrate.
For a code review take-home: ask the candidate to walk through the most significant issue they flagged. Ask why it matters in production, not in theory. Ask what they would write in the PR comment. Ask if there are related failure modes they would check for in adjacent code.
An engineer who owns the review they submitted answers in specific terms. An engineer who submitted AI-generated review commentary will give reasons that sound plausible but do not connect to the specific code they were given.
The most useful follow-up question for any take-home format: "Walk me through the last time AI gave you code you did not end up using. What was wrong with it, and how did you figure that out?" The answer tells you whether this person treats AI as a tool they direct or as an oracle they trust.
What you are actually measuring
The framing that makes this cleaner: the skill you are hiring for has shifted.
Before 2023, the core signal you needed was "can this person write working code." The take-home tested that directly. The artifact was the output; the output was the evidence.
Today, writing working code is table stakes. AI handles most of the generation work. The skill that is genuinely scarce is reasoning about code: debugging a system you did not build, understanding why something is wrong, spotting what an AI got wrong in the code it generated, making trade-offs that require understanding the full context.
Code review tests this directly. It also happens to be closer to what senior engineers spend most of their time doing.
The take-home assignment is not dead. The version that tested generation is. The version that tests judgment is not.