Engineering take-homes were already broken. AI just made it obvious.
The signal in a take-home has never been the code. It's time the rubric reflected that.
Most engineering take-home assignments end the same way. The candidate submits clean code, the reviewer skims it for red flags, and the team advances or rejects based on whether the output looks like "good engineering." The code is treated as the signal.
It was always a proxy. A decent engineer with three free evenings produces a different submission than a strong engineer with two hours on a Friday night. Add AI tools and the proxy collapses: a mediocre candidate with good prompting habits now produces code that looks indistinguishable from what a senior engineer would write in an afternoon.
This is not a new problem. It is an old problem that became visible.
What the engineering take-home was actually measuring
The case for take-homes was never really about code quality. It was about three things: does the candidate understand the problem domain well enough to scope it correctly? Do they think about edge cases and error states without being prompted? Can they make reasonable tradeoffs and explain why?
Code quality was a proxy for the second two. And for the most part, it worked. A submission with no error handling and no tests usually indicated a candidate who had not thought carefully. A submission with thoughtful abstractions usually indicated someone who had wrestled with real systems before.
The rubric that emerged: correctness, test coverage, code structure, and some version of "does the reviewer feel good about this?" This held up well enough when four hours of honest work was hard to fake.
The signal collapsed — but not because of AI
When AI tools became widespread, teams started noticing that the bar for "output a senior reviewer feels good about" had become trivially easy to clear. A candidate using Claude or Cursor can produce working code with tests, reasonable abstractions, and a decent README in under an hour. That same candidate, two years earlier, would have needed four hours.
Two failure modes appeared almost immediately. First, candidates with strong AI-tool skills and weak engineering judgment submit polished code that stalls on the first follow-up question. Second, strong candidates with high standards refuse take-homes outright: they know an AI-assisted submission would feel dishonest, and a genuine unassisted attempt takes a full weekend. Neither failure mode shows up in your rejection statistics, and the second one is invisible.
The underlying cause is that output quality was never a stable signal. It was confounded by time invested, familiarity with the specific framework, and whether the candidate had solved a similar problem before. AI made all three confounders trivial to manage.
The wrong fixes
The next instinct is to return to live whiteboards or timed online assessments. This addresses the AI problem by creating different ones. Whiteboard performance does not predict job performance particularly well. Timed assessments filter out candidates who are not good at performing under artificial time pressure, which is not a job requirement. Neither format reflects what engineering work actually looks like.
The problem is not the format. It is the rubric.
The fix: separate the output from the thinking
The take-home format is sound. The post-submission conversation is the assessment.
The change: stop grading the code, and start grading the candidate's ability to reason about the code. After submission, run a 30-minute code review session where the candidate walks through their own work. The questions that produce real signal:
- Walk me through the most important design decision you made.
- Where would you start if you had another two hours?
- What would break first under production load?
- Which edge cases did you handle, and which did you defer?
- Can you add a test for this edge case right now?
These questions have a property the async code review does not: they cannot be answered by AI on behalf of the candidate. Either the candidate understands the code or they do not.
The code becomes a shared artefact to reason about together. Whether the candidate wrote it from scratch, assembled it with AI, or fell somewhere in between is now irrelevant. What matters is whether they can think through it.
The new rubric
| Criterion | Old rubric | New rubric |
|---|---|---|
| Code correctness | Primary signal | Necessary floor, not primary |
| Test coverage | Scored on submission | Discussed: "what did you choose not to test and why?" |
| Code structure | Scored on submission | Discussed: "what would you refactor first?" |
| README | Evaluated for quality | Starting point for scope conversation |
| Error handling | Checked on submission | Discussed: "what failure modes did you consider?" |
| AI tool use | Banned or unclear | Assumed; judgment about AI output is the signal |
| Live change request | Not present | Required: the candidate adds a test or fixes an edge case live |
What the session reveals
"If you were onboarding a new engineer to this codebase, where would you start?" shows whether the candidate can see their own work from the outside. Candidates who have genuinely thought through their design answer in thirty seconds. Candidates who produced working code without a coherent mental model either point at their README or start describing the file structure.
"What would you do differently if this were going into production tomorrow?" is an even sharper divide. Strong candidates often answer before you finish the question. They have been thinking about it since they submitted.
The live change request ("can you add a test for this edge case right now?") is the clearest filter. Five minutes. It separates the candidate who understands what they built from the one who can only talk about code in the abstract. An AI-assisted submission the candidate did not read carefully falls apart here almost immediately.
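To make the scale of that request concrete, here is a sketch of what a passing answer might look like, assuming (purely for illustration) that the take-home implemented a small `shorten_url` function expected to reject empty input. The module name, function name, and behaviour are hypothetical, not taken from any real submission; the point is the size of the ask.

```python
# Hypothetical live change request: the reviewer asks for a test covering
# the empty-input edge case. Names and behaviour here are illustrative.
import pytest

from shortener import shorten_url  # assumed to raise ValueError on bad input


def test_rejects_empty_url():
    # The edge case raised in the review session: no URL at all.
    with pytest.raises(ValueError):
        shorten_url("")


def test_rejects_whitespace_only_url():
    # A natural follow-up a candidate who knows their own code adds unprompted.
    with pytest.raises(ValueError):
        shorten_url("   ")
```

A candidate who wrote or genuinely reviewed the submission knows where this file lives, what the function is supposed to do at that boundary, and whether the test will pass before running it. A candidate who shipped unread AI output has to rediscover all three on the spot.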
This format addresses both failure modes. The AI-polished-but-hollow candidate is exposed by the first "why" question. The strong candidate who would have refused a traditional take-home no longer has a reason to refuse: there is no pretence of an AI-free code sample, and the assessment is their reasoning, not their output.
One caveat: candidates who are strong verbal communicators and weaker engineers can game the first few questions. The live change request closes most of that gap. Some teams add a second short live session — a 30-minute pairing exercise on a separate small problem. The combination of take-home plus two short live sessions produces more signal per hour than any single-format process.
As AI tools improve, the ceiling on "output that looks senior-level" will keep rising. The ability to fake "can you reason about your own engineering decisions in a real conversation" is nowhere near that ceiling yet.