The take-home assignment was always broken. AI just made it obvious.
Reconsidering what technical interviews actually test — and what to do about it in 2026
When Anthropic ran their engineering take-home assignments through Claude, the model passed. Not barely. It passed well. The team made the test harder. Claude passed that too. They kept iterating, pushing the starting point deeper into the problem, until engineer Tristan Hume described the outcome in a post: "realism may be a luxury we no longer have."
That sentence is worth reading twice. One of the most technically rigorous engineering organisations in the world concluded that the more their interview resembled actual work, the more their own AI could complete it. The only way to preserve signal was to make the test increasingly unlike the job it was supposed to predict.
If the only way to keep the signal is to make the test less like the job, then the test was not measuring what it claimed to measure. It probably was not before AI either.
The test seemed scientific. It was not.
Take-home assignments became common at engineering-forward companies in the early 2010s, partly as a reaction against whiteboard coding. The complaint about whiteboard interviews was legitimate: writing algorithms on a glass board while an interviewer watches tests performance under observation pressure, not engineering judgment. The take-home felt like a genuine improvement. Candidates worked in their own environment, used their own tools, and had real time to think. The output was runnable code you could actually assess: structure, error handling, test coverage, documentation choices.
But the take-home measured something narrower than it appeared to. In practice, the list looked like this:
- Familiarity with a language's standard library and idioms, exercised under time pressure
- Speed at translating an ambiguous spec into something functional within a fixed window
- Willingness to spend six to eight hours on a task for a company that has not made an offer yet
- The particular aesthetic of clean code that happened to match the interviewer's preferences
None of these are worthless. But they are not the same as the judgment, debugging instinct, and system-level thinking that distinguish a good senior engineer from a mediocre one in practice. The take-home was a rough proxy, somewhat correlated with engineering ability, and it worked well enough that nobody looked carefully at what it was actually correlating with.
The numbers from 2026
A 2026 survey of 400 engineering leaders by Karat found that 71% say AI is making technical skills harder to assess. The same survey found that 62% of organisations still prohibit AI use in interviews, while estimating that over half of candidates use it anyway.
That gap is the whole story. Most companies are running interviews designed on the assumption that candidates will not use AI, in a world where most candidates do. The interview is assessing something other than what the rubric says, and most hiring teams have not updated the rubric.
Google announced in early 2026 that they are allowing candidates to use Gemini during a new code comprehension round: reading, debugging, and optimising existing code with AI assistance. Interviewers explicitly score prompt engineering and output validation as part of the assessment. Sundar Pichai disclosed in April 2026 that 75% of new code at Google is now AI-generated and approved by engineers; if that is the job, the interview should reflect it. Canva redesigned their engineering interviews in June 2025 to require AI tool use, and made their questions more complex, ambiguous, and realistic: problems that cannot be solved with a single prompt and that require iterative thinking, requirement clarification, and trade-off reasoning.
Both companies landed on the same conclusion: the question is not whether AI can write the code. The question is whether the candidate makes good decisions when AI can write the code.
What an interview should be measuring
The useful shift is from "what can candidates do without AI" to "what do good engineers do when AI writes the first draft?"
Three skills are genuinely hard to fake in that context:
- System-level judgment. Can the candidate read a proposed architecture and identify the failure mode that will matter in production? That requires understanding the specific system, including its throughput requirements and operational burden, and AI cannot supply that context from a generic prompt.
- Debugging instinct. When something is wrong in a non-obvious way, can the candidate identify which layer the problem sits in? This surfaces clearly in a live session and is essentially untestable through asynchronous output.
- Trade-off reasoning. Given two valid approaches, can the candidate articulate why one is better in this context, and what they would have to believe to prefer the other? Live conversation outperforms any asynchronous format here.
These were never better tested by a take-home than by a conversation. The AI era raised the cost of pretending otherwise.
What to watch in a live session
When Canva redesigned their interviews, they did not just change the format. They changed what interviewers score. The assessment now asks: does the candidate validate AI output, or accept it? Do they catch the mistake the model made, or build on top of it?
In practice, this becomes visible within the first ten minutes of a live session with a codebase the candidate has not seen. Four behaviours carry most of the signal:
- Do they read before they write? An engineer who opens the file and starts editing before understanding the context tells you something about how they will work on production systems generally.
- Do they ask clarifying questions about constraints, or assume? A candidate who builds toward a full solution without checking requirements has a specific failure mode that will follow them to your codebase.
- When the generated code has a bug (seed one in advance if the model does not produce one), do they notice it before running the code, or only after?
- When you introduce a new constraint midway through, do they reconsider the architecture, or patch the symptom?
None of these questions can be answered from take-home output. All of them are answerable in 45 minutes of watching someone work.
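A seeded bug does not need to be elaborate. The sketch below is a hypothetical Python example, assuming a simple batching task; the function names and the task itself are illustrative and not taken from Google's or Canva's exercises. The helper's loop bound silently drops the final batch, and a candidate who reads before running should question that bound before any test is executed.

```python
def chunk_buggy(items, size):
    """Split `items` into batches of `size`. Seeded bug for the exercise."""
    # Bug: the stop bound `len(items) - size` excludes the last start index,
    # so the final batch (partial or full) is silently dropped.
    return [items[i:i + size] for i in range(0, len(items) - size, size)]


def chunk_fixed(items, size):
    """Split `items` into batches of `size`, keeping any partial final batch."""
    if size <= 0:
        raise ValueError("size must be positive")
    # Stepping all the way to len(items) includes the trailing batch;
    # slicing past the end of a list is safe in Python.
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7]
    print(chunk_buggy(data, 3))  # [[1, 2, 3], [4, 5, 6]]
    print(chunk_fixed(data, 3))  # [[1, 2, 3], [4, 5, 6], [7]]
```

The bug is small enough to read past and consequential enough to matter: in a data pipeline it would lose records without raising an error. The fixed version also rejects a non-positive `size` explicitly rather than returning an empty list or failing inside `range`.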
Five formats, ranked honestly
| Format | Signal claimed | Signal actually produced | 2026 durability |
|---|---|---|---|
| 72-hour solo take-home | Real-world independent problem-solving | Implementation speed; library familiarity; time-box tolerance | ✗ Broken without a walkthrough |
| LeetCode / whiteboard | Algorithmic thinking under interview conditions | Pattern recall under observation pressure | ✗ Was always broken |
| Take-home + mandatory walkthrough | End-to-end capability | Output depth and conceptual understanding | ~ Workable for senior roles |
| Live pair — code comprehension | Practical coding skill | Thinking process; debugging instinct; AI fluency | ✓ Strong |
| Architecture discussion | System design ability | Judgment under introduced constraints | ✓ Strong |
The live formats hold up because the signal is in the process, not the output. AI can produce the output; it cannot show whether the candidate read before writing, asked good questions, or noticed the thing that would break in production.
The take-home with a mandatory walkthrough is a reasonable middle ground for senior roles. The walkthrough is not a quiz. It is a 30-minute conversation about choices: what would you change with more time? Where did you make a trade-off you are not confident about? What assumption might be wrong? Engineers who let AI write code they did not understand rarely answer at any depth. Engineers who understand what they built can answer easily, regardless of how the first draft was produced.
What has not changed
The interviews that survive the AI era are the ones that were always measuring the right thing: how someone thinks, not what they can produce in isolation. Judgment under constraint, debugging instinct, and the ability to reason about trade-offs were always better assessed in conversation than in asynchronous output.
The engineers worth hiring in 2026 are the same kind of engineers who were worth hiring in 2019: people who understand the system they are building, ask good questions, know where their assumptions are, and debug at the right layer of abstraction. None of that changed. The cost of using a format that does not test any of those things has just gone up enough to notice.