A code review checklist that respects everyone's time
What to check deeply, what to automate, and what to stop debating entirely
Most code reviews find the same things: a missing null check, a variable named poorly, an edge case buried in a conditional. The problem isn't what they find — it's that reviewers spend most of their time on things a linter could catch automatically, leaving little attention for the things only a human can see.
A review process that earns its time starts by being explicit about what code review is actually for.
The three jobs of a code review
Code review serves three distinct functions, and conflating them is the root cause of most slow, frustrating, or ineffective reviews.
Correctness checking. Does the code do what it's supposed to do? Does it handle edge cases? Does it fail gracefully when inputs are unexpected? This is the core purpose of code review and the hardest to automate fully.
Knowledge sharing. Someone learns how the system works by reading the diff and engaging with comments. This is a legitimate side effect of correctness checking — but it's worth naming separately, because it affects who should review what. A junior engineer reviewing a senior engineer's PR learns more from the process than they will from most training sessions. That's valuable.
Style and standards enforcement. Is the code readable? Does it follow the team's conventions? This is the function most susceptible to over-investment, because it feels productive, generates threads that look like thoroughness, and is easy to comment on without deep understanding of the code.
The third function is where most review time gets lost. The fix is to automate it, not to do it better.
Automate the obvious before a human reads anything
A review checklist should come with a prerequisite: before a reviewer opens the diff, automated checks must already have passed.
- A linter has run and found nothing (ESLint, Pylint, golangci-lint, Rubocop)
- A formatter has run (Prettier, Black, gofmt, rustfmt)
- Tests have passed, including integration tests where coverage exists
- Static analysis has run (Semgrep for security patterns, type checking for typed languages)
If reviewers are leaving comments about indentation, trailing commas, variable naming conventions, or import ordering, the CI pipeline isn't doing its job. Fix the pipeline before optimising the review. A linter comment in a PR is a signal that the linter isn't running in CI.
Once the automated layer is in place, the human reviewer is freed to focus on the things that require actual judgment.
Six things that warrant genuine attention
These are the categories where code review consistently catches bugs and improves systems. They're also the categories that require a human.
- Before reading a single line of code, read the PR description and the linked issue. Are they aligned? Sometimes the code is technically correct but solves a slightly different version of the problem than the one originally asked for. This is the most important check and the easiest to skip when you're under time pressure. Does it solve the right problem?
- Every function has inputs it wasn't designed for. Every API endpoint can receive a request that the author didn't anticipate. For each public-facing boundary (a function signature, a database write, an API endpoint), ask: what happens if the input is null, empty, too large, malformed, or adversarial? The answer should be explicit, not accidental. What happens at the boundary?
- Reviewers tend to read through the happy path and skim the catch blocks and early returns. Go back and read them deliberately. Does the error return useful information? Does it leave the system in a consistent state? Does it log enough context to debug later? Does it leak internal implementation details in a response body that ends up in a client? Is the error path as careful as the happy path?
- Any change touching data (write operations, cache invalidation, queue publishing) should be read for consistency risks. Two writes that should be atomic but aren't. A cache that can diverge from the database. A queue message published before the database transaction commits. These bugs are difficult to reproduce in tests and expensive to debug in production. Is there a data consistency risk?
- Can you tell if this code is working correctly in production? Are there structured logs? Can you trace a specific request through the system? If something goes wrong at 3am, will the on-call engineer be able to tell where and why? Observability gaps are easy to miss in review and costly to fix after the fact. Is it observable?
- If this change behaves unexpectedly in production, how many users or systems are affected? A change to a core authentication flow has a different blast radius than a change to a notification preference. The blast radius determines the bar for review thoroughness and whether the change warrants a feature flag. What is the blast radius?
| What you're looking at | Where it belongs |
|---|---|
| Correctness, edge cases | Human review, every PR |
| Boundary conditions | Human review, every PR |
| Error path quality | Human review, every PR |
| Data consistency | Human review, every PR |
| Observability | Human review, every PR |
| Code formatting, indentation | Automated formatter (Prettier, Black, gofmt) |
| Naming conventions | Linter rule or a written standard; not a review comment |
| Import ordering | Automated formatter |
| File location | Architecture discussion before the PR opens, not during review |
| "I would have done it differently" | Skip unless the approach has a correctness or performance consequence |
Five things to stop debating in reviews
These are the categories where review threads accumulate fastest and deliver least value. Each has a better answer than a comment thread.
Naming. If your linter doesn't catch it and your team doesn't have a written convention, don't debate it in a review. If a name is genuinely misleading (not just different from what you would have chosen), say so once, concisely. A twelve-message thread over getUserData versus fetchUserData is not an effective use of anyone's time. Document the convention and enforce it with a linter rule.
Formatting and line length. Your formatter should handle this. If it doesn't, configure it. This is not a reviewable item.
Where to put a file. File organisation decisions happen during planning or architecture review, not in a PR comment. If a file genuinely belongs elsewhere in a way that matters, that's worth one comment and a decision, not a thread.
Library versus custom implementation. This decision should happen before the code is written, in the issue or design discussion. If it wasn't, the comment belongs in the issue thread, not as a blocking note on code that's already written and tested.
Stylistic preferences with no behavioural consequence. "I would have written this differently" is almost never a useful review comment. "This will break when the input is an empty list because of the assumption on line 42" is.
PR size is the variable most teams ignore
Practical limits for reviews that are actually thorough:
- Under 400 lines of diff for changes that will get real review
- Under 200 lines for changes that touch authentication, payment, or security-sensitive paths
- Refactoring PRs should be separate from feature PRs. Mixing them means reviewers can't distinguish a behaviour change from a rename.
If a feature genuinely can't be broken down smaller, the right answer is to ship it behind a feature flag and review the chunks incrementally. A PR that has to be reviewed in full before anything can ship is a process problem, not a code problem.
Async-first, synchronous when it earns it
The default for most teams is async: the author opens a PR, reviewers leave comments on their own time, the author addresses them, reviewers approve. This works well when the PR is small enough to read without excessive context-switching and the reviewer doesn't need to talk through the approach.
The case for a synchronous review (a 30-minute call where author and reviewer go through the diff together) is narrower but real:
- The change is architecturally significant and the async thread has already generated five or more back-and-forth exchanges
- The reviewer and author are at meaningfully different experience levels and the author will learn more from a live conversation
- The team is blocked on this shipping and async is adding days to cycle time
Most teams should default to async and use synchronous selectively. Many teams do the opposite, which is why many reviews feel slow without being thorough.
What changes when AI writes the code
Code written by an AI coding assistant has patterns that human-written code doesn't tend to produce at the same rate:
- Error paths that look reasonable and handle a subtly different case than the one that will actually occur in production
- Confident over-engineering. The code solves a harder version of the problem than necessary, with abstractions that won't survive contact with the actual requirements.
- Context blindness. The code doesn't know what happens before this function runs or after it returns, so functions that look correct in isolation can be wrong in context.
None of this changes the checklist. It changes the priority within it: boundary conditions and error paths deserve more scrutiny when the code was generated, because those are the categories where generation tends to fail most silently. The review is the same review; the emphasis shifts.
The review that earns its time
A code review process that respects everyone's time is not a shorter review. It's a review that spends its time on the things only a human can catch — and delegates everything else to tools. The six areas above have a common thread: they require understanding what the code is supposed to do, what could go wrong, and what the impact would be. A linter can't answer any of those questions. A reviewer can.
The teams that review code most effectively are usually the ones that have been deliberate about what the review is for, have automated everything that can be automated, and have norms around PR size that make the review doable in the first place.
Frequently asked questions
Related reading
Technical debt is a useful shorthand. It's also why nothing gets fixed.
The 'technical debt backlog' meeting that goes nowhere is a vocabulary failure. Most things engineers call technical debt are four distinct problems — and fixing any of them starts with naming the right one.
Feature flags in production: the lifecycle teams skip
Most teams have a system for adding feature flags. Almost none have a system for retiring them. Here is the full lifecycle: flag types, staleness detection, and the cleanup playbook.
Five security patterns that appear in AI-generated code — and why code review usually misses them
AI-generated codebases have 2.5x more critical vulnerabilities than human-written code. The useful finding: five predictable patterns that standard code review is not designed to catch.