How long should a thorough code review take?

For a PR under 200 lines: 15 to 30 minutes for an attentive review. For 200–400 lines: 30 to 60 minutes. Past 400 lines, you're past the point where attention is reliable. The right answer for large PRs is to break them up, not to schedule a longer review block.

When should we do a synchronous review instead of async comments?

When the async thread has already produced five or more back-and-forth exchanges, when the change is architecturally significant and the author needs real-time feedback to proceed, or when the team is blocked and async is adding days rather than hours. Synchronous reviews are a useful escalation, not the default.

Does code review actually catch most bugs before production?

It depends on what kind of bugs. Code review is effective at catching logic errors, edge-case omissions, error-path problems, and data consistency risks — the categories in the checklist above. It's less effective at finding concurrency bugs, performance regressions under real load, and integration failures. Those require testing, load testing, and staging environments. Review and testing are complements, not substitutes.

Should we use AI tools to do code review?

AI code review tools (GitHub Copilot Review, CodeRabbit, others) are useful for catching categories like missing error handling, insecure patterns, and obvious logic gaps. They're less useful for the things that require understanding the broader system — blast radius assessment, data consistency, whether the PR solves the right problem. Using an AI tool as a first pass before a human review is reasonable; using it as a replacement for human review on significant changes is not.

Best PracticesMay 26, 20267 min readReviewed May 26, 2026

A code review checklist that respects everyone's time

What to check deeply, what to automate, and what to stop debating entirely

By FlowVerify Editorial Team

Most code reviews find the same things: a missing null check, a variable named poorly, an edge case buried in a conditional. The problem isn't what they find — it's that reviewers spend most of their time on things a linter could catch automatically, leaving little attention for the things only a human can see.

A review process that earns its time starts by being explicit about what code review is actually for.

The three jobs of a code review

Code review serves three distinct functions, and conflating them is the root cause of most slow, frustrating, or ineffective reviews.

Correctness checking. Does the code do what it's supposed to do? Does it handle edge cases? Does it fail gracefully when inputs are unexpected? This is the core purpose of code review and the hardest to automate fully.

Knowledge sharing. Someone learns how the system works by reading the diff and engaging with comments. This is a legitimate side effect of correctness checking — but it's worth naming separately, because it affects who should review what. A junior engineer reviewing a senior engineer's PR learns more from the process than they will from most training sessions. That's valuable.

Style and standards enforcement. Is the code readable? Does it follow the team's conventions? This is the function most susceptible to over-investment, because it feels productive, generates threads that look like thoroughness, and is easy to comment on without deep understanding of the code.

The third function is where most review time gets lost. The fix is to automate it, not to do it better.

Automate the obvious before a human reads anything

A review checklist should come with a prerequisite: before a reviewer opens the diff, automated checks must already have passed.

A linter has run and found nothing (ESLint, Pylint, golangci-lint, Rubocop)
A formatter has run (Prettier, Black, gofmt, rustfmt)
Tests have passed, including integration tests where coverage exists
Static analysis has run (Semgrep for security patterns, type checking for typed languages)

If reviewers are leaving comments about indentation, trailing commas, variable naming conventions, or import ordering, the CI pipeline isn't doing its job. Fix the pipeline before optimising the review. A linter comment in a PR is a signal that the linter isn't running in CI.

Once the automated layer is in place, the human reviewer is freed to focus on the things that require actual judgment.

Six things that warrant genuine attention

These are the categories where code review consistently catches bugs and improves systems. They're also the categories that require a human.

Before reading a single line of code, read the PR description and the linked issue. Are they aligned? Sometimes the code is technically correct but solves a slightly different version of the problem than the one originally asked for. This is the most important check and the easiest to skip when you're under time pressure. Does it solve the right problem?
Every function has inputs it wasn't designed for. Every API endpoint can receive a request that the author didn't anticipate. For each public-facing boundary (a function signature, a database write, an API endpoint), ask: what happens if the input is null, empty, too large, malformed, or adversarial? The answer should be explicit, not accidental. What happens at the boundary?
Reviewers tend to read through the happy path and skim the catch blocks and early returns. Go back and read them deliberately. Does the error return useful information? Does it leave the system in a consistent state? Does it log enough context to debug later? Does it leak internal implementation details in a response body that ends up in a client? Is the error path as careful as the happy path?
Any change touching data (write operations, cache invalidation, queue publishing) should be read for consistency risks. Two writes that should be atomic but aren't. A cache that can diverge from the database. A queue message published before the database transaction commits. These bugs are difficult to reproduce in tests and expensive to debug in production. Is there a data consistency risk?
Can you tell if this code is working correctly in production? Are there structured logs? Can you trace a specific request through the system? If something goes wrong at 3am, will the on-call engineer be able to tell where and why? Observability gaps are easy to miss in review and costly to fix after the fact. Is it observable?
If this change behaves unexpectedly in production, how many users or systems are affected? A change to a core authentication flow has a different blast radius than a change to a notification preference. The blast radius determines the bar for review thoroughness and whether the change warrants a feature flag. What is the blast radius?

What you're looking at	Where it belongs
Correctness, edge cases	Human review, every PR
Boundary conditions	Human review, every PR
Error path quality	Human review, every PR
Data consistency	Human review, every PR
Observability	Human review, every PR
Code formatting, indentation	Automated formatter (Prettier, Black, gofmt)
Naming conventions	Linter rule or a written standard; not a review comment
Import ordering	Automated formatter
File location	Architecture discussion before the PR opens, not during review
"I would have done it differently"	Skip unless the approach has a correctness or performance consequence

What code review is for — and what it isn't

Five things to stop debating in reviews

These are the categories where review threads accumulate fastest and deliver least value. Each has a better answer than a comment thread.

Naming. If your linter doesn't catch it and your team doesn't have a written convention, don't debate it in a review. If a name is genuinely misleading (not just different from what you would have chosen), say so once, concisely. A twelve-message thread over getUserData versus fetchUserData is not an effective use of anyone's time. Document the convention and enforce it with a linter rule.

Formatting and line length. Your formatter should handle this. If it doesn't, configure it. This is not a reviewable item.

Where to put a file. File organisation decisions happen during planning or architecture review, not in a PR comment. If a file genuinely belongs elsewhere in a way that matters, that's worth one comment and a decision, not a thread.

Library versus custom implementation. This decision should happen before the code is written, in the issue or design discussion. If it wasn't, the comment belongs in the issue thread, not as a blocking note on code that's already written and tested.

Stylistic preferences with no behavioural consequence. "I would have written this differently" is almost never a useful review comment. "This will break when the input is an empty list because of the assumption on line 42" is.

PR size is the variable most teams ignore

Practical limits for reviews that are actually thorough:

Under 400 lines of diff for changes that will get real review
Under 200 lines for changes that touch authentication, payment, or security-sensitive paths
Refactoring PRs should be separate from feature PRs. Mixing them means reviewers can't distinguish a behaviour change from a rename.

If a feature genuinely can't be broken down smaller, the right answer is to ship it behind a feature flag and review the chunks incrementally. A PR that has to be reviewed in full before anything can ship is a process problem, not a code problem.

Async-first, synchronous when it earns it

The default for most teams is async: the author opens a PR, reviewers leave comments on their own time, the author addresses them, reviewers approve. This works well when the PR is small enough to read without excessive context-switching and the reviewer doesn't need to talk through the approach.

The case for a synchronous review (a 30-minute call where author and reviewer go through the diff together) is narrower but real:

The change is architecturally significant and the async thread has already generated five or more back-and-forth exchanges
The reviewer and author are at meaningfully different experience levels and the author will learn more from a live conversation
The team is blocked on this shipping and async is adding days to cycle time

Most teams should default to async and use synchronous selectively. Many teams do the opposite, which is why many reviews feel slow without being thorough.

What changes when AI writes the code

Code written by an AI coding assistant has patterns that human-written code doesn't tend to produce at the same rate:

Error paths that look reasonable and handle a subtly different case than the one that will actually occur in production
Confident over-engineering. The code solves a harder version of the problem than necessary, with abstractions that won't survive contact with the actual requirements.
Context blindness. The code doesn't know what happens before this function runs or after it returns, so functions that look correct in isolation can be wrong in context.

None of this changes the checklist. It changes the priority within it: boundary conditions and error paths deserve more scrutiny when the code was generated, because those are the categories where generation tends to fail most silently. The review is the same review; the emphasis shifts.

The review that earns its time

A code review process that respects everyone's time is not a shorter review. It's a review that spends its time on the things only a human can catch — and delegates everything else to tools. The six areas above have a common thread: they require understanding what the code is supposed to do, what could go wrong, and what the impact would be. A linter can't answer any of those questions. A reviewer can.

The teams that review code most effectively are usually the ones that have been deliberate about what the review is for, have automated everything that can be automated, and have norms around PR size that make the review doable in the first place.

Frequently asked questions

An AI agent deleted PocketOS's production database in 9 seconds. Credential scoping was the real failure.

A Cursor agent found one unscoped API token and wiped a production database and its backups in nine seconds. The real failure was credential scoping, not the model.

Jul 1, 2026Read full article →

Best PracticesMay 26, 20267 min readReviewed May 26, 2026

A code review checklist that respects everyone's time

What to check deeply, what to automate, and what to stop debating entirely

By FlowVerify Editorial Team

A review process that earns its time starts by being explicit about what code review is actually for.

The three jobs of a code review

Code review serves three distinct functions, and conflating them is the root cause of most slow, frustrating, or ineffective reviews.

The third function is where most review time gets lost. The fix is to automate it, not to do it better.

Automate the obvious before a human reads anything

A review checklist should come with a prerequisite: before a reviewer opens the diff, automated checks must already have passed.

A linter has run and found nothing (ESLint, Pylint, golangci-lint, Rubocop)
A formatter has run (Prettier, Black, gofmt, rustfmt)
Tests have passed, including integration tests where coverage exists
Static analysis has run (Semgrep for security patterns, type checking for typed languages)

Once the automated layer is in place, the human reviewer is freed to focus on the things that require actual judgment.

Six things that warrant genuine attention

These are the categories where code review consistently catches bugs and improves systems. They're also the categories that require a human.

Before reading a single line of code, read the PR description and the linked issue. Are they aligned? Sometimes the code is technically correct but solves a slightly different version of the problem than the one originally asked for. This is the most important check and the easiest to skip when you're under time pressure. Does it solve the right problem?
Every function has inputs it wasn't designed for. Every API endpoint can receive a request that the author didn't anticipate. For each public-facing boundary (a function signature, a database write, an API endpoint), ask: what happens if the input is null, empty, too large, malformed, or adversarial? The answer should be explicit, not accidental. What happens at the boundary?
Reviewers tend to read through the happy path and skim the catch blocks and early returns. Go back and read them deliberately. Does the error return useful information? Does it leave the system in a consistent state? Does it log enough context to debug later? Does it leak internal implementation details in a response body that ends up in a client? Is the error path as careful as the happy path?
Any change touching data (write operations, cache invalidation, queue publishing) should be read for consistency risks. Two writes that should be atomic but aren't. A cache that can diverge from the database. A queue message published before the database transaction commits. These bugs are difficult to reproduce in tests and expensive to debug in production. Is there a data consistency risk?
Can you tell if this code is working correctly in production? Are there structured logs? Can you trace a specific request through the system? If something goes wrong at 3am, will the on-call engineer be able to tell where and why? Observability gaps are easy to miss in review and costly to fix after the fact. Is it observable?
If this change behaves unexpectedly in production, how many users or systems are affected? A change to a core authentication flow has a different blast radius than a change to a notification preference. The blast radius determines the bar for review thoroughness and whether the change warrants a feature flag. What is the blast radius?

What you're looking at	Where it belongs
Correctness, edge cases	Human review, every PR
Boundary conditions	Human review, every PR
Error path quality	Human review, every PR
Data consistency	Human review, every PR
Observability	Human review, every PR
Code formatting, indentation	Automated formatter (Prettier, Black, gofmt)
Naming conventions	Linter rule or a written standard; not a review comment
Import ordering	Automated formatter
File location	Architecture discussion before the PR opens, not during review
"I would have done it differently"	Skip unless the approach has a correctness or performance consequence

What code review is for — and what it isn't

Five things to stop debating in reviews

These are the categories where review threads accumulate fastest and deliver least value. Each has a better answer than a comment thread.

Formatting and line length. Your formatter should handle this. If it doesn't, configure it. This is not a reviewable item.

PR size is the variable most teams ignore

Practical limits for reviews that are actually thorough:

Under 400 lines of diff for changes that will get real review
Under 200 lines for changes that touch authentication, payment, or security-sensitive paths
Refactoring PRs should be separate from feature PRs. Mixing them means reviewers can't distinguish a behaviour change from a rename.

Async-first, synchronous when it earns it

The case for a synchronous review (a 30-minute call where author and reviewer go through the diff together) is narrower but real:

The change is architecturally significant and the async thread has already generated five or more back-and-forth exchanges
The reviewer and author are at meaningfully different experience levels and the author will learn more from a live conversation
The team is blocked on this shipping and async is adding days to cycle time

Most teams should default to async and use synchronous selectively. Many teams do the opposite, which is why many reviews feel slow without being thorough.

What changes when AI writes the code

Code written by an AI coding assistant has patterns that human-written code doesn't tend to produce at the same rate:

Error paths that look reasonable and handle a subtly different case than the one that will actually occur in production
Confident over-engineering. The code solves a harder version of the problem than necessary, with abstractions that won't survive contact with the actual requirements.
Context blindness. The code doesn't know what happens before this function runs or after it returns, so functions that look correct in isolation can be wrong in context.

A code review checklist that respects everyone's time

The three jobs of a code review

Automate the obvious before a human reads anything

Six things that warrant genuine attention

Five things to stop debating in reviews

PR size is the variable most teams ignore

Async-first, synchronous when it earns it

What changes when AI writes the code

The review that earns its time

Frequently asked questions

Related reading

An AI agent deleted PocketOS's production database in 9 seconds. Credential scoping was the real failure.

Flaky tests aren't random. Six root causes explain almost all of them.

Three npm supply-chain attacks hit in four weeks. None of them needed a stolen password.

Stay ahead on eSignatures, compliance, and document workflows

An AI agent deleted PocketOS's production database in 9 seconds. Credential scoping was the real failure.

A code review checklist that respects everyone's time

The three jobs of a code review

Automate the obvious before a human reads anything

Six things that warrant genuine attention

Five things to stop debating in reviews

PR size is the variable most teams ignore

Async-first, synchronous when it earns it

What changes when AI writes the code

The review that earns its time

Frequently asked questions

Related reading

An AI agent deleted PocketOS's production database in 9 seconds. Credential scoping was the real failure.

Flaky tests aren't random. Six root causes explain almost all of them.

Three npm supply-chain attacks hit in four weeks. None of them needed a stolen password.

Stay ahead on eSignatures, compliance, and document workflows

An AI agent deleted PocketOS's production database in 9 seconds. Credential scoping was the real failure.