How to Configure Review Pass Count and Models
Pass configuration is the single most impactful tuning lever for AI code review quality. The choices you make here determine both the depth of analysis and the cost per review. Follow these steps to set up a configuration suited to your team's risk tolerance and budget.
Determine Your Baseline Pass Count
Start with two passes for all code changes. The first pass catches straightforward issues including style violations, obvious bugs, and known security patterns. The second pass adds depth by tracing data flows, checking error handling consistency, and examining edge cases. Run this two-pass configuration for two weeks and measure the findings. If the second pass consistently catches important issues that the first pass missed, two passes is your minimum. If the second pass rarely adds value, one pass may be sufficient for routine changes. Most teams settle on two passes as the default with three passes triggered for changes in high-risk directories like authentication, payment, and data access code.
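One way to put a number on the two-week trial is to compute the share of reviewed pull requests where the second pass surfaced an important issue that the first pass missed. The sketch below is illustrative only: the `Finding` record, its `fingerprint` field, and the human-assigned `important` flag are assumptions, not part of any particular review tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    pr_id: str
    pass_number: int   # 1 or 2
    fingerprint: str   # hypothetical stable ID, e.g. file + rule + location
    important: bool    # assumed to be labeled by a human during triage


def second_pass_value(findings: list[Finding], reviewed_prs: set[str]) -> float:
    """Fraction of reviewed PRs where pass 2 found an important issue
    that pass 1 did not report."""
    if not reviewed_prs:
        return 0.0
    first_pass = {(f.pr_id, f.fingerprint) for f in findings if f.pass_number == 1}
    helped = {
        f.pr_id
        for f in findings
        if f.pass_number == 2
        and f.important
        and (f.pr_id, f.fingerprint) not in first_pass
    }
    return len(helped & reviewed_prs) / len(reviewed_prs)
```

A value that stays near zero over the trial suggests one pass may be enough for routine changes; what counts as "consistently" is a threshold for the team to choose.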
Select Models for Each Pass
Assign a smaller, faster model to the first pass and a more capable model to the deep analysis passes. For the first pass, models like Claude Haiku or GPT-4o-mini provide adequate analysis for surface-level issues at 5 to 10 times lower cost than frontier models. For the second and third passes, use Claude Sonnet/Opus or GPT-4 for their stronger reasoning capabilities on complex logic analysis. This tiered approach typically reduces total review cost by 40 to 60 percent compared to using the best model for all passes while maintaining 90 to 95 percent of the detection quality. Track the unique findings from each model tier to verify that the cost savings do not compromise detection of important issues.
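The savings claim can be sanity-checked with simple arithmetic. The sketch below compares a tiered setup against running the frontier model on every pass; the costs are relative placeholders (one frontier pass = 1.0), not real prices, and the function name is hypothetical.

```python
def tiered_savings(tiered_pass_costs: list[float], frontier_cost_per_pass: float) -> float:
    """Fractional saving of a tiered setup versus using the frontier
    model for every pass (0.45 means 45 percent cheaper)."""
    if not tiered_pass_costs or frontier_cost_per_pass <= 0:
        raise ValueError("need at least one pass and a positive frontier cost")
    baseline = frontier_cost_per_pass * len(tiered_pass_costs)
    return 1 - sum(tiered_pass_costs) / baseline


# Illustrative relative costs only: first pass at 1/10 and 1/5 of a frontier pass.
two_pass_10x_cheaper = tiered_savings([0.10, 1.00], 1.00)
two_pass_5x_cheaper = tiered_savings([0.20, 1.00], 1.00)
print(f"{two_pass_10x_cheaper:.0%}, {two_pass_5x_cheaper:.0%}")
```

Under these assumed ratios a two-pass setup saves roughly 40 to 45 percent. Actual savings depend on real per-token prices and on how many tokens each pass consumes, so it is worth rerunning the calculation with your own billing data.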
Configure Pass Focus Areas
Each pass should have a defined focus rather than repeating the same broad analysis. Configure the first pass to check for style consistency, naming conventions, obvious bugs (null dereferences, off-by-one errors), and known security vulnerability patterns. Configure the second pass to trace data flows across function boundaries, validate error handling chains, check resource lifecycle management, and examine concurrent access patterns. If using a third pass, focus it on cross-file interaction effects, architectural compliance, and security analysis with a specialized security prompt. Defining clear focus areas for each pass reduces token consumption because each pass loads only the context relevant to its analysis scope.
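The focus areas above could be captured as data and turned into a per-pass prompt. This is a minimal sketch: the `PASS_FOCUS` table mirrors the areas listed in this step, while the prompt wording and the `build_pass_prompt` helper are assumptions rather than any tool's actual interface.

```python
# Focus areas per pass, taken from the step above.
PASS_FOCUS: dict[int, list[str]] = {
    1: [
        "style consistency",
        "naming conventions",
        "obvious bugs (null dereferences, off-by-one errors)",
        "known security vulnerability patterns",
    ],
    2: [
        "data flows across function boundaries",
        "error handling chains",
        "resource lifecycle management",
        "concurrent access patterns",
    ],
    3: [
        "cross-file interaction effects",
        "architectural compliance",
        "security analysis",
    ],
}


def build_pass_prompt(pass_number: int, diff: str) -> str:
    """Build a review prompt restricted to one pass's focus areas."""
    if pass_number not in PASS_FOCUS:
        raise ValueError(f"no focus areas defined for pass {pass_number}")
    bullets = "\n".join(f"- {item}" for item in PASS_FOCUS[pass_number])
    return (
        f"Review pass {pass_number}. Report only issues in these areas:\n"
        f"{bullets}\n\nChange under review:\n{diff}"
    )
```

Keeping the focus lists in one table makes it easy to add an area when the team adopts a new framework without touching the prompt-building code.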
Set Convergence Criteria
Convergence criteria determine when to stop adding passes. The simplest criterion is a fixed pass count: always run exactly N passes regardless of findings. A more sophisticated approach is delta-based convergence: continue running passes until a pass produces no new findings beyond what previous passes detected. Delta-based convergence saves cost on simple changes that converge after one pass while allowing complex changes to receive additional passes. Set a maximum pass limit (typically four) to prevent runaway costs on pathological cases where findings never fully converge. Track the distribution of convergence points across your PRs to verify that your maximum is set appropriately.
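Delta-based convergence with a hard cap can be sketched as a short loop. Here `run_pass` is a hypothetical callable standing in for whatever invokes the review model and returns finding identifiers; the default cap of four follows the text, and `min_new_findings` is an assumed tuning parameter.

```python
from typing import Callable


def run_until_converged(
    run_pass: Callable[[int], set[str]],
    max_passes: int = 4,
    min_new_findings: int = 1,
) -> tuple[set[str], int]:
    """Run passes until one yields fewer than `min_new_findings` new
    findings, or until `max_passes` is reached.

    Returns all distinct findings and the number of passes actually run.
    """
    seen: set[str] = set()
    passes_run = 0
    for pass_number in range(1, max_passes + 1):
        new = run_pass(pass_number) - seen  # findings not reported earlier
        seen |= new
        passes_run = pass_number
        if len(new) < min_new_findings:
            break  # converged: this pass added too little to justify another
    return seen, passes_run
```

Recording `passes_run` for each pull request gives the distribution of convergence points mentioned above. Note that this sketch compares findings by exact identifier, so it assumes the same issue gets the same identifier on every pass.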
Configure Risk-Based Review Depth
Not all code changes deserve the same review depth. Configure your pipeline to apply different pass counts based on the risk level of the changed files. Define high-risk directories (authentication, authorization, payment, data access, infrastructure configuration) that receive three-pass cross-model review. Define standard directories (business logic, API endpoints, service layers) that receive two-pass review. Define low-risk directories (UI components, documentation, tests, build configuration) that receive single-pass review. Most CI/CD platforms support path-based conditional logic that triggers different review workflows based on which files changed in the pull request.
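The path-based rule can be sketched in a few lines. The directory patterns below are hypothetical placeholders for your own repository layout, and the choice to let the riskiest changed file decide the depth for the whole pull request is an assumption of this sketch.

```python
from fnmatch import fnmatchcase

# (pass count, path patterns); patterns are hypothetical examples.
RISK_RULES: list[tuple[int, list[str]]] = [
    (3, ["src/auth/*", "src/payments/*", "src/data_access/*", "infra/*"]),
    (1, ["src/ui/*", "docs/*", "tests/*", "build/*"]),
]
DEFAULT_PASSES = 2  # standard directories: business logic, API endpoints, services


def passes_for_file(path: str) -> int:
    """Return the pass count for one changed file (first matching rule wins)."""
    for passes, patterns in RISK_RULES:
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return passes
    return DEFAULT_PASSES


def passes_for_change(changed_files: list[str]) -> int:
    """The riskiest file in the change sets the depth for the whole review."""
    return max((passes_for_file(p) for p in changed_files), default=DEFAULT_PASSES)
```

For example, a pull request touching both `docs/readme.md` and `src/auth/login.py` would receive three passes under these rules, because a single high-risk file is enough to escalate the review.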
Validate and Optimize Over Time
After configuring your passes, validate the setup by tracking three metrics: findings per pass (to verify each pass adds value), false positive rate per pass (to ensure deeper passes do not introduce noise), and cost per review (to verify the configuration stays within budget). Review these metrics monthly and adjust the configuration as needed. Common optimizations include adjusting the model selection when providers release new models with better price-performance ratios, adding new focus areas when the team adopts new frameworks, and updating risk classifications as the codebase evolves. Treat the configuration as a living document that evolves with the team's changing needs.
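The three metrics can be computed from a simple per-pass log. The sketch below assumes a hypothetical `PassRecord` written once per pass per review, with false positives labeled by whoever triages the findings; none of these names come from a specific tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassRecord:
    pass_number: int
    findings: int         # findings reported by this pass
    false_positives: int  # of those, how many were dismissed in triage
    cost: float           # cost of this pass, in your billing currency


def summarize(records: list[PassRecord], review_count: int) -> dict:
    """Aggregate findings per pass, false positive rate per pass,
    and average cost per review."""
    if review_count <= 0:
        raise ValueError("review_count must be positive")
    per_pass: dict[int, dict[str, float]] = {}
    for r in records:
        agg = per_pass.setdefault(r.pass_number, {"findings": 0, "false_positives": 0})
        agg["findings"] += r.findings
        agg["false_positives"] += r.false_positives
    for agg in per_pass.values():
        agg["findings_per_review"] = agg["findings"] / review_count
        agg["false_positive_rate"] = (
            agg["false_positives"] / agg["findings"] if agg["findings"] else 0.0
        )
    total_cost = sum(r.cost for r in records)
    return {"per_pass": per_pass, "cost_per_review": total_cost / review_count}
```

Comparing `false_positive_rate` across passes month over month shows whether deeper passes are adding noise faster than they add value.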
Two passes with tiered model selection is a sound starting point for most teams. Use a fast model for the first pass and a capable model for deep analysis. Add risk-based depth so critical code gets three passes while low-risk changes get one. Measure findings per pass to validate that each pass adds genuine value.
Common Configuration Pitfalls
The most common mistake in pass configuration is running too many passes on every change regardless of complexity. Four or five passes on a simple documentation update waste tokens and slow the review pipeline without catching additional issues. Risk-based review depth, described in the fifth step above, prevents this waste by matching analysis intensity to change significance. Another frequent error is assigning the same broad analysis scope to every pass, which causes redundant findings and inflated false positive counts. Each pass should have a distinct focus area that builds on the findings from previous passes.
Teams also commonly underestimate the importance of convergence criteria. Without a clear stopping rule, some configurations continue running passes indefinitely on complex changes, consuming resources without producing actionable new findings. Setting both a delta threshold (stop when a pass produces fewer than N new findings) and a hard maximum pass count (never exceed M passes) prevents runaway analysis. Track your convergence statistics monthly to verify these thresholds remain appropriate as your codebase and review patterns evolve.