I've been using my ai-code-review tool for weeks now, and with the release of v3.0.1, I finally feel like it's hitting the mark I originally envisioned. This isn't just another point release—it's the culmination of real-world usage feedback and some hard-won insights about what actually makes AI code reviews useful.
Fixing the Big Problem: No More Dropped Reviews
Let's start with what was broken in previous versions. When you tried to review large codebases, the tool would silently drop content that exceeded token limits. You'd get partial reviews without knowing what was missing—a terrible user experience that made the tool unreliable for real-world projects.
v3.0.1 fixes this completely with intelligent token-based chunking. Nothing gets dropped. Everything gets reviewed.
The Real Game-Changer: Smart Token-Based Chunking
The major feature in v3.0.1 is intelligent chunking for large codebases. Now the tool automatically breaks down large submissions into manageable chunks without losing context.
Here's how it actually works:
Token Analysis Phase: The tool calculates token usage for all files upfront, adds prompt overhead (1500 tokens for system instructions), and compares against the model's context window.
Smart Chunking Logic: If the total exceeds the context window, files get sorted by size (largest first) and grouped into chunks that fit within an "effective context size"—which is the full context window minus a 15% buffer.
That 15% buffer is crucial. It reserves space for maintaining context between passes—summarizing previous findings, tracking patterns across chunks. Without it, each chunk would exist in isolation.
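To make that concrete, here's a simplified TypeScript sketch of the planning step. The constants match the numbers above; the type names, function name, and greedy packing are illustrative, not the exact production code.

```typescript
interface FileTokens {
  path: string;
  tokens: number; // estimated token count for the file's contents
}

const PROMPT_OVERHEAD = 1500;            // reserved for system instructions
const CONTEXT_MAINTENANCE_FACTOR = 0.15; // buffer for carrying findings between passes

// Group files into chunks that fit the effective context size.
// Returns a single chunk (single-pass review) when everything fits.
function planChunks(files: FileTokens[], contextWindow: number): FileTokens[][] {
  const totalTokens =
    files.reduce((sum, f) => sum + f.tokens, 0) + PROMPT_OVERHEAD;

  if (totalTokens <= contextWindow) {
    return [files]; // no chunking needed
  }

  const effectiveContext =
    Math.floor(contextWindow * (1 - CONTEXT_MAINTENANCE_FACTOR)) - PROMPT_OVERHEAD;

  // Largest files first, then pack chunks up to the effective size.
  const sorted = [...files].sort((a, b) => b.tokens - a.tokens);
  const chunks: FileTokens[][] = [];
  let current: FileTokens[] = [];
  let currentTokens = 0;

  for (const file of sorted) {
    if (currentTokens + file.tokens > effectiveContext && current.length > 0) {
      chunks.push(current);
      current = [];
      currentTokens = 0;
    }
    // A single file larger than the effective context still ends up alone in an
    // oversized chunk (see the limitations section below).
    current.push(file);
    currentTokens += file.tokens;
  }
  if (current.length > 0) chunks.push(current);

  return chunks;
}
```

Greedy packing by size is the simplest thing that fits the description above, and it also shows why a single file bigger than the effective context is still a problem.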
Multi-Pass Execution: Each chunk gets reviewed while being aware of previous findings. The tool maintains a ReviewContext object that carries insights forward across all passes.
The Final Stitch: Here's the key part—after all chunks are reviewed, the tool passes all the individual reviews to the AI one more time for consolidation. The AI synthesizes findings across chunks, identifies patterns that span multiple files, and generates a final unified grade for the entire codebase.
This final consolidation step is what makes chunked reviews coherent. The AI can spot architectural issues that only become apparent when looking at the full picture, not just individual chunks.
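In code, the carry-forward plus the final stitch looks roughly like this. ReviewContext is a real object in the tool; the fields and helper types below are a stripped-down illustration, not its actual interface.

```typescript
interface ReviewContext {
  findingsSummary: string[];   // condensed findings from earlier passes
  recurringPatterns: string[]; // patterns worth tracking across chunks
}

// Stand-in for a single model call.
type ModelCall = (prompt: string) => Promise<string>;

async function multiPassReview(
  chunks: string[][],          // file contents grouped by the chunk planner
  callModel: ModelCall
): Promise<string> {
  const context: ReviewContext = { findingsSummary: [], recurringPatterns: [] };
  const chunkReviews: string[] = [];

  for (const chunk of chunks) {
    const prompt = [
      "Previous findings:", ...context.findingsSummary,
      "Known patterns:", ...context.recurringPatterns,
      "Review these files:", ...chunk,
    ].join("\n");

    const review = await callModel(prompt);
    chunkReviews.push(review);

    // The real tool summarizes findings and extracts recurring patterns before
    // carrying them forward, so the context stays inside the 15% budget reserved
    // for it; a blunt truncation stands in for that here.
    context.findingsSummary.push(review.slice(0, 2000));
  }

  // The final stitch: hand every chunk review back to the model for consolidation.
  const consolidationPrompt = [
    "Synthesize these partial reviews into one unified review with an overall grade:",
    ...chunkReviews,
  ].join("\n\n");

  return callModel(consolidationPrompt);
}
```

The consolidation step is deliberately a full model pass rather than simple concatenation; that's what lets it re-weigh findings across chunks and grade the codebase as a whole.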
Cost Estimates That Actually Matter
The multi-pass confirmation dialog now provides accurate cost estimates before you commit to a review. It shows:
Token analysis results
Whether chunking is needed (and why)
Number of passes required (including final consolidation)
Projected costs across selected models
I added this after accidentally burning through $50 in API credits on a single large repository review. The confirmation step isn't just a speed bump—it's financial sanity.
You can skip it with --no-confirm for automated workflows. But for interactive use, that pause often saves both money and focus.
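The arithmetic behind the projection is simple once the token analysis exists, roughly the sketch below. The per-1K prices here are placeholders made up for the example, not current provider pricing, and the real tool derives its numbers from the selected models rather than hard-coding them.

```typescript
interface ModelPricing {
  name: string;
  inputPer1K: number;  // USD per 1,000 input tokens (illustrative only)
  outputPer1K: number; // USD per 1,000 output tokens (illustrative only)
}

const models: ModelPricing[] = [
  { name: "claude", inputPer1K: 0.003,   outputPer1K: 0.015 },
  { name: "gpt-4",  inputPer1K: 0.01,    outputPer1K: 0.03  },
  { name: "gemini", inputPer1K: 0.00125, outputPer1K: 0.005 },
];

function estimateCost(
  inputTokensPerPass: number,
  expectedOutputTokens: number,
  passes: number // chunk passes plus the final consolidation pass
): void {
  for (const m of models) {
    const cost =
      passes *
      ((inputTokensPerPass / 1000) * m.inputPer1K +
        (expectedOutputTokens / 1000) * m.outputPer1K);
    console.log(`${m.name}: ~$${cost.toFixed(2)} for ${passes} passes`);
  }
}

// Example: 120K input tokens per pass, ~4K output, 5 passes (4 chunks + consolidation).
estimateCost(120_000, 4_000, 5);
```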
What I Actually Use This For
Three scenarios where ai-code-review has become indispensable:
Legacy Code Triage: When diving into unfamiliar codebases, the architectural review mode gives me a rapid overview of design patterns, potential debt, and areas of complexity. The chunking ensures I get comprehensive coverage without losing the forest for the trees.
PR Review Supplements: Not a replacement for human review, but excellent for catching obvious issues before they hit the team. Security patterns, edge case handling, test coverage gaps. The multi-pass approach with final consolidation means patterns get detected across file boundaries and properly weighted in the final assessment.
Learning from AI Models: Different models catch different things. Claude tends to be thorough on architecture. GPT-4 is sharp on edge cases. Gemini often spots optimization opportunities. Running all three gives you triangulated feedback across the entire codebase.
The Stealth Feature: Codebase Grading
Here's something that didn't make the release notes: the tool now assigns letter grades (A-F) to your codebase for each review category. It's not just "here's some feedback"—it's "your test coverage gets a C+, your documentation gets a B-, your security practices get an A-."
The grading system weights deviations by significance. A missing comment doesn't tank your score. A SQL injection vulnerability does. The AI understands the difference between stylistic preferences and genuine issues that could cause production problems.
The final consolidation step is especially important for accurate grading. The AI considers all findings across chunks before assigning grades, ensuring that localized issues don't unfairly impact the overall score, while systemic problems get properly recognized.
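The grades come from the model itself during consolidation rather than from a fixed formula, but the weighting idea maps naturally onto a severity-weighted score per category. The severities, penalties, and cutoffs in this sketch are invented for illustration, not the tool's actual rubric.

```typescript
type Severity = "critical" | "major" | "minor" | "style";

interface Finding {
  category: string; // e.g. "security", "testing", "documentation"
  severity: Severity;
}

// Heavier penalties for findings that could cause production problems.
const penalty: Record<Severity, number> = {
  critical: 25, // e.g. SQL injection
  major: 10,
  minor: 3,
  style: 1,     // e.g. a missing comment
};

function gradeCategory(findings: Finding[], category: string): string {
  const score = findings
    .filter((f) => f.category === category)
    .reduce((s, f) => Math.max(0, s - penalty[f.severity]), 100);

  if (score >= 93) return "A";
  if (score >= 85) return "A-";
  if (score >= 80) return "B+";
  if (score >= 73) return "B";
  if (score >= 65) return "B-";
  if (score >= 55) return "C+";
  if (score >= 45) return "C";
  if (score >= 30) return "D";
  return "F";
}
```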
The Technical Evolution
Building this tool taught me something about AI model APIs: they're surprisingly inconsistent. Not just in output quality—in basic formatting expectations, error handling, and version management.
The chunking algorithm went through several iterations:
First attempt: naive file-size splitting (terrible for context)
Second: directory-based grouping (missed cross-cutting concerns)
Current: token-aware chunking with context maintenance and final consolidation
The breakthrough was realizing that token counting had to happen upfront, before any chunking decisions. Most tools chunk first, then discover they've blown the context window. This tool analyzes token usage, determines whether chunking is needed, and only then creates an optimal plan.
The TokenAnalyzer class handles the heavy lifting, while MultiPassReviewStrategy orchestrates the actual execution. The 15% context maintenance factor came from empirical testing: less and you lose coherence, more and you waste token budget.
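If you want the shape of that split without reading the source, it's roughly the interfaces below. Only the two class names are real; the signatures are a paraphrase of their responsibilities, not the exported API.

```typescript
// Paraphrased responsibilities; the actual classes carry more state and options.
interface TokenAnalysis {
  totalTokens: number;     // including the ~1500-token prompt overhead
  contextWindow: number;   // for the selected model
  needsChunking: boolean;
  chunks: string[][];      // planned file groups when chunking is needed
  estimatedPasses: number; // chunk passes + 1 for the consolidation pass
}

// TokenAnalyzer: measure everything up front and produce the plan.
interface TokenAnalyzerLike {
  analyze(files: string[], model: string): Promise<TokenAnalysis>;
}

// MultiPassReviewStrategy: execute the plan, carrying context between passes,
// then run the final consolidation pass.
interface MultiPassReviewStrategyLike {
  execute(analysis: TokenAnalysis): Promise<string>;
}
```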
Limitations Worth Noting
Let's be honest about what this tool doesn't do:
Very Large Single Files: If an individual file exceeds the context window, it still poses challenges. The tool handles this better than before, but it's not perfect.
Domain Knowledge: The reviews are generic. The tool doesn't know your business rules, team conventions, or product requirements. An A+ in generic best practices might be a C- for your specific use case.
Cost at Scale: Even with better estimates, running comprehensive reviews on large codebases isn't cheap. The final consolidation pass adds cost but significantly improves quality.
Perfect Context Preservation: While much better than naive approaches, some nuanced relationships between distant files might still be missed.
Should You Use It?
If you're working alone or on small teams where structured code review isn't happening consistently, absolutely. The chunking makes it practical to review entire projects without dropping content, and the grades give you immediate feedback on where to focus.
If you're already doing thorough human reviews, treat it as a supplement, not a replacement. Use it to catch the obvious stuff so humans can focus on the subtle architectural and business logic questions.
For teams just starting to implement code review practices, it's excellent training wheels. The structured feedback and grades help developers internalize what to look for in their own and others' code.
Bottom Line
v3.0.1 makes AI-assisted code review reliable for real-world projects. The token-based chunking handles large codebases without dropping content. The final consolidation step ensures coherent, well-graded reviews. The cost estimates prevent budget surprises.
And those letter grades? They're not in the release notes, but they're remarkably good at telling you where to focus your improvement efforts. Sometimes that's exactly what you need.
Get it with npm install -g @bobmatnyc/ai-code-review (or just run it with npx @bobmatnyc/ai-code-review .) and see what your codebase scores: all of it, not just the parts that fit in a context window.
Full release notes and documentation available at github.com/bobmatnyc/ai-code-review