Hi folks!
Just pushed v3.2.9 and wanted to share what's new. Two big things: added support for Claude 4 and OpenAI's new o3 reasoning models, plus fixed some embarrassing bugs where the tool was showing the wrong AI provider (that's been bugging me for weeks).
## New Models

### Claude 4 Models
Anthropic released Claude 4 and I've added both models:
- **Claude 4 Opus** (`anthropic:claude-4-opus`) - $15 input / $75 output per 1M tokens, 200K context. Their flagship model, really good at complex reasoning. Perfect for architectural reviews when you need the heavy lifting.
- **Claude 4 Sonnet** (`anthropic:claude-4-sonnet`) - $3 input / $15 output per 1M tokens, 200K context. This is probably what you want for daily code reviews. Way cheaper but still really capable.
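If you want to size up both Claude 4 models before committing, here's a minimal sketch; it uses only the environment variable and `--estimate` flag covered in Quick Setup below:

```bash
# Compare cost estimates for the two Claude 4 models on the same codebase.
export AI_CODE_REVIEW_MODEL=anthropic:claude-4-opus
ai-code-review src/ --estimate   # flagship model, 5x the price

export AI_CODE_REVIEW_MODEL=anthropic:claude-4-sonnet
ai-code-review src/ --estimate   # the likely daily driver
```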
### OpenAI o3 Models
OpenAI's new reasoning models are pretty interesting for code analysis:
- **o3-mini** (`openai:o3-mini`) - OpenAI's smaller reasoning model, designed for complex problem-solving and code analysis. Better at thinking through problems step by step.
- **o3** (`openai:o3`) - The full reasoning model. More expensive, but really good at analysis that requires multiple steps of reasoning. Great for architectural reviews and tricky debugging.
These are different from the regular GPT models - they're specifically designed to "think through" problems step by step, which makes them really good for code reviews where you need logical analysis.
## Quick Setup
```bash
# Claude 4 (I'd start with Sonnet)
export AI_CODE_REVIEW_MODEL=anthropic:claude-4-sonnet
ai-code-review src/

# o3 reasoning models
export AI_CODE_REVIEW_MODEL=openai:o3-mini
ai-code-review src/

# Always check costs first
ai-code-review src/ --estimate
```
## What I Fixed
- **Provider display:** The tool was showing "Gemini" when you were using Claude (oops). Now it shows the right provider.
- **Cost calculations:** Estimates were using the wrong model identifiers internally. They should be accurate now.
- **Token display:** The token counts you saw didn't match what was used for cost calculation. That's fixed.
- **Model identifiers:** Some models shipped with wrong API names. They match what the providers expect now.
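If the provider mix-up bit you, here's a quick post-upgrade sanity check; it only uses commands already shown in this post, and the idea is simply that the provider named in the output should now match your configured model:

```bash
# After upgrading, the provider/model the tool reports should match
# whatever AI_CODE_REVIEW_MODEL is set to (it previously could show
# "Gemini" even when Claude was configured).
export AI_CODE_REVIEW_MODEL=anthropic:claude-4-sonnet
ai-code-review src/ --estimate
```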
## Cost Reality Check
For a typical 200-file codebase (~350K tokens), here's what you're looking at:
Single pass:

- Claude 4 Sonnet: ~$5 (reasonable; rough math below)
- o3-mini: ~$8-12 (reasoning models cost more but think better)
- Claude 4 Opus: ~$26 (expensive but really good)
- o3: ~$15-25 (varies, reasoning is expensive)
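For the curious, here's roughly where the Sonnet number comes from. The input side is just the published rate; the output side depends on how verbose the review gets, so the output token count below is my assumption, not a measured figure:

```bash
# Back-of-envelope for Claude 4 Sonnet on ~350K input tokens.
# ASSUMPTION: ~250K output tokens for a full-codebase review (varies a lot).
awk 'BEGIN {
  input  = 0.35 * 3    # 0.35M tokens x $3 per 1M  = $1.05
  output = 0.25 * 15   # 0.25M tokens x $15 per 1M = $3.75
  printf "estimated cost: $%.2f\n", input + output   # ~$4.80
}'
```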
More importantly, here's how I'd think about choosing:
| Model | Cost | Best For |
| --- | --- | --- |
| Claude 4 Sonnet | Low | Daily reviews, general analysis |
| o3-mini | Medium | Code logic issues, debugging |
| Claude 4 Opus | High | Complex architectural stuff |
| o3 | High | Really complex reasoning problems |
## My Take
The o3 models are interesting because they actually "reason" through problems step by step. They're slower and more expensive, but they're really good at catching logical issues and thinking through complex code flows.
I'd probably use Claude 4 Sonnet for most daily reviews, and save the o3 models for when you have really tricky logic or architectural problems that need deep analysis.
**Second Opinion Feature:** This is actually huge. If you're already using Claude Code (which uses Claude models), you can run this tool with Gemini or o3 models to get a completely different AI perspective on the same code. Different models catch different things - Claude might spot architectural issues while o3 catches logical problems Claude missed. It's like having multiple expert reviewers.
```bash
# Get a second opinion after using Claude Code
ai-code-review src/ --model openai:o3-mini        # Different reasoning approach
ai-code-review src/ --model gemini:gemini-1.5-pro # Google's perspective
```
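If you want several second opinions in one go, a plain shell loop over the `--model` flag does it; nothing here is tool-specific beyond that flag:

```bash
# Run the same review under multiple models for cross-checking.
for model in openai:o3-mini gemini:gemini-1.5-pro; do
  echo "=== second opinion from $model ==="
  ai-code-review src/ --model "$model"
done
```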
I think this multi-model approach is really powerful. Claude 4 Sonnet is still my go-to for general reviews, but getting different AI perspectives on the same code? That's pretty valuable.
## Installation
```bash
npm install -g @bobmatnyc/ai-code-review@latest
ai-code-review --show-version

# Test with a reasoning model
ai-code-review src/ --model openai:o3-mini --estimate
```
## Migration
If you're using older models:
```bash
# From Claude 3 (same price, newer model)
export AI_CODE_REVIEW_MODEL=anthropic:claude-4-sonnet

# From GPT-4o (try reasoning)
export AI_CODE_REVIEW_MODEL=openai:o3-mini

# For complex analysis
export AI_CODE_REVIEW_MODEL=openai:o3
```
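To make the new default stick across terminal sessions, the usual shell-profile trick applies (assuming, as the examples above imply, that the tool reads `AI_CODE_REVIEW_MODEL` from the environment on each run):

```bash
# Persist the default model (bash shown; use ~/.zshrc for zsh).
echo 'export AI_CODE_REVIEW_MODEL=anthropic:claude-4-sonnet' >> ~/.bashrc
source ~/.bashrc
```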
## Bottom Line
This is a solid update. Claude 4 gives you the latest from Anthropic at the same prices as Claude 3, and the o3 models are really interesting for complex reasoning tasks. The bug fixes should clear up any confusion about which model you're actually using.
I'd recommend Claude 4 Sonnet for daily use, and try the o3 models when you have complex logic issues or architectural problems that need deep thinking.
LMK if you run into any issues with the new models.
Cheers! --Bob