Claude Sonnet 5 Takes the Default Driver Slot — and Quietly Raises Your Token Bill
Plus it's baaaaaack! (Fable 5)
Verdict up front
Claude Sonnet 5 landed this week, June 30, 2026, as the new default on Claude Free and Pro and the new occupant of the middle tier: Haiku below it, Opus 4.8 above, Fable 5 and Mythos 5 at the top. The pitch from Anthropic is the one practitioners care about — Opus-adjacent capability at Sonnet economics, aimed squarely at high-volume agent loops where Opus pricing compounds fast. One wrinkle sharpens the switch decision. Fable 5 is back in Claude Code, redeployed the morning after Sonnet shipped, which puts the top of the lineup back in reach and moves the ceiling any switch has to weigh against.
Two things matter more than the headline. First, the architecture shift: extended thinking is gone, replaced by adaptive thinking that runs on by default and interleaves reasoning between tool calls. The effort parameter now defaults to high in both the Claude API and Claude Code. Second, the pricing has a catch the announcement does not foreground. The list price looks flat against Sonnet 4.6. The new tokenizer means your invoice may not be.
If you drive coding agents for a living, this is the model you will be pointing your harness at by default. The question is whether to switch today[1], reach past it, or wait two weeks and measure.
TL;DR
New default everywhere. Sonnet 5 is the default model on Free and Pro at launch. API ID claude-sonnet-5, Bedrock anthropic.claude-sonnet-5, Vertex coming soon.
Extended thinking removed; adaptive thinking is always on. Manual thinking: {type: "enabled", budget_tokens: N} now returns a 400 error. Depth is set by effort (low, medium, high, xhigh, max; high by default on the API and in Claude Code), and the model reasons between tool calls without prompt engineering. That is the architecturally significant change for agentic work.
1M context, 128k output, January 2026 cutoff. Repo-level reasoning at Sonnet pricing.
Agentic coding 63.2%, per Anthropic’s comparison chart, versus 58.1% for Sonnet 4.6 and 69.2% for Opus 4.8. Likely SWE-bench Pro, not Verified — and no Verified score is published yet.
Pricing looks flat, isn’t quite. $2/$10 per MTok intro through August 31, then $3/$15, the same list price as Sonnet 4.6. But the new tokenizer produces 1.0–1.35x as many tokens for the same text, so “same price” is misleading at the invoice level.
Fable 5 is back in Claude Code (July 1) as the practical ceiling above Opus — priced higher, with a retrained safety classifier that reroutes offensive-cyber work down to Opus 4.8. Details below.
What Anthropic shipped
Sonnet 5 sits in the workhorse slot of the lineup. Haiku 4.5 at $1/$5 handles cheap high-volume work. Opus 4.8 at $5/$25 is the heavy lifter. Fable 5 and Mythos 5 sit above Opus, both dark at Sonnet 5’s launch, though Fable returned to Claude Code on July 1 (more below). Sonnet is the tier you point at the bulk of your agent traffic, and the one whose economics decide whether a coding-agent product is viable.
The specs are current-generation and unsurprising on paper: 1,000,000-token context window, 128,000 max output tokens (300,000 on the Batch API with the beta header), text and image input, January 2026 knowledge cutoff. Anthropic calls it “the most agentic Sonnet model yet,” which is marketing, but the architecture underneath the phrase is the actual story.
Extended thinking is gone. If your code sends thinking: {type: "enabled", budget_tokens: N}, Sonnet 5 returns a 400 error. That is a breaking change for any harness that sets thinking budgets explicitly. Grep your codebase for it before you flip the model ID. In its place is adaptive thinking: always on, with depth allocated dynamically by the model and steered through the effort parameter — low, medium, high, xhigh, or max. The default is high on both the API and Claude Code, with xhigh available above it for the hardest coding and agentic tasks.
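A literal grep for budget_tokens is the fast first pass. If you want the check to gate CI instead of living in someone's shell history, here is a minimal Python sketch. The file extensions and the regex are assumptions to adjust for your codebase, and it only flags lines that set budget_tokens directly, so a budget assembled from a variable somewhere else can still slip through:

```python
import re
from pathlib import Path

# Assumption: these extensions cover the files where request payloads are built.
SCAN_SUFFIXES = {".py", ".ts", ".js", ".json", ".yaml", ".yml"}

# Matches the key quoted ("budget_tokens": ...), bare (budget_tokens: ...),
# or as a keyword argument (budget_tokens=...).
BUDGET_PATTERN = re.compile(r"""["']?budget_tokens["']?\s*[:=]""")


def find_thinking_budgets(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line_number, line_text) for every line that sets budget_tokens."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SCAN_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if BUDGET_PATTERN.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling find_thinking_budgets("src") returns the path, line number, and offending line for each hit, which is enough to fail a pre-merge job before the model ID flips.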
The piece that matters for agents is that adaptive thinking automatically enables interleaved thinking. The model reasons between tool calls. It reflects on what a tool returned before deciding the next action, and it does this without you wiring up a scratchpad or a reflection prompt. For anyone who has built an agent loop by hand, that is the part you used to engineer yourself, now folded into the default behavior of the model.
About that benchmark number
Anthropic’s launch chart gives an agentic coding comparison:
| Model | Agentic coding score |
| --- | --- |
| Claude Sonnet 4.6 | 58.1% |
| Claude Sonnet 5 | 63.2% |
| Claude Opus 4.8 | 69.2% |
Source: Anthropic’s comparison chart, via TechCrunch. Now the caveats, because this is where launch-day coverage tends to get sloppy.
That 63.2% is almost certainly SWE-bench Pro — the harder multi-file agentic eval — not SWE-bench Verified. The two are not interchangeable, and the numbers live on different scales. The Opus 4.8 figure in the same chart (69.2%) matches its published SWE-bench Pro score, which supports the Pro read. Anthropic has not published a standalone SWE-bench Verified number for Sonnet 5. The comparison chart is an image, with no underlying table released at launch.
So here is the trap. If you go looking, you will find an 82.1% SWE-bench figure attached to “Claude Sonnet 5.” Do not use it. That number comes from February 2026 pre-launch speculation about a different model iteration, a phantom Sonnet 5 with a “Dev Team Mode” that never shipped in the form described. The model that launched today has different characteristics. Any spec sheet dated before June 30 is describing something else.
What can you say with confidence? Sonnet 5 sits meaningfully above Sonnet 4.6 on agentic coding and a few points below Opus 4.8. For reference, the most recent third-party leaderboard before launch had Sonnet 4.6 at 79.6% SWE-bench Verified and Opus 4.8 at 88.6%. A reasonable expectation puts Sonnet 5’s Verified score somewhere in the low-to-mid 80s — but that is my read of where the gap lands, not a number Anthropic has confirmed. Treat it as judgment, not data.
On the safety side, Anthropic reports lower hallucination and sycophancy rates than Sonnet 4.6, better refusal of malicious requests, and stronger resistance to prompt injection. It also carries intentionally weaker cybersecurity exploit capability than Opus 4.8.
Pricing and the tokenizer tax
The list price is the easy part:
| Model | Input $/MTok | Output $/MTok |
| --- | --- | --- |
| Claude Haiku 4.5 | $1.00 | $5.00 |
| Sonnet 5 (intro, through Aug 31) | $2.00 | $10.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Sonnet 5 (standard, from Sep 1) | $3.00 | $15.00 |
| Claude Opus 4.8 | $5.00 | $25.00 |
At standard rates, Sonnet 5 carries the identical list price to Sonnet 4.6: $3/$15. Through August 31 you get an introductory $2/$10. Read quickly, that says “same price, free discount for two months.” Read the footnote and it says something else.
Sonnet 5 uses the same tokenizer as Opus 4.7, which encodes the same text into roughly 1.0–1.35x as many tokens as pre-Opus-4.7 models did, depending on content type. List price is per token. So at the standard September rate, identical workloads can cost up to 35% more than they did on Sonnet 4.6: same sticker, more tokens on the meter. The introductory pricing is doing real work here. Its 33% discount on both input and output offsets the tokenizer inflation through the end of August (even at 1.35x, two-thirds of the price times 1.35x the tokens comes to about 90% of the Sonnet 4.6 cost), which is almost certainly the point. Come September 1, the offset disappears and the effective increase shows up on the invoice.
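The arithmetic is worth running once against your own volumes. A minimal sketch that prices one workload on Sonnet 4.6 and on Sonnet 5, using the list prices from the table and treating the tokenizer inflation as a single multiplier on both input and output tokens. That single factor is a simplification; the real factor varies by content type, and the 400M/40M monthly workload is a made-up example:

```python
# List prices in USD per million tokens (input, output), from the pricing table.
PRICES = {
    "sonnet-4.6": (3.00, 15.00),
    "sonnet-5-intro": (2.00, 10.00),
    "sonnet-5-standard": (3.00, 15.00),
}


def workload_cost(model: str, input_mtok: float, output_mtok: float, token_factor: float = 1.0) -> float:
    """USD cost of a workload measured in millions of Sonnet 4.6 tokens.

    token_factor scales input and output counts alike to model the new
    tokenizer (the quoted range is 1.0 to 1.35); one factor is a simplification.
    """
    in_price, out_price = PRICES[model]
    return token_factor * (input_mtok * in_price + output_mtok * out_price)


# Hypothetical workload: 400M input and 40M output tokens per month on Sonnet 4.6.
baseline = workload_cost("sonnet-4.6", 400, 40)
for factor in (1.0, 1.2, 1.35):
    intro = workload_cost("sonnet-5-intro", 400, 40, factor)
    standard = workload_cost("sonnet-5-standard", 400, 40, factor)
    print(f"factor {factor:.2f}: intro {intro / baseline:.2f}x, standard {standard / baseline:.2f}x of Sonnet 4.6")
```

Because both intro prices are exactly two-thirds of the standard ones, the intro ratio works out to 0.67 times the tokenizer factor regardless of your input/output mix: about 0.90x at the top of the range, jumping to 1.35x once September pricing applies.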
The practical move is dull and necessary. Run a representative slice of your actual workload through Sonnet 5 and measure token consumption directly. Do not assume cost neutrality from the matching list price. The teams that got burned on the Opus 4.6-to-4.7 transition were the ones who read the sticker and skipped the meter.
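Measuring the meter means replaying the same tasks on both models and comparing the token counts each response reports. A minimal sketch of the bookkeeping, assuming you have already logged per-task input and output counts from both runs; the TaskUsage record and the sample numbers are hypothetical. It weights tokens by price, because at $3/$15 an extra output token moves the bill five times as much as an extra input token:

```python
from dataclasses import dataclass


@dataclass
class TaskUsage:
    """Token counts one task consumed on one model (hypothetical log record)."""
    input_tokens: int
    output_tokens: int


def inflation_ratio(old_runs: list[TaskUsage], new_runs: list[TaskUsage], in_price: float, out_price: float) -> float:
    """Price-weighted token inflation of new_runs over old_runs for the same tasks."""
    if len(old_runs) != len(new_runs):
        raise ValueError("runs must cover the same tasks in the same order")

    def cost(runs: list[TaskUsage]) -> float:
        return sum(r.input_tokens * in_price + r.output_tokens * out_price for r in runs)

    return cost(new_runs) / cost(old_runs)


# Hypothetical replay of three tasks on each model.
sonnet_46 = [TaskUsage(10_000, 1_000), TaskUsage(50_000, 4_000), TaskUsage(8_000, 2_000)]
sonnet_5 = [TaskUsage(12_000, 1_100), TaskUsage(61_000, 4_600), TaskUsage(9_000, 2_300)]
ratio = inflation_ratio(sonnet_46, sonnet_5, in_price=3.0, out_price=15.0)
print(f"Same tasks cost {ratio:.2f}x as much at equal list price")
```

The ratio captures everything that changed, including any extra reasoning the new model does, not just the tokenizer. For budgeting, that is the number you want.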
Where it sits in the market
The framing that fits the launch is that the workhorse-tier fight has moved off “which model is smartest” and onto “how cheaply and reliably does it run without a human watching.” That is the right frame for anyone shipping coding agents, where margin lives or dies on the per-task cost of the driver model.
Anthropic positions Sonnet 5 as cheaper than OpenAI’s GPT-5.5 and Google’s Gemini 3.1 Pro at introductory rates, and more expensive than Gemini 3.5 Flash, which holds the budget slot. I’d treat the competitor pricing as directional rather than precise — those comparisons come from launch coverage, not from cross-checked primary pricing pages, and competitor list prices move. The shape is more reliable than the digits: Sonnet 5 is priced to undercut the premium workhorses and to sit above the bargain tier, which is exactly where Anthropic wants the default coding driver to land.
The use case Anthropic names is the one this audience will actually hit: high-volume agent loops where Opus pricing compounds. CI review bots. Test generation. Batch transforms. Multi-step autonomous workflows that run for a while without supervision. Daniel Shepard at Zapier told TechCrunch that a two-part automation which “used to stall halfway” now finishes end to end. A single data point, and a vendor-friendly one, but it points at the right capability: completion of long autonomous chains, not raw single-shot smarts.
What this means for agentic coding
The more interesting story is in the harness. Default-high effort plus interleaved reasoning means the model arrives already configured for the agent pattern most teams hand-build: think, act, observe the result, reflect, act again. You do not prompt-engineer your way to a reflective agent anymore. It is the out-of-box behavior. For harness authors, that shifts the work: less coaxing the model into reasoning between steps, more managing the effort dial so a 12-file rename does not quietly run at high and burn tokens it never needed.
Watch that dial. effort defaults to high, and high effort spends more tokens than the job often warrants. The same lesson from the Opus 4.8 cycle applies: the cost surprises come from leaving the reasoning budget maxed on work that did not need it. Drop to medium or low for mechanical tasks. Reserve high for the hard ones.
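One way to act on that is to pick effort per task class in the harness instead of inheriting high everywhere. A minimal Python sketch: the task classes, the mapping, and the escalate-on-retry policy are all hypothetical. The function only returns the level name, so pass it wherever your client exposes the effort parameter:

```python
from enum import Enum


class TaskKind(Enum):
    """Hypothetical task classes a coding harness might tag its work with."""
    RENAME = "rename"
    FORMAT = "format"
    TEST_GENERATION = "test_generation"
    BUG_FIX = "bug_fix"
    ARCHITECTURE = "architecture"


# The five effort levels, lowest to highest.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

# Hypothetical mapping: mechanical work starts below the default "high".
BASE_EFFORT = {
    TaskKind.RENAME: "low",
    TaskKind.FORMAT: "low",
    TaskKind.TEST_GENERATION: "medium",
    TaskKind.BUG_FIX: "high",
    TaskKind.ARCHITECTURE: "xhigh",
}


def choose_effort(kind: TaskKind, retries: int = 0) -> str:
    """Start at the task's base effort and step up one level per failed attempt, capped at max."""
    start = EFFORT_LEVELS.index(BASE_EFFORT[kind])
    return EFFORT_LEVELS[min(start + retries, len(EFFORT_LEVELS) - 1)]


print(choose_effort(TaskKind.RENAME))
print(choose_effort(TaskKind.RENAME, retries=1))
print(choose_effort(TaskKind.ARCHITECTURE, retries=5))
```

Escalating on retry is one possible policy, not Anthropic guidance. It keeps mechanical tasks cheap on the first attempt and only buys more reasoning after the cheaper pass has already failed.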
And the 1M context window at Sonnet pricing is the underrated piece. Repo-level reasoning — cross-file refactors, full-codebase audits — without chunking, on the model you were already going to run for volume.
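Whether a given repository actually fits is worth a quick check before you drop chunking. A rough sketch, assuming about four characters per token for source code (a common rule of thumb, not a property of this tokenizer), inflating by the 1.35x upper bound quoted for the new tokenizer, and reserving room for a full 128k-token response on the assumption that input and output share the window. The suffix list is also an assumption; for an exact count, Anthropic's token-counting endpoint is the authority:

```python
from pathlib import Path

CONTEXT_WINDOW = 1_000_000   # Sonnet 5 context window, in tokens
CHARS_PER_TOKEN = 4.0        # assumption: rough rule of thumb for source code
TOKENIZER_FACTOR = 1.35      # upper bound quoted for the new tokenizer
OUTPUT_RESERVE = 128_000     # assumption: leave room for a max-length response

# Assumption: adjust to the languages in your repo (and exclude vendored code).
SOURCE_SUFFIXES = {".py", ".ts", ".go", ".rs", ".java", ".md"}


def estimate_repo_tokens(root: str) -> int:
    """Conservative token estimate for all matching source files under root."""
    chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            chars += len(path.read_text(encoding="utf-8", errors="ignore"))
    return int(chars / CHARS_PER_TOKEN * TOKENIZER_FACTOR)


def fits_in_context(root: str) -> bool:
    """True if the estimated repo size plus the output reserve fits in the window."""
    return estimate_repo_tokens(root) + OUTPUT_RESERVE <= CONTEXT_WINDOW
```

By this estimate the usable budget is roughly 2.6M characters of source, which covers many service repos but not most monorepos.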
The ceiling moved: Fable 5 is back
Then there is the ceiling, and in the days around this launch it moved. The most capable public Claude model most teams could actually run was Opus 4.8, at roughly 88.6% SWE-bench Verified on the Morph leaderboard. Fable 5 sits higher on that same board, in the mid-90s, though that figure is a third-party read and Anthropic has published no Verified score of its own. What moved here was less capability than access: at Sonnet 5’s June 30 launch, Fable was dark. Anthropic had suspended Fable 5 and Mythos 5 on June 12 to comply with a U.S. export-control order, and with no way to verify user nationality in real time, it pulled both models for all users.
That reversed a day later. On June 30 the Commerce Department lifted the order, and on July 1 Anthropic redeployed Fable 5 globally across the Claude Platform, Claude.ai, Claude Code, and Cowork. As I write this, my Claude Code /model picker offers Fable 5, and I’ve set it as my default for new sessions. So the practical ceiling is no longer Opus. It is Fable — with two asterisks.
The first is access and price. Through July 7, Fable 5 counts against up to 50% of weekly usage limits on Pro, Max, Team, and select Enterprise plans; after that it shifts to usage credits, and cloud-provider access on AWS, Google Cloud, and Microsoft Foundry is still being re-enabled in phases. Per token it prices well above the Opus tier: Anthropic’s model docs list it at $10/$50 per MTok, double Opus 4.8’s $5/$25, and the new tokenizer encodes the same text into more tokens again, so the effective gap is wider than the sticker. This is not the model you point at bulk agent traffic. It is the one you reach for on the problems that stall everything below it.
The second asterisk is the reason it came back at all, and it lands squarely in coding work. The suspension traced to a report from Amazon researchers who found a prompt that got Fable 5 to identify software vulnerabilities and start describing how one could be exploited, before its guardrails blocked the attempt from reaching a working exploit. Anthropic’s fix was not to weaken the model but to retrain the safety classifier sitting in front of it, and the new one blocks that specific technique in more than 99% of cases. When the classifier fires, the request doesn’t error. It reroutes to Opus 4.8 in the same session, re-run and labeled with the model that actually answered. The documented trigger categories are offensive cybersecurity work (building exploits, malware, or attack tooling), plus most biology and chemistry, distillation attacks on Fable itself, and frontier-model development.
Here is where it gets concrete. Ask Fable 5 to write exploit code against a security vulnerability and you can watch it hand the task down to Opus 4.8 mid-session, a quieter, older model finishing what the newer one declined. For defensive security work that reversion is a real cost, not a hypothetical. Anthropic is candid that the tighter cyber filter routes more benign coding and debugging requests to Opus than teams would like, and developers have already reported the classifier over-firing on routine defensive-security work, auto-switching to Opus on tasks like CVE triage. The billing follows the block: a request stopped on input is charged at Opus rates; one stopped midstream bills Fable rates for the tokens already produced, then Opus for the rest.
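The split billing is easier to reason about with numbers. A minimal sketch of the two cases as described, using the $10/$50 Fable 5 and $5/$25 Opus 4.8 list prices. Two details are assumptions rather than documented behavior: whether a midstream block also bills the prompt at Fable rates (the bill_fable_input flag), and that the Opus re-run bills its own full input plus output. Check both against your actual invoice lines:

```python
# USD per million tokens (input, output), list prices as quoted.
FABLE = (10.00, 50.00)
OPUS = (5.00, 25.00)


def _cost(prices: tuple[float, float], input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = prices
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def rerouted_cost(input_tokens: int, opus_output_tokens: int,
                  fable_tokens_before_block: int = 0, bill_fable_input: bool = True) -> float:
    """USD cost of one Fable 5 request that the classifier rerouted to Opus 4.8.

    fable_tokens_before_block == 0 models a block on input: all of it bills at Opus rates.
    Otherwise the output Fable produced before the block bills at Fable rates, and the
    Opus re-run bills at Opus rates. bill_fable_input is an assumption about whether
    the prompt is also charged at Fable rates in the midstream case.
    """
    opus_part = _cost(OPUS, input_tokens, opus_output_tokens)
    if fable_tokens_before_block == 0:
        return opus_part
    fable_input = input_tokens if bill_fable_input else 0
    return _cost(FABLE, fable_input, fable_tokens_before_block) + opus_part


# Hypothetical request: a 20k-token prompt, a 4k-token Opus answer.
print(f"blocked on input:  ${rerouted_cost(20_000, 4_000):.3f}")
print(f"blocked midstream: ${rerouted_cost(20_000, 4_000, fable_tokens_before_block=1_500):.3f}")
```

Under these assumptions, the midstream case in this example costs more than the same request would have on either model alone, since you pay for the abandoned Fable prefix and the full Opus re-run.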
Should you switch your default driver?
For most coding-agent work, yes — but measure first, and mind the calendar.
Switch now if you are running Opus 4.8 on tasks that do not strictly need it. The capability floor rose; a real share of Opus traffic will run acceptably on Sonnet 5 at a meaningful discount, and the intro pricing through August 31 makes the test cheap. This is the clearest win in the release.
Switch now, with care if you are upgrading from Sonnet 4.6. You get better agentic coding, interleaved reasoning by default, lower hallucination and sycophancy. But grep for explicit thinking budgets first — they will 400 — and run your token measurement before September 1, when the tokenizer tax stops being masked by the intro discount.
Wait, or stay put if you are on Priority Tier with Sonnet 4.6 — it is not available on Sonnet 5 at launch, so the choice is staying on 4.6 or jumping to Opus 4.8. And if your cost model is tight and unmeasured, the matching list price is a trap; spend the two weeks measuring before you commit budget to it.
There is now a fourth move the launch-day framing didn’t include: reach past Sonnet entirely. With Fable 5 back in Claude Code as of July 1, the hardest problems that Sonnet 5 and even Opus 4.8 stall on have a home again, at a price that keeps it off your bulk traffic and with a security governor that bounces exploit work down to Opus. For volume, Sonnet 5 is the driver. For the few tasks that justify the top of the lineup, the top of the lineup is reachable again.
Anthropic built Sonnet 5 to be the default driver model for the agent era, and on the architecture, it earns the slot. Adaptive thinking with interleaved reasoning is the correct shape for a coding agent, and shipping it as the default behavior removes work that harness authors used to do by hand. The asterisk is the invoice. The list price tells you one thing; the tokenizer tells you another. For the next two months, the introductory rate hides the difference. After that, the teams that measured will know what they are paying for, and the teams that read the sticker will get a surprise in their September bill.
Bob Matsuoka is CTO of Duetto and also writes about AI business at AI Power Ranking.
Related reading:
I Tracked Every Token — What a $1.07 bug fix reveals about AI coding economics, and why per-token pricing hides the real bill.
The Agent Unlock: Why Opus 4.5 Changed How I Work — When a top-tier model crossed the line into autonomous coding that holds up under real work.
Breaking: Opus 4.6 and Agent Teams — A model release read for what it changes in day-to-day agent workflows, not just the benchmark chart.
AI Power Ranking — Tool comparisons and benchmarks for AI practitioners
LinkedIn Newsletter — Strategic AI insights for CTOs and engineering leaders
[1]: Complicating “today”: with Fable 5 back as a Claude Code option (and, for some of us, the new default for hard sessions), the switch question isn’t only Sonnet-5-or-not. It’s which tier each task belongs to, Fable included. See “The ceiling moved: Fable 5 is back” above.



