Claude Sonnet 5 Review: The Most Agentic Sonnet Yet, at a Cutdown Price

Q: Is Claude Sonnet 5 better than Opus 4.8?

Not overall. Opus 4.8 leads on agentic coding at 69.2% versus Sonnet 5's 63.2%, but Sonnet 5 slightly surpasses Opus 4.8 on some knowledge-work benchmarks and costs far less to run.

Q: How much does Claude Sonnet 5 cost?

Introductory API pricing is $2 per million input tokens and $10 per million output tokens through August 31, 2026, then $3/$15. It is also free as the default model on Claude's Free and Pro plans.

Q: Is Sonnet 5 good for coding?

Yes. It is the strongest Sonnet for building code, writes tests first, and finishes long autonomous tasks. Its main downside is that it is slower and more verbose on very small edits.

Q: Should I switch from Sonnet 4.6 to Sonnet 5?

For most teams doing real building work, yes. Sonnet 4.6 catches slightly more bugs in pure review, but Sonnet 5 wins clearly on writing code and agent reliability.

Q: Is Claude Sonnet 5 safe to use?

It improves on Sonnet 4.6 in refusals, prompt-injection resistance, and lower hallucination and sycophancy. Both new Sonnet models scored 0.0% on the Firefox 147 exploit benchmark developed with Mozilla.

Anthropic released Claude Sonnet 5 on June 30, 2026 as the successor to Sonnet 4.6, and it is the most agentic Sonnet model the company has shipped. According to Anthropic’s own launch announcement, the new Sonnet narrows the gap with the flagship Opus 4.8 on reasoning, tool use, and coding while costing a fraction as much to run.

The short verdict from a week of hands-on use: Sonnet 5 is a clear upgrade for anyone building software or running agents, and a more nuanced trade-off if pure code review is all you care about. This review breaks down what changed, how it benchmarks, what it costs, and who should switch.

What’s New in Claude Sonnet 5

Sonnet 5 is built to plan, reason, and act on its own. It can use tools like browsers and terminals, coordinate subagents, and run autonomously at a level that, only a few months ago, required larger and more expensive models. On launch day it became the default model across Claude’s Free and Pro plans, and it is also available to Max, Team, and Enterprise users.

The most agentic Sonnet to date

For developers, the new Sonnet is available in Claude Code and on the Claude Platform, where the model id is claude-sonnet-5. What sets it apart from earlier Sonnet-class models is stamina: it finishes complex, multi-step jobs where previous versions would stop short. Anthropic’s early-access testers described the same pattern repeatedly — the model works a problem all the way through instead of handing back the first answer that happens to compile.

That behavior makes Sonnet 5 a strong fit for agent loops, where you hand the model a goal and let it try approaches, test them, and improve the result before reporting back. It behaves less like an autocomplete and more like a careful mid-level engineer who would rather take an extra few minutes than ship something that breaks later.

Self-correction and the effort dial

The headline reliability feature is self-correction. Sonnet 5 reviews its own output without being asked and fixes errors before you ever see them. On long agent jobs it can even rewrite its own plan mid-task, so runs wander off course far less often than they did with Sonnet 4.6.

Sonnet 5 also introduces a thinking effort dial. You can turn effort up for a tricky problem where a missed detail is expensive, or down — even off — for routine work where deep reasoning is not worth the tokens. That single control is what keeps the model’s extra thoroughness from quietly inflating your bill.

Coding and Agentic Performance

Code is where Sonnet 5 earns its upgrade. It approaches Opus 4.8-level behavior on real building work, and it does so at mid-tier prices. The catch is that its careful style cuts both ways depending on the task.

How it writes code

Sonnet 5 treats testing as a habit. It tends to write tests first, builds the feature on top of them, and runs everything once it believes the job is done — which is exactly why it catches clashes between code and tests that other models miss. Left to run unattended on an open-ended task, it will keep polishing a working solution pass after pass, chasing the best answer rather than the first one.

The flip side is verbosity. Ask for a one-line change and you may get extra helper functions and a test file longer than the feature itself. Sonnet 5 is also slower than Sonnet 4.6 and uses more tokens, a direct consequence of the extra thinking. That trade of minutes for thoroughness pays off on long jobs you leave running, and stings most on tiny edits.

How it reviews code

For pure code review the picture is a genuine trade-off. Independent hands-on testing found Sonnet 5’s comments are cleaner and more often real bugs than noise, with review precision rising from roughly 29% on Sonnet 4.6 to about 38–40%. But on the strict “did it find the bug” measure it caught fewer bugs — around 50–51% versus Sonnet 4.6’s noisier but higher 63%. Turning effort to maximum barely moved the score while roughly doubling the cost.

The takeaway is not that Sonnet 5 is a weak reviewer; it is a quieter, more careful one. For teams drowning in review noise, fewer and sharper comments are often the better trade.

Benchmarks: Sonnet 5 vs Opus 4.8 vs Sonnet 4.6

Anthropic positions Sonnet 5 as a substantial jump over its predecessor and a near-peer to the flagship on many tasks. The single clearest number is agentic coding, where Sonnet 5 lands between the two.

Model	Agentic coding	API price (per 1M in / out)
Opus 4.8 (flagship)	69.2%	$5 / $25
Claude Sonnet 5	63.2%	$2 / $10 intro → $3 / $15
Sonnet 4.6 (previous)	58.1%	superseded

Sonnet 5 does not beat Opus 4.8 on agentic coding, but it closes most of the gap over Sonnet 4.6 — a five-point jump. On some knowledge-work benchmarks Sonnet 5 actually slightly surpasses Opus 4.8, which is why Anthropic frames the two as a spectrum rather than a strict hierarchy.

Opus 4.8 is still the model of choice for higher accuracy on these tasks, but Sonnet 5 provides developers with lower-priced options that are of much higher quality than what was previously available.
Anthropic

Pricing and Value

Pricing is the reason Sonnet 5 is getting so much attention. Anthropic launched it with introductory API pricing of $2 per million input tokens and $10 per million output tokens through August 31, 2026, after which it moves to standard pricing of $3 per million input and $15 per million output. The full rate card is published on the Claude Platform pricing page.

To put that in context, Opus 4.8 costs $5 per million input and $25 per million output — more than double the standard Sonnet 5 rate. That gap is what makes Sonnet 5 a direct, lower-cost alternative to Opus for everyday agentic tasks, as well as to OpenAI’s GPT-5.6 Sol and Google’s Gemini 3.5 Flash. It is cheaper than Opus 4.8, GPT-5.5, and Gemini 3.1 Pro, though still pricier per token than Gemini 3.5 Flash.

To decide whether the value works for your workload, a quick test run beats guessing:

Point Sonnet 5 at a representative task at medium effort and note wall-clock time and token count.
Repeat the same task on your current model for a like-for-like baseline.
Compare cost per completed task, not cost per token — Sonnet 5’s thoroughness changes the token math.
Retry the hardest tasks at high effort only, and check whether the extra spend buys meaningfully better output.
Settle on the lowest effort level that still clears your quality bar.

Safety and Reliability

Sonnet 5 is an improvement on Sonnet 4.6 across Anthropic’s pre-deployment safety evaluations. On agentic safety it is better at refusing malicious requests and resisting hijack attempts in prompt-injection attacks, and it shows lower rates of hallucination and sycophancy than its predecessor.

Agentic safety improvements

On cybersecurity, both new Sonnet models scored 0.0% on an exploit benchmark built around Firefox 147 — an evaluation Anthropic developed in collaboration with Mozilla, with all the tested vulnerabilities patched in Firefox 148 (see Mozilla’s security advisories). Neither Sonnet model could develop a working exploit, and Sonnet 5 ships with real-time cyber safeguards enabled by default. That combination is what makes the model comfortable to deploy in autonomous, tool-using workflows where a careless refusal or a leaked exploit would be costly.

Verdict: Should You Switch?

For most teams doing real building work, the answer is yes. Sonnet 5 writes and ships code like a careful teammate, tests its own work, and sticks with a hard problem until it is solved — a clear step up from Sonnet 4.6 for anyone shipping software or running agents. Run it at medium effort and you get most of the upside without paying flagship rates.

Two groups should think twice. High-volume teams with tight latency budgets and lots of tiny diffs may find its slower, more thorough style does not earn its keep yet. And if pure bug-catching in review is your single priority, Sonnet 4.6 still edges it out on raw recall. Everyone else — especially anyone paying flagship prices only because nothing cheaper was good enough — should line Sonnet 5 up against their current model before the next renewal.

FAQ

Is Claude Sonnet 5 better than Opus 4.8?

How much does Claude Sonnet 5 cost?

Is Sonnet 5 good for coding?

Should I switch from Sonnet 4.6 to Sonnet 5?

Is Claude Sonnet 5 safe to use?