Claude Sonnet 5 vs Opus 4.8: Benchmarks, Pricing, and Which to Choose
On June 30, 2026, Anthropic released Claude Sonnet 5, its most agentic Sonnet model yet, and it lands within a few points of the flagship Opus 4.8 while costing roughly 40% less per token. The short answer to “sonnet 5 vs opus 4.8”: Opus 4.8 still wins the hardest coding and reasoning tasks, but Sonnet 5 ties it on knowledge work and wins decisively on price, according to Anthropic’s launch announcement.
This guide compares the two models on benchmarks, pricing, specs, and safety, then gives a clear rule for when to run each one.

Release context: where each model sits
Sonnet 5 is Anthropic’s most agentic Sonnet to date and the direct successor to Sonnet 4.6. From launch day it became the default model on the Free and Pro plans, and it is also available to Max, Team, and Enterprise users, in Claude Code, and through the Claude Platform. Opus 4.8 is Anthropic’s flagship: the strongest, highest-accuracy Claude in general availability, starting at the Pro tier.
Rather than drawing a hard line between the two, Anthropic frames them as an effort dial. You reach for claude-sonnet-5 when speed and cost matter, and dial up to claude-opus-4-8 when a task justifies the premium. Both are current members of the Claude 5 generation, with Opus 4.8 sitting a notch above Sonnet 5 on raw capability.
Benchmark head-to-head
Across Anthropic’s published comparison, Opus 4.8 holds the capability lead on coding, terminal use, computer use, and hard reasoning, while the two models effectively tie on knowledge work. The margins matter as much as the direction: most gaps are only a few points wide.
| Benchmark | Claude Sonnet 5 | Claude Opus 4.8 | Claude Sonnet 4.6 |
|---|---|---|---|
| Agentic coding (SWE-bench Pro) | 63.2% | 69.2% | 58.1% |
| Terminal-Bench 2.1 | 80.4% | 82.7% | — |
| OSWorld-Verified (computer use) | 81.2% | 83.4% | — |
| Humanity’s Last Exam (no tools) | 43.2% | 49.8% | — |
| Humanity’s Last Exam (with tools) | 57.4% | 57.9% | — |
| Knowledge work (GDPval-AA v2, Elo) | ~1,618 | ~1,615 | — |
Coding and reasoning: Opus 4.8 keeps the edge
On agentic coding, measured with SWE-bench Pro, Sonnet 5 scores 63.2% against Opus 4.8’s 69.2% — a six-point gap that tracks with Opus’s advantage on longer, messier engineering work. The improvement over the previous Sonnet is real, though: Sonnet 4.6 managed only 58.1% on the same test. On complex command-line workflows (Terminal-Bench 2.1) and computer use (OSWorld-Verified), Opus stays ahead by roughly two points, and on Humanity’s Last Exam without tools the gap is 6.6 points.
The pattern is consistent: as task difficulty and horizon increase, Opus 4.8 pulls further ahead. For correctness-critical work where a small per-step error compounds across a long session, that edge is exactly what you pay for.
Knowledge work: effectively a tie
The picture flips on knowledge work. On the GDPval-AA v2 professional-task benchmark, Sonnet 5 posts an Elo of about 1,618 to Opus 4.8’s roughly 1,615 — a statistical tie, and one where Sonnet 5 slightly surpasses the flagship. Anthropic also highlights that Sonnet 5 is self-correcting: it reviews its own output and fixes errors before you see them, finishing complex multi-step tasks where earlier Sonnet models would stop short.
That combination — parity on real professional tasks plus autonomous self-checking — is what makes Sonnet 5 the sensible default for everyday analysis, drafting, and agentic automation.

Pricing: the real reason to pick Sonnet 5
Price is the structural difference between these two models. Anthropic set Sonnet 5 well below Opus 4.8 at every effort level, and layered an introductory discount on top for the migration window.
| Tier | Claude Sonnet 5 | Claude Opus 4.8 |
|---|---|---|
| Standard input (per 1M) | $3 | $5 |
| Standard output (per 1M) | $15 | $25 |
| Introductory (through Aug 31, 2026) | $2 / $10 | not offered |
Standard and introductory rates
Sonnet 5 launched with introductory pricing of $2 per million input tokens and $10 per million output tokens through August 31, 2026, after which it moves to standard $3 / $15. Opus 4.8 sits at $5 / $25. You can confirm the current rates on the Claude Platform pricing documentation. One caveat keeps the gap from being quite as wide as the sticker price: Sonnet 5 uses an updated tokenizer that maps the same text to roughly 1.0–1.35x more tokens, so Anthropic set the intro price to make the switch from Sonnet 4.6 roughly cost-neutral.
Anthropic is explicit about how it wants developers to think about the trade-off:
Opus 4.8 is still the model of choice for higher accuracy on these tasks, but Sonnet 5 provides developers with lower-priced options that are of much higher quality than what was previously available.
Anthropic
Across a high-volume agentic pipeline making thousands of calls, the difference between $3/$15 and $5/$25 is the difference between a hobby budget and a production line item.
Specs and availability
Where it counts, the two models are nearly identical. Both carry a 1M-token context window, cap output at 128K tokens, and share a January 2026 knowledge cutoff. Both run in Claude Code and on the Claude Platform under the model IDs claude-sonnet-5 and claude-opus-4-8.
The meaningful spec difference is availability rather than raw capacity:
- Free tier: Sonnet 5 is the default model on the Free plan; Opus 4.8 starts at Pro.
- Default across consumer plans: Sonnet 5 is the out-of-the-box model for Free and Pro users.
- Higher tiers: Both are available to Max, Team, and Enterprise.
- Developer access: Both are callable via the Claude API and inside Claude Code.
If you want frontier-adjacent agentic capability without a subscription, Sonnet 5 is the only one of the two you can run for free.

Safety and prompt-injection
Anthropic reports that Sonnet 5 is safer overall than Sonnet 4.6: it refuses malicious requests more reliably, resists prompt-injection hijack attempts better, and shows lower rates of hallucination and sycophancy. It still trails the more capable Opus 4.8 on the automated behavioral audit, so for the most safety-sensitive workloads Opus keeps an edge.
Refusals, prompt injection, and cyber
On the cyber front, Anthropic did not deliberately train Sonnet 5 for cybersecurity, and its exploit-development ability is far behind Opus. In an evaluation built with Mozilla that tested whether models could develop exploits for Firefox 147, both new Sonnet models scored 0.0% — neither could produce a working exploit — and every vulnerability was patched in Firefox 148. Sonnet 5 still ships with cyber safeguards enabled by default.
For heavy or reduced-guardrail cybersecurity work, Anthropic continues to recommend Opus 4.8. For everyday agentic tasks, the improved refusal behavior and prompt-injection resistance make Sonnet 5 a safe default.
Which should you use?
The practical rule mirrors Anthropic’s own effort-dial framing: default to Sonnet 5, and escalate to Opus 4.8 only when a specific task justifies the premium. Use this quick checklist to decide:
- Daily coding and fast iteration → Sonnet 5 (within six points of Opus on SWE-bench Pro at ~40% less cost).
- High-volume agentic pipelines → Sonnet 5 (cost compounds across thousands of calls).
- Free-tier or no-subscription use → Sonnet 5 (the only one of the two on the Free plan).
- Knowledge work and analysis → Sonnet 5 (ties Opus 4.8 on GDPval-AA v2).
- Correctness-critical, multi-file coding → Opus 4.8 (+6.0 on SWE-bench Pro is the accuracy you pay for).
- Hard reasoning without tools → Opus 4.8 (+6.6 on Humanity’s Last Exam).
- Long-horizon work where small errors compound → Opus 4.8.
Beyond the Claude lineup, Sonnet 5 also functions as a lower-cost alternative to OpenAI’s GPT-5.6 Sol and Google’s Gemini 3.5 Flash for everyday agentic tasks, undercutting Opus 4.8, GPT-5.5, and Gemini 3.1 Pro on price while sitting just above Gemini 3.5 Flash. Try Sonnet 5 on your own workload before assuming you need the flagship — for most jobs, the cheaper model now loses far less often than it used to.
