YT · YOUTUBE // 25d ago // PRODUCT LAUNCH
Claude Sonnet 4.6 hits near-human 72.5% on OSWorld
Anthropic's Claude Sonnet 4.6 delivers a breakthrough in autonomous "computer use" with a 72.5% score on OSWorld-Verified, nearing human-level GUI navigation proficiency. The update brings "Opus-level" intelligence, a 1M token context window, and significantly improved coding capabilities to the mid-tier Sonnet price point, keeping the existing $3/$15 rates (USD per million input/output tokens).
// ANALYSIS
Claude Sonnet 4.6 is a category-defining agentic model that renders previous "computer use" capabilities obsolete and puts massive pressure on the flagship tiers of competitors.
- The leap from 28% to 72.5% on OSWorld-Verified in a single year signals the arrival of reliable, autonomous agent workflows for complex desktop tasks.
- A 79.6% score on SWE-bench Verified cements its status as the premier model for autonomous software engineering, outperforming the previous flagship Opus 4.5.
- The 1 million token context window (now generally available) enables seamless reasoning across massive codebases and document sets without the overhead of complex RAG pipelines.
- Pricing remains at the standard Sonnet rate, giving it a high capability-to-cost ratio and letting it beat GPT-5.2 on both performance and efficiency.
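
To make the pricing point concrete, here is a minimal sketch of the per-request cost arithmetic. It assumes the "$3/$15" figure means USD per one million input and output tokens respectively, applied as flat rates; it ignores any tiered, cached, or batch pricing the provider may apply. The function name and the example workload are hypothetical.

```python
# Assumption: "$3/$15" is USD per one million input / output tokens, flat rate.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00
TOKENS_PER_MTOK = 1_000_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at flat per-token rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens * INPUT_USD_PER_MTOK / TOKENS_PER_MTOK
    output_cost = output_tokens * OUTPUT_USD_PER_MTOK / TOKENS_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: a prompt filling the 1M-token window
    # plus a 4,000-token reply.
    print(f"${estimate_cost_usd(1_000_000, 4_000):.2f}")  # prints "$3.06"
```

Under these assumptions the input side dominates the bill for long-context requests, which is why the context window size and the input rate matter together when comparing models.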
// TAGS
claude-sonnet-4-6, anthropic, llm, computer-use, ai-coding, agent, benchmark, mcp
DISCOVERED
25d ago
2026-03-17
PUBLISHED
25d ago
2026-03-17
RELEVANCE
10/10
AUTHOR
Ben Davis