Claude Sonnet 4.6 hits near-human 72.5% on OSWorld
YT · YOUTUBE // 25d ago // PRODUCT LAUNCH

Anthropic's Claude Sonnet 4.6 delivers a breakthrough in autonomous "computer use" with a 72.5% score on OSWorld-Verified, close to the roughly 72% human success rate reported for the original OSWorld benchmark. The update brings "Opus-level" intelligence, a 1M-token context window, and significantly improved coding capabilities to the mid-tier Sonnet price point, keeping its competitive rates of $3 per million input tokens and $15 per million output tokens.

// ANALYSIS

Claude Sonnet 4.6 is a category-defining agentic model that renders previous "computer use" capabilities obsolete and puts massive pressure on the flagship tiers of competitors.

  • The leap from 28% to 72.5% on OSWorld-Verified in a single year signals the arrival of reliable, autonomous agent workflows for complex desktop tasks.
  • A 79.6% score on SWE-bench Verified puts it within about a point of the previous flagship, Opus 4.5 (80.9%), making it one of the strongest models for autonomous software engineering.
  • The 1-million-token context window (now generally available) lets the model reason across large codebases and document sets in a single prompt, reducing the need for complex RAG pipelines.
  • Pricing stays at the standard Sonnet rate, giving a strong intelligence-to-cost ratio that beats GPT-5.2 on both performance and efficiency.

// TAGS
claude-sonnet-4-6 · anthropic · llm · computer-use · ai-coding · agent · benchmark · mcp

DISCOVERED: 2026-03-17 (25d ago)
PUBLISHED: 2026-03-17 (25d ago)
RELEVANCE: 10/10
AUTHOR: Ben Davis