ATLAS pushes local Qwen toward frontier

// 91d ago · OPENSOURCE RELEASE

ATLAS is an open-source test-time compute pipeline that wraps a frozen Qwen3-14B model with planning, verification, and repair loops to improve coding performance on consumer hardware. The project reports 74.6% pass@1 on 599 LiveCodeBench v5 problems with no fine-tuning or cloud APIs, while openly acknowledging that latency and reproducibility still need work.

// ANALYSIS

ATLAS is a strong example of where open-source coding systems are heading: less obsession with bigger base models, more leverage from smarter inference-time orchestration. The caveat is that this is still an early, research-heavy stack, so the biggest question is not the headline score but how reproducibly others can get it running.

– The core pitch is infrastructure, not a new model: ATLAS layers PlanSearch, energy-based candidate scoring, sandbox execution, and self-repair on top of a frozen local Qwen model.
– The benchmark claim is interesting but not a clean apples-to-apples win, because ATLAS uses best-of-3 plus iterative repair while the README compares against single-shot API model scores from a different evaluation set.
– For developers tired of paying API bills, the real appeal is self-hosted coding assistance with MaaS-style plumbing that can connect tools like OpenCode or Claude Code to a local stack.
– The tradeoff is brutal latency: easy tasks finish quickly, but hard coding problems can take up to an hour, which makes this more of a hacker's benchmark rig than a drop-in daily driver today.
– The open-source release matters because it packages a lot of scattered test-time compute ideas into one inspectable system, giving the community something concrete to reproduce, critique, and improve.
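
To make the "best-of-3 plus iterative repair" idea concrete, here is a minimal Python sketch of that general pattern. It is not taken from the ATLAS codebase: every name (`solve`, `Result`, the `toy_*` stubs), the repair budget of 2, and the prompt strings are hypothetical, and the toy scorer stands in for whatever energy-based ranking the project actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical interfaces, assumed for illustration only.
Generate = Callable[[str], str]               # prompt -> candidate program
Score = Callable[[str], float]                # candidate -> score, lower is better
RunTests = Callable[[str], Tuple[bool, str]]  # candidate -> (passed, feedback)


@dataclass
class Result:
    code: Optional[str]
    attempts: int   # number of test executions performed
    solved: bool


def solve(task: str, generate: Generate, score: Score, run_tests: RunTests,
          k: int = 3, max_repairs: int = 2) -> Result:
    """Sample k candidates, rank them, then test and repair each in turn."""
    candidates = [generate(f"{task}\n# plan variant {i}") for i in range(k)]
    ranked = sorted(candidates, key=score)  # try the best-scored candidate first
    attempts = 0
    for candidate in ranked:
        current = candidate
        for repair_round in range(max_repairs + 1):
            attempts += 1
            passed, feedback = run_tests(current)
            if passed:
                return Result(current, attempts, True)
            if repair_round < max_repairs:
                # Feed the failure back to the (frozen) model and ask for a fix.
                current = generate(
                    f"{task}\n# previous attempt failed: {feedback}\n{current}"
                )
    return Result(None, attempts, False)


# Toy stand-ins for the model, the scorer, and the sandbox.
def toy_generate(prompt: str) -> str:
    if "previous attempt failed" in prompt:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"  # deliberately buggy first draft


def toy_score(code: str) -> float:
    return float(len(code))  # placeholder ranking, not a real energy model


def toy_run_tests(code: str) -> Tuple[bool, str]:
    scope: dict = {}
    exec(code, scope)  # a real pipeline would run this in an isolated sandbox
    ok = scope["add"](2, 3) == 5
    return ok, "" if ok else "add(2, 3) should return 5"


result = solve("Write add(a, b)", toy_generate, toy_score, toy_run_tests)
print(result.solved, result.attempts)  # prints "True 2"
```

With these assumed defaults, a single problem can consume up to 3 × (2 + 1) = 9 test executions, plus the model calls behind them. That extra budget is the reason a score produced this way is hard to compare directly with single-shot results, and it is consistent with the long latency on hard problems.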

// TAGS

atlas · llm · ai-coding · open-source · self-hosted · benchmark

DISCOVERED: 2026-03-10 (91d ago)

PUBLISHED: 2026-03-10 (91d ago)

RELEVANCE: 8/10

AUTHOR: Additional_Wish_3619
