OPEN_SOURCE
REDDIT · BENCHMARK RESULT
A.T.L.A.S. hits 74.6% LiveCodeBench on frozen 14B
A.T.L.A.S. is a self-hosted inference stack that wraps a frozen Qwen3-14B model in constraint-driven generation, sandboxed verification, and self-repair. The V3 pipeline claims 74.6% LiveCodeBench pass@1 on a single 16GB consumer GPU without fine-tuning, API calls, or cloud dependencies.
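For context on what the headline number measures: pass@1 follows the standard unbiased pass@k estimator convention from code-generation benchmarks, where each task counts as solved only if a sampled solution passes all hidden tests. A minimal sketch (the function name and sample counts here are illustrative, not from the repo):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples generated per task,
    c of which pass all tests. For k=1 this reduces to c/n."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With one sample per task (the usual pass@1 setting), the benchmark
# score is just the fraction of tasks whose single sample passed.
print(pass_at_k(n=10, c=3, k=1))  # 0.3 — equivalent to c/n
```

Note that A.T.L.A.S.'s multi-attempt repair loop still reports pass@1 per final answer, which is part of why the comparison to single-shot baselines is not apples-to-apples.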
// ANALYSIS
The interesting part here is not that a small model suddenly became huge-model smart; it is that test-time orchestration, verification, and repair can squeeze a lot more out of frozen weights. That makes ATLAS compelling for privacy-conscious and cost-sensitive teams, but the headline score should be read with the repo’s own caveats in mind.
- The gain comes from a multi-step pipeline, not single-shot inference: PlanSearch, BudgetForcing, Lens selection, and iterative repair do the heavy lifting.
- The self-hosted story is real: no API keys, no usage metering, and the repo pegs local electricity cost at roughly $0.004 per task.
- The benchmark comparison is not perfectly apples-to-apples, since the repo notes competitor scores come from a different task set and single-shot pass@1 baselines.
- The system is still coding-first, and the weaker GPQA Diamond and SciCode numbers show the general reasoning story is not fully solved yet.
- V3.1 sounds like a cleanup-and-scale pass, with parallelization and better Lens training aimed at reducing the current latency tradeoff.
// TAGS
atlas · ai-coding · reasoning · benchmark · self-hosted · gpu · open-source
DISCOVERED
2026-03-26
PUBLISHED
2026-03-26
RELEVANCE
9/10
AUTHOR
GoodSamaritan333