A.T.L.A.S. hits 74.6% LiveCodeBench on frozen 14B
OPEN_SOURCE
REDDIT · 17d ago · BENCHMARK RESULT


A.T.L.A.S. is a self-hosted inference stack that wraps a frozen Qwen3-14B model in constraint-driven generation, sandboxed verification, and self-repair. The V3 pipeline claims 74.6% LiveCodeBench pass@1 on a single 16GB consumer GPU without fine-tuning, API calls, or cloud dependencies.

// ANALYSIS

The interesting part here is not that a small model suddenly became huge-model smart; it is that test-time orchestration, verification, and repair can squeeze a lot more out of frozen weights. That makes A.T.L.A.S. compelling for privacy-conscious and cost-sensitive teams, but the headline score should be read with the repo's own caveats in mind.

  • The gain comes from a multi-step pipeline, not single-shot inference: PlanSearch, BudgetForcing, Lens selection, and iterative repair do the heavy lifting.
  • The self-hosted story is real: no API keys, no usage metering, and the repo pegs local electricity cost at roughly $0.004 per task.
  • The benchmark comparison is not perfectly apples-to-apples, since the repo notes competitor scores come from a different task set and single-shot pass@1 baselines.
  • The system is still coding-first, and the weaker GPQA Diamond and SciCode numbers show the general reasoning story is not fully solved yet.
  • V3.1 sounds like a cleanup-and-scale pass, with parallelization and better Lens training aimed at reducing the current latency tradeoff.
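The generate-verify-repair loop described above can be sketched as follows. This is a minimal, hypothetical illustration of the pattern, not the repo's actual code: the model call is a stand-in, and a real system would run candidates in an isolated sandbox process rather than in-process `exec`.

```python
# Hypothetical sketch of test-time orchestration: generate a candidate,
# verify it against tests, and feed failure signals back for repair.
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Candidate:
    code: str
    attempt: int

def run_tests(code: str, tests: List[Callable[[dict], bool]]) -> Optional[str]:
    """Execute candidate code and return an error string, or None on success."""
    ns: dict = {}
    try:
        exec(code, ns)  # stand-in for sandboxed execution
        for check in tests:
            if not check(ns):
                return "test failed"
        return None
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"

def solve(generate: Callable[[Optional[str]], str],
          tests: List[Callable[[dict], bool]],
          budget: int = 4) -> Optional[Candidate]:
    """Iterate generate -> verify -> repair until tests pass or budget runs out."""
    feedback = None
    for attempt in range(budget):
        code = generate(feedback)  # feedback from the last failure guides repair
        feedback = run_tests(code, tests)
        if feedback is None:
            return Candidate(code, attempt)
    return None

# Toy "model": the first draft has an off-by-one bug; the repair pass fixes it.
drafts = iter(["def add(a, b):\n    return a + b + 1\n",
               "def add(a, b):\n    return a + b\n"])
result = solve(lambda fb: next(drafts), [lambda ns: ns["add"](2, 3) == 5])
```

The point of the sketch is that pass@1 under this regime already includes several internal attempts; the single frozen model is doing multiple verified passes per task, which is why latency is the tradeoff V3.1 reportedly targets.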
// TAGS
atlas · ai-coding · reasoning · benchmark · self-hosted · gpu · open-source

DISCOVERED

2026-03-26

PUBLISHED

2026-03-26

RELEVANCE

9/10

AUTHOR

GoodSamaritan333