AdamBench ranks local coding LLMs
OPEN_SOURCE ↗
REDDIT · 16d ago · BENCHMARK RESULT

AdamBench is a self-published benchmark for local LLMs in a simple agentic-coding workflow, run on an RTX 5080 16GB + 64GB RAM workstation. The repo includes prompts, review outputs, methodology, and visualizations; Qwen3.5 122b A10b won overall, while Qwen3.5 35b A3b and gpt-oss-20b/120b look like the most practical daily picks.

// ANALYSIS

This feels less like a universal leaderboard and more like a brutally honest local-model reality check, which is exactly why it's useful. The score rewards not just raw coding quality but also how well a model survives iterative repair loops without wasting time or tokens.

  • Qwen3.5 122b A10b takes the top AdamBench score, but Qwen3.5 35b A3b is the author’s daily driver because it balances quality, speed, and context headroom.
  • gpt-oss-120b and gpt-oss-20b are the standout surprises: fast for their size and unusually token-efficient, which matters a lot in agentic coding.
  • Nemotron models lag hard on quality and efficiency; even the best of them only barely cracks the top 10, with huge reasoning-token overhead.
  • The benchmark is intentionally single-run and self-repair heavy, so it measures real workflow resilience more than clean one-shot coding ability.
  • Models that failed on tool calling or chat templates were excluded, which is harsh but sensible if the goal is practical local usability.
// TAGS
adambench · benchmark · ai-coding · agent · llm · open-source · self-hosted

DISCOVERED

2026-03-26 (16d ago)

PUBLISHED

2026-03-26 (16d ago)

RELEVANCE

9 / 10

AUTHOR

Real_Ebb_7417