OPEN_SOURCE
REDDIT // BENCHMARK RESULT
AdamBench ranks local coding LLMs
AdamBench is a self-published benchmark for local LLMs in a simple agentic-coding workflow, run on an RTX 5080 16GB + 64GB RAM workstation. The repo includes prompts, review outputs, methodology, and visualizations; Qwen3.5 122b A10b won overall, while Qwen3.5 35b A3b and gpt-oss-20b/120b look like the most practical daily picks.
// ANALYSIS
This feels less like a universal leaderboard and more like a brutally honest local-model reality check, which is exactly why it’s useful. The score favors not just raw coding quality, but also how well a model survives iterative repair loops without wasting time or tokens.
- Qwen3.5 122b A10b takes the top AdamBench score, but Qwen3.5 35b A3b is the author's daily driver because it balances quality, speed, and context headroom.
- gpt-oss-120b and gpt-oss-20b are the standout surprises: fast for their size and unusually token-efficient, which matters a lot in agentic coding.
- Nemotron models lag hard on quality and efficiency; even the best one only lands around the top 10, with huge reasoning-token overhead.
- The benchmark is intentionally single-run and self-repair heavy, so it measures real workflow resilience more than clean one-shot coding ability.
- Models that failed on tool calling or chat templates were excluded, which is harsh but sensible if the goal is practical local usability.
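The self-repair-and-token-efficiency scoring described above can be sketched in a few lines. This is a hypothetical illustration, not the actual AdamBench harness: the function names (`run_repair_loop`, `score`), the repair budget, and the token-budget discount are all assumptions chosen to show the shape of the workflow, where a model gets one initial attempt plus a few repair rounds and wasted tokens eat into its score.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Attempt:
    """Result of one model call: did the code pass, and at what token cost."""
    passed: bool
    tokens_used: int


def run_repair_loop(generate: Callable[[str], Attempt],
                    max_repairs: int = 3) -> Tuple[bool, int]:
    """Run one task: an initial attempt plus up to max_repairs fix-up rounds.

    `generate` stands in for a model call that returns whether the produced
    code passed the task's checks and how many tokens it consumed.
    """
    total_tokens = 0
    feedback = "initial prompt"
    for _ in range(max_repairs + 1):
        attempt = generate(feedback)
        total_tokens += attempt.tokens_used
        if attempt.passed:
            return True, total_tokens
        feedback = "failure feedback"  # next round sees the error output
    return False, total_tokens


def score(passed: bool, total_tokens: int, budget: int = 8000) -> float:
    """Illustrative score: 0 on failure, else 1 discounted by token spend."""
    if not passed:
        return 0.0
    return max(0.0, 1.0 - total_tokens / budget)


if __name__ == "__main__":
    # A model that fails its first attempt, then repairs successfully:
    attempts = iter([Attempt(False, 1200), Attempt(True, 800)])
    ok, spent = run_repair_loop(lambda fb: next(attempts))
    print(ok, spent, round(score(ok, spent), 3))  # True 2000 0.75
```

Under a scheme like this, a token-hungry reasoning model can pass every task and still rank below a leaner one, which matches the blurb's point about Nemotron's reasoning-token overhead and gpt-oss's efficiency.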
// TAGS
adambench · benchmark · ai-coding · agent · llm · open-source · self-hosted
DISCOVERED
2026-03-26
PUBLISHED
2026-03-26
RELEVANCE
9/10
AUTHOR
Real_Ebb_7417