OPEN_SOURCE
REDDIT // BENCHMARK RESULT
AdamBench ranks local coding LLMs
AdamBench is a self-published benchmark for local LLMs in a simple agentic-coding workflow, run on an RTX 5080 16GB + 64GB RAM workstation. The repo includes prompts, review outputs, methodology, and visualizations; Qwen3.5 122b A10b won overall, while Qwen3.5 35b A3b and gpt-oss-20b/120b look like the most practical daily picks.
// ANALYSIS
This feels less like a universal leaderboard and more like a brutally honest local-model reality check, which is exactly why it’s useful. The score favors not just raw coding quality, but also how well a model survives iterative repair loops without wasting time or tokens.
- Qwen3.5 122b A10b takes the top AdamBench score, but Qwen3.5 35b A3b is the author's daily driver because it balances quality, speed, and context headroom.
- gpt-oss-120b and gpt-oss-20b are the standout surprises: fast for their size and unusually token-efficient, which matters a lot in agentic coding.
- Nemotron models lag hard on quality and efficiency; even the best one only lands around the top 10, with huge reasoning-token overhead.
- The benchmark is intentionally single-run and self-repair heavy, so it measures real workflow resilience more than clean one-shot coding ability.
- Models that failed on tool calling or chat templates were excluded, which is harsh but sensible if the goal is practical local usability.
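The self-repair-and-token-efficiency scoring described above can be sketched in a few lines. This is a hypothetical illustration, not the actual AdamBench harness: the function names (`run_repair_loop`, `score`), the repair budget, and the token-budget discount are all assumptions chosen to show the shape of the workflow, where a model gets one initial attempt plus a few repair rounds and wasted tokens eat into its score.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Attempt:
    """Result of one model call: did the code pass, and at what token cost."""
    passed: bool
    tokens_used: int


def run_repair_loop(generate: Callable[[str], Attempt],
                    max_repairs: int = 3) -> Tuple[bool, int]:
    """Run one task: an initial attempt plus up to max_repairs fix-up rounds.

    `generate` stands in for a model call that returns whether the produced
    code passed the task's checks and how many tokens it consumed.
    """
    total_tokens = 0
    feedback = "initial prompt"
    for _ in range(max_repairs + 1):
        attempt = generate(feedback)
        total_tokens += attempt.tokens_used
        if attempt.passed:
            return True, total_tokens
        feedback = "failure feedback"  # next round sees the error output
    return False, total_tokens


def score(passed: bool, total_tokens: int, budget: int = 8000) -> float:
    """Illustrative score: 0 on failure, else 1 discounted by token spend."""
    if not passed:
        return 0.0
    return max(0.0, 1.0 - total_tokens / budget)


if __name__ == "__main__":
    # A model that fails its first attempt, then repairs successfully:
    attempts = iter([Attempt(False, 1200), Attempt(True, 800)])
    ok, spent = run_repair_loop(lambda fb: next(attempts))
    print(ok, spent, round(score(ok, spent), 3))  # True 2000 0.75
```

Under a scheme like this, a token-hungry reasoning model can pass every task and still rank below a leaner one, which matches the blurb's point about Nemotron's reasoning-token overhead and gpt-oss's efficiency.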
// TAGS
adambench · benchmark · ai-coding · agent · llm · open-source · self-hosted
DISCOVERED
2026-03-26
PUBLISHED
2026-03-26
RELEVANCE
9/10
AUTHOR
Real_Ebb_7417