Nick Lothian SQL benchmark crowns Qwen 122B

// BENCHMARK RESULT

Nick Lothian’s agentic text-to-SQL benchmark found that large Qwen and Nemotron variants still dominate on a consumer RTX 5080, particularly when RAM offload supplements the card's VRAM. The standout surprise is a small Qwen3.5-9B Claude-4.6 HighIQ finetune, which jumps from 5 to 16 passing ("green") tests by fixing its tool-call formatting.
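The 5-to-16 jump suggests many of the 9B model's earlier failures happened before any SQL ran at all. Below is a minimal sketch of the kind of format gate an agentic harness might apply, assuming tool calls arrive as JSON objects with `name` and `arguments` fields (an OpenAI-style convention; the source does not say what format the benchmark actually uses). `parse_tool_call`, `count_valid`, and the `run_sql` tool name are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any


def parse_tool_call(raw: str, allowed_tools: set[str]) -> dict[str, Any] | None:
    """Return the parsed call if well-formed, else None (hypothetical format)."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not valid JSON at all
    if not isinstance(call, dict):
        return None  # e.g. a bare list or string
    name = call.get("name")
    args = call.get("arguments")
    # Reject unknown tools and arguments that are not a JSON object.
    if not isinstance(name, str) or name not in allowed_tools or not isinstance(args, dict):
        return None
    return call


def count_valid(outputs: list[str], allowed_tools: set[str]) -> int:
    """Count model outputs that would reach the SQL executor at all."""
    return sum(parse_tool_call(o, allowed_tools) is not None for o in outputs)


if __name__ == "__main__":
    tools = {"run_sql"}
    samples = [
        '{"name": "run_sql", "arguments": {"query": "SELECT 1"}}',
        '{"name": "run_sql", "arguments": "SELECT 1"}',  # arguments not an object
        "run_sql(SELECT 1)",                             # not JSON
        '{"name": "run_sql", "arguments": {"query": "SELECT COUNT(*) FROM t"}}',
    ]
    print(count_valid(samples, tools))
```

Under a gate like this, a model that writes correct SQL but wraps it in a malformed call still scores zero on that test, which is one way a formatting-focused finetune could move the passing-test count so sharply.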

// ANALYSIS

The big takeaway is that tool-calling quality now matters almost as much as raw model size for SQL agents, and a well-tuned small model can close a lot of the gap. But this is still a narrow benchmark: it’s a strong read on single-shot agentic SQL, not a proxy for broader codebase reasoning.

  • Qwen3.5-122B-A10B is the clear heavyweight winner here, with RAM offload making it usable on 16GB VRAM cards if you can tolerate slower inference
  • Qwen3.5-9B Claude-4.6 HighIQ is the practical surprise: most of its earlier failures came from malformed tool calls, so the finetune is doing real work, not just posturing
  • Nemotron-Cascade-2-30B-A3B looks unusually competitive for its size and deserves attention as a self-hostable sweet spot
  • The benchmark is deliberately short and agentic, so models that are good at isolated SQL generation can score well here without necessarily generalizing to longer multi-step coding tasks
  • For local LLM users, this reinforces the tradeoff triangle: bigger models still win on quality, but quantization, offload, and tool-call reliability decide what is actually usable
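A back-of-the-envelope sketch of why the 122B model needs RAM offload on a 16GB card. It assumes roughly 4.5 bits per weight for a 4-bit-class quantization (the source does not state which quant was used), counts weights only, and reserves a hypothetical 2 GiB of VRAM for KV cache and runtime overhead. The `A10B` and `A3B` suffixes conventionally denote about 10B and 3B active parameters per token in a mixture-of-experts model, so per-token compute is small even though all weights must still be resident somewhere. `weight_gib` and `split_across_vram` are illustrative helpers, not part of any tool.

```python
from __future__ import annotations


def weight_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def split_across_vram(params_billions: float, bits_per_weight: float,
                      vram_gib: float, reserve_gib: float = 2.0) -> tuple[float, float]:
    """Return (GiB kept on GPU, GiB offloaded to system RAM).

    reserve_gib is a hypothetical allowance for KV cache, activations, and
    driver overhead; real requirements depend on context length and runtime.
    """
    need = weight_gib(params_billions, bits_per_weight)
    usable = max(vram_gib - reserve_gib, 0.0)
    on_gpu = min(need, usable)
    return on_gpu, need - on_gpu


if __name__ == "__main__":
    # Assumed ~4.5 bits/weight; parameter counts taken from the model names.
    models = [("Qwen3.5-122B-A10B", 122), ("Nemotron-Cascade-2-30B-A3B", 30), ("Qwen3.5-9B", 9)]
    for name, size in models:
        gpu, ram = split_across_vram(size, 4.5, vram_gib=16)
        print(f"{name}: {gpu:.1f} GiB on GPU, {ram:.1f} GiB offloaded")
```

Under these assumptions the 9B model fits entirely in VRAM, the 30B model spills only slightly, and the 122B model pushes most of its weights into system RAM, which matches the "usable if you can tolerate slower inference" tradeoff described above.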
// TAGS
benchmark, llm, agent, sql, qwen, self-hosted, gpu, local-llm

DISCOVERED: 2026-04-01
PUBLISHED: 2026-04-01
RELEVANCE: 8/10
AUTHOR: grumd