Nick Lothian SQL benchmark crowns Qwen 122B

// BENCHMARK RESULT · 69d ago

Nick Lothian’s agentic text-to-SQL benchmark found that large Qwen and Nemotron variants still dominate on a consumer RTX 5080, especially when VRAM is supplemented with RAM offload. The standout surprise is a small Qwen3.5-9B Claude-4.6 HighIQ finetune, which jumps from 5 to 16 green tests by fixing tool-call formatting.

// ANALYSIS

The big takeaway is that tool-calling quality now matters almost as much as raw model size for SQL agents, and a well-tuned small model can close a lot of the gap. But this is still a narrow benchmark: it’s a strong read on single-shot agentic SQL, not a proxy for broader codebase reasoning.

– Qwen3.5-122B-A10B is the clear heavyweight winner here; RAM offload makes it usable on 16 GB VRAM cards if you can tolerate slower inference
– Qwen3.5-9B Claude-4.6 HighIQ is the practical surprise: most of its earlier failures came from malformed tool calls, so the finetune delivers a real functional gain rather than a cosmetic one
– Nemotron-Cascade-2-30B-A3B looks unusually competitive for its size and deserves attention as a self-hostable sweet spot
– The benchmark is deliberately short and agentic, so models that are good at isolated SQL generation can shine even if they do not generalize to longer multi-step coding tasks
– For local LLM users, this reinforces the tradeoff triangle: bigger models still win on quality, but quantization, offload, and tool-call reliability decide what is actually usable
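To make the tool-call point concrete, here is a minimal Python sketch of the kind of well-formedness check an agent harness might apply before executing a model's tool call. It is not taken from Nick Lothian's harness: the JSON shape with `name` and `arguments` keys, the `run_sql` tool, and its `query` argument are all assumptions chosen for illustration.

```python
import json

# Hypothetical tool schema. The benchmark's real tool names and argument
# shapes are not given in this write-up, so `run_sql` and `query` are assumptions.
ALLOWED_TOOLS = {"run_sql": {"query"}}


def parse_tool_call(raw: str):
    """Return (name, arguments) if `raw` is a well-formed tool call, else None."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        # Truncated or otherwise invalid JSON: a typical "malformed tool call".
        return None
    if not isinstance(call, dict):
        return None
    name = call.get("name")
    args = call.get("arguments")
    if not isinstance(name, str) or name not in ALLOWED_TOOLS:
        return None
    if not isinstance(args, dict):
        return None
    # Require exactly the argument names the (assumed) tool expects.
    if set(args) != ALLOWED_TOOLS[name]:
        return None
    return name, args


good = '{"name": "run_sql", "arguments": {"query": "SELECT 1"}}'
truncated = '{"name": "run_sql", "arguments": {"query": "SELECT 1"'
print(parse_tool_call(good) is not None)       # well-formed call is accepted
print(parse_tool_call(truncated) is not None)  # truncated JSON is rejected
```

Under a check like this, a model that writes correct SQL but emits a truncated or mis-shaped call would still fail the test, which is consistent with a formatting-focused finetune recovering many failures.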

// TAGS

benchmark, llm, agent, sql, qwen, self-hosted, gpu, local-llm

DISCOVERED: 2026-04-01 (69d ago)

PUBLISHED: 2026-04-01 (69d ago)

RELEVANCE: 8/10

AUTHOR: grumd
