Needle Tops Qwen3 In CPU Tool-Calling
A local benchmark pits Needle 26M against Qwen3-0.6B on 50 function-calling queries across five tiers, and the tiny specialist wins on tool selection while running 4.4x faster. The result depends heavily on prompt/schema fit: Needle needs flat tool schemas, while Qwen3 needs a chat-template setup that actually emits tool calls.
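To make the schema point concrete, here is a minimal sketch of collapsing an OpenAI-style tool definition into a flat one. The flat target layout (`name`, `description`, `args` with a `?` suffix for optional arguments) and the helper name `flatten_tool` are assumptions for illustration only; the source does not spell out Needle's actual flat schema format.

```python
def flatten_tool(tool: dict) -> dict:
    """Collapse an OpenAI-style tool definition into a flat dict.

    The output layout is hypothetical, NOT Needle's documented format.
    """
    fn = tool["function"]
    params = fn.get("parameters", {})
    props = params.get("properties", {})
    required = set(params.get("required", []))

    args = {}
    for arg_name, spec in props.items():
        arg_type = spec.get("type", "string")
        # Mark optional arguments with a trailing "?" (illustrative convention).
        args[arg_name] = arg_type if arg_name in required else arg_type + "?"

    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "args": args,
    }


# Example input in the OpenAI function-calling layout (made-up tool).
openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up weather",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "days": {"type": "integer"},
            },
            "required": ["city"],
        },
    },
}

flat = flatten_tool(openai_tool)
```

The sketch deliberately drops nested JSON Schema features (enums, nested objects, `anyOf`), which is the kind of simplification a flat schema implies.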
This is a useful reminder that “tool-calling model” and “chat model that can call tools” are not the same product class. Needle looks like a real dispatcher for fixed tool palettes; Qwen3 looks like the safer generalist once you need language robustness or broader conversational behavior.
- Needle wins on tool_match overall and latency, but its main failure mode is wrong-tool routing, not argument quality.
- Qwen3’s main failure mode is more brittle: it often never emits a tool call at all and falls back to prose.
- The T3 implicit-intent tier is the real separator; Needle maps intent directly, while Qwen3 collapses when the tool name is not explicit.
- The schema lesson matters: feeding Needle OpenAI-style JSON Schema tanked results until the author converted to Needle’s flat schema format.
- The edge-case win for Qwen3 on Hindi/French shows this is not a universal small-model verdict, just a strong case for tiny specialist routers on CPUs.
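The two failure modes above (wrong-tool routing versus no tool call at all) can be separated with a small scorer. This is a sketch under assumptions, not the benchmark's actual harness: the function names `classify` and `summarize`, the outcome labels other than `tool_match`, and the sample rows are all hypothetical, and the sample rows are made-up inputs rather than benchmark results.

```python
from collections import Counter
from typing import Optional

OUTCOMES = ("tool_match", "wrong_tool", "no_call")


def classify(expected_tool: str, predicted_call: Optional[dict]) -> str:
    """Label one query by comparing the expected tool with the model's call."""
    if predicted_call is None:
        # The model answered in prose and emitted no tool call.
        return "no_call"
    if predicted_call.get("name") != expected_tool:
        # A call was emitted, but it was routed to a different tool.
        return "wrong_tool"
    return "tool_match"


def summarize(results: list) -> dict:
    """Return the share of each outcome over (expected, predicted) pairs."""
    counts = Counter(classify(exp, pred) for exp, pred in results)
    total = len(results)
    if total == 0:
        return {outcome: 0.0 for outcome in OUTCOMES}
    return {outcome: counts[outcome] / total for outcome in OUTCOMES}


# Made-up rows for illustration only.
sample = [
    ("get_weather", {"name": "get_weather", "arguments": {"city": "Pune"}}),
    ("set_timer", {"name": "set_timer", "arguments": {"minutes": 5}}),
    ("send_email", {"name": "set_timer", "arguments": {}}),
    ("get_weather", None),
]

report = summarize(sample)
```

Reporting `wrong_tool` and `no_call` separately, rather than a single error rate, is what makes the contrast between a dispatcher-style model and a chat model visible.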
DISCOVERED
2026-05-23
PUBLISHED
2026-05-23
AUTHOR
gvij