Qwen3.5 local coding benchmark disappoints
OPEN_SOURCE
REDDIT · 20d ago · TUTORIAL


A LocalLLaMA user ran Qwen3.5-27B and Qwen3.5-35B-A3B through Claude Code on oMLX on an M4 Max 40-core/64GB Mac, but a simple Bomberman prompt still produced unusable code. The thread turns into a practical discussion of how to benchmark coding LLMs, which settings matter, and whether the dense 27B or the sparse 35B-A3B is the better local pick.

// ANALYSIS

This is more an orchestration problem than a model verdict. For local coding, the agent loop, prompt shape, and sampling defaults matter almost as much as the model family.

  • Qwen3.5's docs recommend conservative coding settings like `temperature=0.6`, `top_p=0.95`, `top_k=20`, and `presence_penalty=0.0`; a generic chat preset can make the same model look far worse than it is.
  • The official Qwen3.5 cards put the dense 27B slightly ahead of the 35B-A3B on SWE-bench Verified, so bigger total parameter counts are not automatically better for coding.
  • Context length helps the model remember repo state, not magically improve reasoning; once the thread balloons, you pay more latency and lose focus.
  • Claude Code can route to local backends, but small models need shorter, test-driven tasks and tighter prompts to stay on rails.
  • If you want a fair benchmark, use edit-run-fix loops with pass/fail tests instead of a one-shot “build me a game” prompt.
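The sampling defaults in the first bullet can be pinned down in code rather than left to a chat preset. A minimal sketch for any OpenAI-compatible local server (the model name and helper are illustrative, not from the thread):

```python
# Qwen3.5's documented coding defaults, per the model card cited above.
QWEN_CODING_SAMPLING = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "presence_penalty": 0.0,
}

def build_request(model: str, prompt: str, max_tokens: int = 2048) -> dict:
    """Build a chat-completions payload with the coding defaults applied,
    so every benchmark run uses the same sampling settings."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        **QWEN_CODING_SAMPLING,
    }
```

Keeping the settings in one dict makes it obvious when a run silently fell back to a generic preset.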
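The edit-run-fix benchmark in the last bullet can be sketched as a pass/fail harness: execute each model-generated solution together with its tests in a subprocess and count what passes. The task format and scoring here are assumptions, not something the thread specifies:

```python
import os
import subprocess
import sys
import tempfile

def run_candidate(code: str, test_code: str, timeout: int = 10) -> bool:
    """Run model-generated code plus its unit tests in a fresh interpreter;
    return True only if every assertion passes (exit code 0)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=timeout
        )
        return result.returncode == 0
    finally:
        os.unlink(path)

def score(candidates: list[tuple[str, str]]) -> float:
    """Fraction of (code, tests) pairs that pass: one pass@1-style number
    per model, instead of eyeballing a one-shot game prompt."""
    passed = sum(run_candidate(code, tests) for code, tests in candidates)
    return passed / len(candidates)
```

Running the same fixed task set through both the 27B and the 35B-A3B with identical sampling settings gives a comparable number, which a "build me a game" prompt never will.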
// TAGS
qwen3-5 · llm · ai-coding · agent · benchmark · inference · self-hosted · open-weights

DISCOVERED

20d ago

2026-03-23

PUBLISHED

20d ago

2026-03-23

RELEVANCE

8/10

AUTHOR

shirogeek