OPEN_SOURCE
REDDIT · 19d ago · MODEL RELEASE
Qwen3.5-Neo fine-tunes cut reasoning overhead
Jackrong released Qwen3.5-4B-Neo and Qwen3.5-9B-Neo, two community fine-tunes of Qwen3.5 aimed at more concise, efficient reasoning. The collection also ships GGUF and MLX builds for local deployment, and the 4B card claims a small MMLU-Pro subset gain with much shorter think chains.
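For anyone who wants to kick the tires locally, here is a minimal sketch of pulling one of the GGUF builds with llama-cpp-python. The GGUF repo id, filename pattern, and quantization below are assumptions for illustration; the collection page lists the actual files.

```python
# Minimal local-inference sketch. Assumptions: the GGUF repo id and filename
# pattern are illustrative; check the Hugging Face collection for the real
# repo names and quantizations.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Jackrong/Qwen3.5-4B-Neo-GGUF",  # hypothetical GGUF repo name
    filename="*Q4_K_M.gguf",                 # hypothetical quant; glob picks a matching file
    n_ctx=8192,                              # headroom for the prompt plus the (shorter) think chain
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "In two sentences, why is the sky blue?"}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```

On Apple silicon, the MLX builds would typically be loaded through mlx-lm instead.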
// ANALYSIS
This feels more like a reasoning-efficiency pass than a raw capability leap, which is exactly the kind of release local-model users actually feel. The strongest numbers are still narrow, so the headline is cheaper reasoning, not a dramatic new frontier.
- The 4B card reports 82.0% pass@1 vs 80.4% for the base model on a 250-question MMLU-Pro subset, while average think-chain length drops from 6,962 to 3,955 characters; it’s an SFT + LoRA fine-tune built with Unsloth/TRL (a rough sketch of that kind of setup follows this list): https://huggingface.co/Jackrong/Qwen3.5-4B-Neo
- That matters because fewer reasoning tokens can mean lower latency, less context pressure, and cheaper inference on consumer hardware.
- The 9B page shows the fine-tune lineage and Unsloth/TRL training notes but not much evaluation yet, so the larger variant is still more promise than proof: https://huggingface.co/Jackrong/Qwen3.5-9B-Neo
- The collection's GGUF and MLX builds make it easy to try across local runtimes, and the Reddit thread immediately asks for broader benchmarks before anyone crowns it a winner: https://huggingface.co/collections/Jackrong/qwen35-neo https://www.reddit.com/r/LocalLLaMA/comments/1s270px/two_new_qwen35_neo_finetunes_focused_on_fast/
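Since the 4B card describes the recipe as SFT + LoRA via Unsloth/TRL, here is a rough sketch of what that kind of training loop typically looks like. The base checkpoint id, dataset name, and every hyperparameter below are placeholders rather than values taken from the card, and the exact SFTTrainer keyword arguments vary a bit across TRL versions.

```python
# Rough SFT + LoRA sketch in the Unsloth/TRL style the 4B card describes.
# Assumptions: the base checkpoint id, dataset name, and all hyperparameters
# are illustrative placeholders, not Jackrong's actual configuration.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3.5-4B",   # hypothetical base checkpoint id
    max_seq_length=8192,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of the weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# A dataset of prompts paired with short, to-the-point reasoning traces is
# what would push the model toward shorter think chains; this dataset name
# is made up, and the "text" column is assumed to hold pre-formatted chats.
dataset = load_dataset("your-org/concise-reasoning-sft", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=8192,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="qwen3.5-4b-neo-sft-sketch",
    ),
)
trainer.train()
```

The interesting design choice is in the data, not the trainer: rewarding or curating short reasoning traces is what produces the reported drop in think-chain length, while LoRA keeps the fine-tune cheap enough to run on consumer GPUs.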
// TAGS
qwen35-neo · llm · reasoning · fine-tuning · open-weights · self-hosted
DISCOVERED
2026-03-24 (19d ago)
PUBLISHED
2026-03-24 (19d ago)
RELEVANCE
8/10
AUTHOR
FabbBr