OPEN_SOURCE
X // 5h ago · MODEL RELEASE
Qwopus MoE 35B-A3B lands in GGUF
Qwopus MoE 35B-A3B is a GGUF release built on Qwen3.5-35B-A3B and distilled from Claude Opus 4.6 reasoning traces. It targets local developers who want Opus-flavored reasoning on consumer hardware, with quantizations small enough to run on a single high-end GPU.
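As a sketch of what local deployment could look like, the snippet below loads a 4-bit GGUF build through llama-cpp-python and runs a single chat turn. The repo id, quant filename, and context size are assumptions for illustration, not confirmed names from the release; check the actual model card for the published artifacts.

```python
# Minimal local-inference sketch (pip install llama-cpp-python).
# Repo id and quant filename are ASSUMED placeholders, not confirmed release names.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="HuggingModels/Qwopus-MoE-35B-A3B-GGUF",  # hypothetical repo id
    filename="qwopus-moe-35b-a3b-Q4_K_M.gguf",        # hypothetical 4-bit quant
    n_gpu_layers=-1,  # offload every layer; the 4-bit build targets ~24GB VRAM
    n_ctx=8192,       # context window; larger values grow the KV cache
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Sketch a plan to refactor a monolith into services."}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```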
// ANALYSIS
This is a clever distillation package, not a new frontier base model: it pairs a sparse Qwen MoE backbone with Opus-style reasoning traces, then ships the result in a format local-stack users can actually deploy. The appeal is obvious for agentic coding and research workflows, but the model card also points to the usual tradeoff: richer visible reasoning in exchange for raw generation speed.
- The MoE backbone keeps inference compute low, with roughly 3B active parameters out of 35B total.
- GGUF quantizations make it practical for local use, including a 4-bit build sized for around 24GB VRAM (see the memory sketch after this list).
- The model card shows strengths in coding, planning, and research-style output, but also some reasoning and latency regressions versus the earlier Opus-distilled variant.
- This is best read as an opinionated local reasoning model for power users, not a universal replacement for the underlying closed models.
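To make the memory claim concrete, here is a back-of-the-envelope estimate of why a 4-bit quant of a 35B-total MoE can plausibly fit in 24GB. The bits-per-weight and overhead figures are rough assumptions, not measurements from the model card.

```python
# Rough VRAM estimate for a 4-bit MoE quant. All constants below are ASSUMPTIONS
# for illustration; real usage depends on the quant scheme and context length.
total_params = 35e9    # every expert stays resident in memory,
active_params = 3e9    # even though only ~3B parameters fire per token
bits_per_weight = 4.5  # Q4_K_M-style quants average slightly above 4 bits

weights_gb = total_params * bits_per_weight / 8 / 1e9
overhead_gb = 3.0      # assumed KV cache, activations, and runtime buffers
print(f"weights ≈ {weights_gb:.1f} GB, total ≈ {weights_gb + overhead_gb:.1f} GB")
print(f"active per token ≈ {active_params / total_params:.0%} of weights")
# weights ≈ 19.7 GB, total ≈ 22.7 GB: plausibly within a 24GB card
```

This is the MoE tradeoff in one line: per-token compute scales with the roughly 3B active parameters, but VRAM scales with all 35B, since every expert must stay loaded.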
// TAGS
qwopus · llm · reasoning · open-source · open-weights · self-hosted · inference
DISCOVERED
2026-04-29 (5h ago)
PUBLISHED
2026-04-27 (2d ago)
RELEVANCE
9/10
AUTHOR
HuggingModels