Qwen3-Coder-Next lands 54GB APEX quant
StacksNathan released the first APEX I-Quality quantization of Qwen3-Coder-Next 80B, with a code-calibrated importance matrix built from 50,575 samples. The GGUF comes in at 54.1GB and is positioned for local coding-agent use on llama.cpp, Ollama, and similar runtimes.
This is more useful than flashy: the underlying Qwen3-Coder-Next model is already the important part, and this release makes it materially easier to run without throwing away code quality.
- The code-specific imatrix is the real value add here; calibration on code samples should preserve the weights that matter for syntax, refactors, and tool use better than generic quantization.
- 54.1GB puts it in the practical zone for serious local inference on 128GB-class machines and some multi-GPU setups, which is where large coding models start to become usable.
- The repo's own speed claims are attractive, but real throughput will still hinge on backend, offload, and context length.
- This is an open-source packaging win, not a new base-model launch, so the significance is in accessibility and deployment rather than benchmark novelty.
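For a rough sense of what the 54.1GB figure implies, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate, not a measurement: it assumes 1 GB = 1e9 bytes, treats the nominal 80B parameter count as exact, and ignores GGUF metadata and per-tensor precision differences. The function names are hypothetical helpers, not part of any runtime.

```python
def bits_per_weight(file_size_gb: float, params_billion: float) -> float:
    """Average bits stored per parameter (assumes 1 GB = 1e9 bytes)."""
    total_bits = file_size_gb * 1e9 * 8
    total_params = params_billion * 1e9
    return total_bits / total_params


def headroom_gb(total_memory_gb: float, file_size_gb: float) -> float:
    """Memory left after loading the weights; KV cache, OS, and runtime
    overhead all have to fit in this remainder."""
    return total_memory_gb - file_size_gb


# Figures from the release: 54.1GB GGUF, 80B parameters (nominal).
avg_bits = bits_per_weight(54.1, 80)
remaining = headroom_gb(128, 54.1)

print(f"average bits per weight: {avg_bits:.2f}")
print(f"headroom on a 128GB machine: {remaining:.1f} GB")
```

Under those assumptions the quant averages a little over 5 bits per weight, and a 128GB machine keeps roughly 74GB free for context and everything else, which is why long contexts and offload settings still decide real-world usability.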
DISCOVERED: 2026-04-05
PUBLISHED: 2026-04-05
AUTHOR: StacksHosting