OPEN_SOURCE
REDDIT // 3d ago · OPEN-SOURCE RELEASE
Qwopus v3 gets NVFP4, AWQ, FP8 quants
The Qwopus v3 collection adds mixed-precision builds of Jackrong/Qwopus3.5-27B-v3, including NVFP4, AWQ-4bit, and FP8 dynamic variants. The NVFP4 checkpoint is verified on vLLM with Blackwell hardware, aiming to make the model practical on smaller single-GPU setups.
// ANALYSIS
This is less a new model than a distribution win: the interesting part is making a strong 27B-class reasoning model easier to run on real hardware. For local-LLM users, that matters more than another benchmark claim.
- NVFP4 is the headline variant because it trims memory the hardest; the model card calls it the smallest build at roughly 24 GB.
- The release is clearly tuned for vLLM users, but it is not frictionless: NVFP4 GEMM on Blackwell requires a patched vLLM build and the CUTLASS backend.
- AWQ-4bit is the compatibility play, while FP8 dynamic is the safer middle ground for teams that can spare more VRAM.
- The model keeps the hybrid Qwen3.5 DeltaNet + softmax attention architecture and the MTP head, so this release is about preserving behavior while compressing weights, not changing the underlying model.
- SGLang is listed as unsupported for this checkpoint, so adopters need to standardize on the vLLM stack.
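To see why the 4-bit variants are the headline, a back-of-envelope estimate of raw weight storage at each released precision helps. This is a sketch only: it counts weight bytes alone and ignores KV cache, activations, and the per-group scale metadata that NVFP4 and AWQ add, which is presumably why the model card's ~24 GB figure for the smallest build is higher than the raw-weight number here.

```python
# Rough raw-weight memory for a 27B-parameter model at the precisions
# in this release. Real serving footprints are larger: KV cache,
# activations, and quantization scale factors all add on top.

def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Raw weight storage in GiB at a given per-weight precision."""
    return n_params * bits_per_weight / 8 / 2**30

N = 27e9  # 27B-class model
for name, bits in [("BF16", 16), ("FP8 dynamic", 8), ("NVFP4 / AWQ-4bit", 4)]:
    print(f"{name:>18}: ~{weight_gib(N, bits):.1f} GiB weights")
```

The gap between the FP8 row (~25 GiB) and the 4-bit rows (~12.6 GiB) is what moves the model from multi-GPU into single-GPU territory, at the cost of the extra vLLM/CUTLASS friction noted above.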
// TAGS
qwopus · qwen3.5 · llm · open-source · quantization · inference · vllm · nvfp4
DISCOVERED
2026-04-09
PUBLISHED
2026-04-09
RELEVANCE
9 / 10
AUTHOR
monoidconcat