OPEN_SOURCE
REDDIT // 3d ago · OPEN-SOURCE RELEASE
Qwopus v3 gets NVFP4, AWQ, FP8 quants
The Qwopus v3 collection adds mixed-precision builds of Jackrong/Qwopus3.5-27B-v3, including NVFP4, AWQ-4bit, and FP8 dynamic variants. The NVFP4 checkpoint is verified on vLLM with Blackwell hardware, aiming to make the model practical on smaller single-GPU setups.
// ANALYSIS
This is less a new model than a distribution win: the interesting part is making a strong 27B-class reasoning model easier to run on real hardware. For local-LLM users, that matters more than another benchmark claim.
- NVFP4 is the headline variant because it trims memory the hardest; the model card calls it the smallest build at roughly 24 GB.
- The release is clearly tuned for vLLM users, but it is not frictionless: NVFP4 GEMM on Blackwell requires a patched vLLM build and the CUTLASS backend.
- AWQ-4bit is the compatibility play, while FP8 dynamic is the safer middle ground for teams that can spare more VRAM.
- The model keeps the hybrid Qwen3.5 DeltaNet + softmax attention architecture and the MTP head, so this release is about preserving behavior while compressing weights, not changing the underlying model.
- SGLang is listed as unsupported for this checkpoint, so adopters need to standardize on the vLLM stack.
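To see why the 4-bit variants are the headline, a back-of-envelope estimate of raw weight storage at each released precision helps. This is a sketch only: it counts weight bytes alone and ignores KV cache, activations, and the per-group scale metadata that NVFP4 and AWQ add, which is presumably why the model card's ~24 GB figure for the smallest build is higher than the raw-weight number here.

```python
# Rough raw-weight memory for a 27B-parameter model at the precisions
# in this release. Real serving footprints are larger: KV cache,
# activations, and quantization scale factors all add on top.

def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Raw weight storage in GiB at a given per-weight precision."""
    return n_params * bits_per_weight / 8 / 2**30

N = 27e9  # 27B-class model
for name, bits in [("BF16", 16), ("FP8 dynamic", 8), ("NVFP4 / AWQ-4bit", 4)]:
    print(f"{name:>18}: ~{weight_gib(N, bits):.1f} GiB weights")
```

The gap between the FP8 row (~25 GiB) and the 4-bit rows (~12.6 GiB) is what moves the model from multi-GPU into single-GPU territory, at the cost of the extra vLLM/CUTLASS friction noted above.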
// TAGS
qwopus · qwen3.5 · llm · open-source · quantization · inference · vllm · nvfp4
DISCOVERED
2026-04-09
PUBLISHED
2026-04-09
RELEVANCE
9 / 10
AUTHOR
monoidconcat