Qwopus v3 gets NVFP4, AWQ, FP8 quants
REDDIT · 3d ago · OPEN-SOURCE RELEASE


The Qwopus v3 collection adds mixed-precision builds of Jackrong/Qwopus3.5-27B-v3, including NVFP4, AWQ-4bit, and FP8 dynamic variants. The NVFP4 checkpoint is verified on vLLM with Blackwell hardware, aiming to make the model practical on smaller single-GPU setups.
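For the verified path, serving would look something like the sketch below. This is a hedged illustration, not from the release notes: the variant repo suffix (`-FP8-Dynamic`) and the flag values are assumptions, and per the model card the NVFP4 build additionally needs a patched vLLM with the CUTLASS backend, so the stock command shown here applies to the FP8 variant at best.

```shell
# Hypothetical launch sketch. The repo path suffix is an assumption
# based on the collection naming; only the base repo
# Jackrong/Qwopus3.5-27B-v3 is named in the release.
vllm serve Jackrong/Qwopus3.5-27B-v3-FP8-Dynamic \
  --quantization fp8 \
  --max-model-len 32768
```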

// ANALYSIS

This is less a new model than a distribution win: the interesting part is making a strong 27B-class reasoning model easier to run on real hardware. For local-LLM users, that matters more than another benchmark claim.

  • NVFP4 is the headline variant because it cuts memory the most; the model card lists it as the smallest build at about 24 GB.
  • The release is clearly tuned for vLLM users, but it is not frictionless: Blackwell needs a patched vLLM path and CUTLASS backend for NVFP4 GEMM.
  • AWQ-4bit is the compatibility play, while FP8 dynamic is the safer middle ground for teams that can spare more VRAM.
  • The model keeps the hybrid Qwen3.5 DeltaNet + softmax architecture and MTP head, so this is about preserving behavior while compressing weights, not changing the underlying model.
  • SGLang is listed as unsupported for this checkpoint, so adopters need to be on the vLLM stack.
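A back-of-envelope view of why the 4-bit builds are the small ones: weights-only footprint at each precision for a 27B-parameter model. These figures are illustrative arithmetic, not from the model card; real checkpoints add quantization scales, the KV cache, and runtime overhead, which is why the card's ~24 GB NVFP4 figure sits well above the raw 4-bit weight size.

```python
# Back-of-envelope, weights-only VRAM estimate for a 27B-parameter model.
# Ignores KV cache, activations, and quantization metadata (scales,
# zero points), so real serving footprints are larger.
PARAMS = 27e9  # parameter count, an assumption from the "27B" name

def weights_gb(bits_per_param: float) -> float:
    """Gigabytes to store the weights alone at a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

for name, bits in [("BF16 (unquantized)", 16),
                   ("FP8 dynamic", 8),
                   ("NVFP4 / AWQ-4bit", 4)]:
    print(f"{name:20s} ~{weights_gb(bits):5.1f} GB")
# BF16 ~54.0 GB, FP8 ~27.0 GB, 4-bit ~13.5 GB
```

The gap between 27 GB (FP8) and 13.5 GB (4-bit) weights is what moves this model from "needs a large card" to "fits a single consumer GPU with room for context."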
// TAGS
qwopus · qwen3.5 · llm · open-source · quantization · inference · vllm · nvfp4

DISCOVERED: 2026-04-09 (3d ago)

PUBLISHED: 2026-04-09 (3d ago)

RELEVANCE: 9/10

AUTHOR: monoidconcat