Qwen3.5 MXFP4 artifacts hit NVIDIA Blackwell
NVIDIA DGX Spark users report that running Qwen3.5-35B-A3B with MXFP4 quantization produces intermittent Chinese-character artifacts during long generations. The custom vLLM implementation delivers a significant speedup, reaching approximately 62 tokens per second. However, suspected numerical instability in the Marlin MoE kernel on Blackwell hardware causes the model to emit stray Chinese tokens after as few as 50 output steps.
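To measure the onset step consistently, you can scan each generation step for CJK ideographs. The helper below is a minimal sketch and is not part of the reported vLLM build. The function names and Unicode ranges are assumptions. It checks only Han ideographs, not kana or CJK punctuation. It expects text deltas that have already been detokenized, such as those from a streaming endpoint. A single byte-level token can hold only part of a multibyte character, so decoding tokens one at a time may miss the artifact.

```python
# Hypothetical helper: locate the first generation step that contains a Han ideograph.
CJK_RANGES = [
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
]


def is_cjk(ch):
    """True if the character falls in one of the Han ideograph ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def first_artifact_step(step_texts):
    """Return the index of the first detokenized step containing a CJK ideograph, or None."""
    for step, text in enumerate(step_texts):
        if any(is_cjk(ch) for ch in text):
            return step
    return None


if __name__ == "__main__":
    steps = ["The", " retrieval", " step", " returns", "中", "文", " results"]
    print(first_artifact_step(steps))
```

Run it over many long prompts and record the returned step for each one. The results give a rough onset distribution that you can compare across MXFP4, BF16, and FP8 runs of the same model.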
The performance-reliability gap on Blackwell is widening as early adopters trade model accuracy for 4-bit microscaling throughput. Quantizing Qwen's attention layers to MXFP4 triggers high KL divergence from the full-precision model. That divergence breaks the MoE router's ability to stay within the experts it would normally select for English text. Because the artifacts are intermittent, they suggest a kernel misalignment or weight-packing bug in the experimental Marlin MoE implementation, one that appears unique to the SM121 architecture. For production RAG pipelines, this reliability trade-off remains unacceptable compared with standard BF16 or official Qwen FP8 checkpoints. Until kernel support matures, the lag between Blackwell hardware and its software stack forces developers to choose between speed and stability.
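The sketch below illustrates the KL-divergence argument with a toy example. It runs a small router weight matrix through an MXFP4 round trip and compares the result against full precision. In MXFP4, each block of 32 values shares one power-of-two scale, and each element is rounded to the FP4 E2M1 grid. The sketch then compares the routing distribution and the top-k expert choice before and after quantization.

This is a simplified illustration that assumes the OCP Microscaling block layout. It does not reproduce the Marlin kernel's weight packing, Qwen's actual router, or the attention-layer quantization described above. The names `mxfp4_roundtrip`, `router_logits`, and `top_k` are hypothetical.

```python
import math
import random

# FP4 E2M1 representable magnitudes (OCP Microscaling element format).
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32  # MXFP4 shares one scale per 32 elements
E2M1_EMAX = 2    # exponent of the largest E2M1 value (6 = 1.5 * 2**2)


def quantize_fp4(v):
    """Round v to the nearest signed E2M1 value, saturating at +/-6.

    Ties go to the smaller magnitude because min() keeps the first match
    in the ascending grid (a simplification of hardware rounding).
    """
    mag = min(FP4_E2M1_GRID, key=lambda g: abs(abs(v) - g))
    return math.copysign(mag, v)


def mxfp4_roundtrip(values):
    """Quantize to MXFP4 and dequantize back, one 32-element block at a time."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in block)
        # Shared power-of-two (E8M0-style) scale; E8M0 range limits are ignored here.
        scale = 2.0 ** (math.floor(math.log2(amax)) - E2M1_EMAX) if amax > 0 else 1.0
        out.extend(quantize_fp4(v / scale) * scale for v in block)
    return out


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(P || Q) in nats; assumes q > 0 wherever p > 0 (true for softmax outputs)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def router_logits(weights, hidden):
    """Hypothetical toy router: one logit per expert, dot(expert_row, hidden)."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]


def top_k(logits, k):
    """Indices of the k largest logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    num_experts, hidden_dim, k = 16, 64, 4
    weights = [[rng.gauss(0.0, 0.02) for _ in range(hidden_dim)] for _ in range(num_experts)]
    hidden = [rng.gauss(0.0, 1.0) for _ in range(hidden_dim)]
    q_weights = [mxfp4_roundtrip(row) for row in weights]

    ref = router_logits(weights, hidden)
    quant = router_logits(q_weights, hidden)
    print("KL(ref || mxfp4):", kl_divergence(softmax(ref), softmax(quant)))
    print("top-k ref:  ", top_k(ref, k))
    print("top-k mxfp4:", top_k(quant, k))
```

A large KL value, or a different top-k set for the same hidden state, is the kind of routing drift the report describes. With real checkpoints, you would compare logits captured from the BF16 and MXFP4 models instead of using synthetic weights.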
DISCOVERED: 2026-03-29
PUBLISHED: 2026-03-29
AUTHOR: kaltinator