Qwen3.5 MXFP4 artifacts hit NVIDIA Blackwell

// 73d ago · INFRASTRUCTURE

NVIDIA DGX Spark users report that running Qwen3.5-35B-A3B with MXFP4 quantization produces intermittent Chinese-character artifacts during long generations. The custom vLLM implementation delivers a significant performance boost, reaching approximately 62 tokens per second, but numerical instability in the Marlin MoE kernel on Blackwell hardware reportedly causes the model to emit stray Chinese tokens in otherwise English output after as few as 50 output steps.
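A minimal sketch of how such artifacts could be flagged in generated text, assuming the symptom is CJK ideographs appearing in output that should be English. The function names `is_cjk` and `first_cjk_index` are hypothetical, and the check covers only the CJK Unified Ideographs block and Extension A, not every CJK range.

```python
def is_cjk(ch: str) -> bool:
    """True if ch falls in CJK Unified Ideographs or Extension A."""
    code = ord(ch)
    return 0x4E00 <= code <= 0x9FFF or 0x3400 <= code <= 0x4DBF


def first_cjk_index(text: str) -> int:
    """Return the character index of the first CJK ideograph, or -1 if none."""
    for i, ch in enumerate(text):
        if is_cjk(ch):
            return i
    return -1


if __name__ == "__main__":
    # Illustrative string, not real model output.
    sample = "The answer is 你好 forty-two"
    print(first_cjk_index(sample))
```

Running a check like this over a batch of long generations gives a rough artifact rate and shows how far into the output the first stray character tends to appear. The index is in characters, not tokens, so mapping it back to an output step would need the tokenizer.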

// ANALYSIS

The performance-reliability gap on Blackwell is widening as early adopters trade model accuracy for 4-bit microscaling throughput. Quantizing Qwen's attention layers to MXFP4 appears to produce high KL divergence from the unquantized model, which may push the MoE router away from the experts it normally selects for English text. The intermittent nature of the artifacts also suggests a kernel misalignment or weight-packing bug in the experimental Marlin MoE implementation that is specific to the SM121 architecture. For production RAG pipelines, this reliability trade-off remains unacceptable compared with standard BF16 or official Qwen FP8 checkpoints. Until kernel support matures, the lag between Blackwell hardware and its software stack forces developers to choose between speed and stability.
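For readers unfamiliar with the metric, KL divergence here would compare the next-token distribution of a reference model (for example BF16) against the quantized one at the same position. The sketch below uses only the standard library and made-up toy logits; the function names are hypothetical and nothing in it reflects measurements from the report.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]


def kl_divergence(ref_logits, quant_logits):
    """KL(P_ref || P_quant) in nats for one token position."""
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(quant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


if __name__ == "__main__":
    # Toy 4-token vocabulary; values are illustrative only.
    reference = [4.0, 1.0, 0.5, -2.0]
    quantized = [3.2, 1.9, 0.4, -1.5]
    print(kl_divergence(reference, quantized))
```

Averaging this value over many positions of a fixed prompt set is one common way to compare a quantized checkpoint against its reference. A value near zero means the two distributions nearly agree, while larger values mean quantization has shifted probability mass toward other tokens.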

// TAGS

qwen3.5-35b-a3b, llm, quantization, inference, gpu, vllm, open-weights, mxfp4

DISCOVERED

73d ago

2026-03-29

PUBLISHED

73d ago

2026-03-29

RELEVANCE

8/10

AUTHOR

kaltinator
