Qwen3.5 benchmarks show production-ready MoE on H20 GPUs
New performance tests of Alibaba's Qwen3.5-397B-A17B Mixture-of-Experts (MoE) model on an 8x NVIDIA H20 cluster demonstrate efficient, high-throughput serving for 400B-class models. By pairing SGLang's optimized MoE kernels with the H20's 141GB of VRAM per card, the setup provides the memory capacity required for large batch sizes and extended context windows, making it a viable option for production environments constrained by hardware availability.
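As a rough sanity check on the capacity claim, the sketch below estimates whether 397B parameters fit in the cluster's VRAM at common weight precisions. The parameter count is read off the model name and the 141GB-per-card figure comes from the article; everything else (byte widths, ignoring activations and KV cache) is a simplifying assumption.

```python
# Back-of-envelope: do 397B parameters fit in 8x H20 (141 GB per card)?
# Assumptions: parameter count taken from the model name; weight memory
# only (activations, KV cache, and runtime overhead are ignored here).
GIB = 1024**3

total_params = 397e9                   # Qwen3.5-397B-A17B total parameters
cluster_bytes = 8 * 141 * 1000**3      # 8 cards x 141 GB (decimal GB)

for precision, bytes_per_param in [("BF16", 2), ("FP8", 1)]:
    weight_gib = total_params * bytes_per_param / GIB
    share = total_params * bytes_per_param / cluster_bytes
    print(f"{precision}: ~{weight_gib:,.0f} GiB of weights "
          f"({share:.0%} of ~{cluster_bytes / GIB:,.0f} GiB cluster VRAM)")
```

At BF16 the weights alone come to roughly 740 GiB, about 70% of the ~1,050 GiB pool, consistent with the claim that the model stays resident without aggressive quantization.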
The NVIDIA H20 emerges as an ideal choice for massive MoE inference, trading raw compute for the memory headroom needed to keep 400B-class models in memory without aggressive quantization. SGLang's RadixAttention and specialized kernels significantly reduce the routing overhead inherent in sparse models like Qwen3.5. With 1.1TB of total VRAM, the 8x H20 setup provides the capacity to serve long-context requests of up to 262k tokens natively. This benchmark demonstrates that for large MoE models, memory capacity and bandwidth matter more than peak TFLOPS for cost-effective production serving.
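To see where that headroom goes at long context, here is a rough KV-cache sizing for a single 262k-token request. The layer count, KV-head count, and head dimension below are illustrative placeholders (borrowed from the Qwen3 generation), not published Qwen3.5-397B-A17B specs; swap in the real config.json values for an exact figure.

```python
# Rough KV-cache footprint of one full-length (262k-token) request.
# K and V each store seq_len * num_kv_heads * head_dim values per layer,
# hence the leading factor of 2.
GIB = 1024**3

seq_len       = 262_144  # 262k-token context from the benchmark claim
num_layers    = 94       # placeholder (Qwen3-235B value; Qwen3.5 may differ)
num_kv_heads  = 4        # placeholder GQA key/value head count
head_dim      = 128      # placeholder head dimension
bytes_per_val = 2        # BF16 cache; halve for an FP8 KV cache

kv_bytes = 2 * num_layers * seq_len * num_kv_heads * head_dim * bytes_per_val
print(f"KV cache, one 262k request: ~{kv_bytes / GIB:.0f} GiB")
```

Under these placeholder dimensions a single full-context request consumes on the order of 47 GiB of cache, so the VRAM left over after BF16 weights bounds how many such requests can be batched concurrently, which is exactly where the H20's capacity advantage shows up.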
DISCOVERED: 2026-03-21
PUBLISHED: 2026-03-21
AUTHOR: MathematicianNo2877