Qwen3.6-35B-A3B hits 400 tok/s on H100

// 45d ago · INFRASTRUCTURE

An SGLang setup for Qwen3.6-35B-A3B exceeds 400 tokens per second for code generation on a single NVIDIA H100 by combining DFlash parallel speculative decoding with FP8 precision. At that speed, real-time agentic workflows become practical.
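
As a rough sanity check on the 400 tok/s figure, the sketch below estimates a memory-bandwidth ceiling for single-stream decoding. It is a back-of-envelope model, not a measurement. The assumptions are that each generated token streams the active weights from HBM once, that 3B parameters are active per token (the "A3B" in the model name), that FP8 weights take 1 byte each, and that HBM bandwidth is 3.35 TB/s (the H100 SXM datasheet figure; the post does not say which H100 variant was used). KV cache and activation traffic are ignored, so the results are upper bounds only. The function name is illustrative.

```python
def decode_ceiling_tok_s(active_params_b: float,
                         bytes_per_param: float,
                         hbm_tb_s: float) -> float:
    """Upper bound on single-stream decode speed (tokens/s).

    Assumes decoding is memory-bandwidth bound: every generated token
    requires reading all active weights from HBM exactly once.
    """
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return hbm_tb_s * 1e12 / bytes_per_token


# Assumption: H100 SXM datasheet bandwidth; a PCIe card would be lower.
H100_SXM_HBM_TB_S = 3.35

moe_fp8 = decode_ceiling_tok_s(3, 1, H100_SXM_HBM_TB_S)     # 3B active, FP8
moe_bf16 = decode_ceiling_tok_s(3, 2, H100_SXM_HBM_TB_S)    # 3B active, BF16
dense_fp8 = decode_ceiling_tok_s(35, 1, H100_SXM_HBM_TB_S)  # dense 35B, FP8

print(f"MoE 3B-active FP8 ceiling:  {moe_fp8:.0f} tok/s")
print(f"MoE 3B-active BF16 ceiling: {moe_bf16:.0f} tok/s")
print(f"Dense 35B FP8 ceiling:      {dense_fp8:.0f} tok/s")
```

Under these assumptions the 3B-active FP8 ceiling sits well above the reported 400 tok/s, while a dense 35B model at FP8 would be capped below 100 tok/s on the same card.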

// ANALYSIS

The era of "instant" local LLMs has arrived: 35B models are now hitting speeds previously reserved for tiny 3B models, which changes the latency expectations for developer tools.

  • DFlash speculative decoding is the primary speed driver: a parallel block diffusion model drafts multiple tokens at once, avoiding the serial bottleneck of traditional autoregressive drafting, where tokens are proposed one at a time.
  • Reaching 400+ tok/s makes this setup a good fit for Claude Code and other agentic loops, where prompt ingestion and rapid-fire token generation determine the "feel" of the developer experience.
  • FP8 weights and an FP8 KV cache are critical for maximizing the H100's throughput, which suggests that native FP8 hardware support is becoming the baseline for high-performance hosting.
  • Qwen3.6's MoE architecture (3B active parameters) hits the "Goldilocks zone" for H100 memory bandwidth, providing high-tier reasoning without the compute overhead of dense 30B+ models.
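
The draft-then-verify idea behind speculative decoding can be sketched with toy stand-ins. This is not DFlash itself: `target_next` and `draft_block` are hypothetical placeholders for the large model and the block drafter, the drafter here is a simple rule with deliberate misses, not a diffusion model, and verification is greedy exact-match, a simplification of what production systems do. The point it illustrates is that the output matches plain greedy decoding while needing far fewer target-model passes.

```python
from typing import List, Tuple

Token = int


def target_next(ctx: List[Token]) -> Token:
    """Toy stand-in for the large model: deterministic greedy next token."""
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_block(ctx: List[Token], k: int) -> List[Token]:
    """Toy stand-in for a block drafter that proposes k tokens per call.

    A real block drafter would emit the k tokens in parallel; this toy
    loops for simplicity and injects a wrong guess at some positions.
    """
    tmp, out = list(ctx), []
    for _ in range(k):
        tok = target_next(tmp)
        if len(tmp) % 7 == 0:        # deliberate miss to exercise rejection
            tok = (tok + 1) % 50
        out.append(tok)
        tmp.append(tok)
    return out


def greedy_generate(prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one target pass per generated token."""
    ctx = list(prompt)
    for _ in range(n_new):
        ctx.append(target_next(ctx))
    return ctx[len(prompt):]


def speculative_generate(prompt: List[Token], n_new: int,
                         k: int) -> Tuple[List[Token], int]:
    """Draft k tokens, verify them against the target, keep the valid prefix."""
    ctx, target_passes = list(prompt), 0
    while len(ctx) - len(prompt) < n_new:
        proposal = draft_block(ctx, k)
        target_passes += 1           # one pass scores every drafted position
        accepted: List[Token] = []
        for tok in proposal:
            expected = target_next(ctx + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)   # target's token replaces the miss
                break
        else:
            # Every draft token matched: the same pass yields a bonus token.
            accepted.append(target_next(ctx + accepted))
        ctx.extend(accepted)
    return ctx[len(prompt):len(prompt) + n_new], target_passes


tokens, passes = speculative_generate([1, 2, 3], n_new=40, k=4)
print(f"{len(tokens)} tokens in {passes} target passes")
```

Because every pass emits at least one token that the target itself would have chosen, the output never diverges from greedy decoding; the speedup comes entirely from how many drafted tokens are accepted per pass.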
// TAGS
qwen3.6-35b-a3b, llm, sglang, h100, speculative-decoding, inference, open-weights

DISCOVERED

45d ago

2026-04-28

PUBLISHED

45d ago

2026-04-28

RELEVANCE

8/10

AUTHOR

Asleep_Training3543