Qwen3.5 397B probes pooled VRAM, RAM speeds
REDDIT · 23d ago · BENCHMARK RESULT

Redditors are trying to pin down real tok/s for Qwen3.5-397B-A17B in hybrid VRAM+system RAM setups, because Unsloth’s 25+ tok/s claim depends heavily on the CPU, the number of memory channels, and memory speed. The thread reads less like theory and more like a practical buying guide for anyone considering running this MoE model locally.

// ANALYSIS

The interesting question isn’t whether the model runs; it’s whether it stays fast once most of the weights spill out of VRAM and into host memory.

  • Qwen3.5-397B-A17B is a 397B-total, 17B-active MoE model: each token touches only about 17B parameters, but with hybrid offloading many of those are read from system RAM, which shifts the bottleneck from GPU compute to memory bandwidth.
  • Unsloth’s headline number looks plausible only on very fast memory systems; throughput should swing hard between mainstream dual-channel desktops and high-bandwidth workstation or unified-memory setups.
  • Community reports around similar Qwen3.5 local runs already vary widely, from roughly 10 tok/s in RAM-heavy multi-GPU rigs to the mid-30s on fast unified-memory hardware, which makes the ask for exact configs completely reasonable.
  • For buyers, RAM topology matters almost as much as the GPU itself; if you want this model to feel responsive, memory bandwidth is the spec to watch.
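
The bandwidth argument in the bullets above can be sketched as back-of-envelope arithmetic. This is a rough ceiling estimate, not a benchmark: it assumes decode speed is limited only by reading the RAM-resident active weights once per token, and the 4-bit quantization, the `ram_fraction` knob, and both function names are illustrative assumptions rather than anything taken from the thread or from Unsloth. Real throughput will sit below these figures, since achieved bandwidth is lower than theoretical peak and KV-cache reads and GPU time are ignored.

```python
def peak_bandwidth_gbs(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s.

    Assumes a 64-bit (8-byte) data bus per channel; mt_per_s is the
    memory transfer rate in MT/s (e.g. 5600 for DDR5-5600).
    """
    return channels * mt_per_s * bus_bytes / 1000


def decode_ceiling_tok_s(
    bandwidth_gbs: float,
    active_params_b: float = 17.0,  # active parameters per token, in billions
    bits_per_weight: float = 4.0,   # ASSUMPTION: ~4-bit quantized weights
    ram_fraction: float = 1.0,      # ASSUMPTION: share of active weights held in system RAM
) -> float:
    """Upper bound on decode tok/s if RAM-resident active weights are read once per token."""
    gb_per_token = active_params_b * bits_per_weight / 8 * ram_fraction
    return bandwidth_gbs / gb_per_token


if __name__ == "__main__":
    # Hypothetical configurations chosen only to show the spread.
    configs = {
        "dual-channel DDR5-5600 desktop": peak_bandwidth_gbs(2, 5600),
        "8-channel DDR5-4800 workstation": peak_bandwidth_gbs(8, 4800),
    }
    for name, bw in configs.items():
        print(f"{name}: {bw:.1f} GB/s peak, ceiling {decode_ceiling_tok_s(bw):.1f} tok/s")
```

Under these assumptions the dual-channel desktop tops out near 10.5 tok/s and the 8-channel workstation near 36 tok/s, which illustrates why the same model can feel very different across memory topologies. Lowering `ram_fraction` models keeping more of the active weights in VRAM and raises the ceiling accordingly.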
// TAGS
qwen3.5-397b-a17b · llm · inference · gpu · open-weights · benchmark

DISCOVERED

23d ago

2026-03-19

PUBLISHED

23d ago

2026-03-19

RELEVANCE

8/10

AUTHOR

Leading-Month5590