OPEN_SOURCE ↗
REDDIT · REDDIT // 23d ago · BENCHMARK RESULT
Qwen3.5 397B probes pooled VRAM, RAM speeds
Redditors are trying to pin down real tok/s for Qwen3.5-397B-A17B in hybrid VRAM+system RAM setups, because Unsloth’s 25+ tok/s claim depends heavily on CPU, memory channel count, and memory speed. The thread reads less like a theoretical discussion than a practical buying guide for anyone considering running this MoE model locally.
// ANALYSIS
The interesting question isn’t whether the model runs; it’s whether it stays fast once most of the weights spill out of VRAM and into host memory.
- Qwen3.5-397B-A17B is a 397B-total, 17B-active MoE model, so hybrid offloading shifts the bottleneck from GPU compute to memory bandwidth.
- Unsloth’s headline number looks plausible only on very fast memory systems; throughput should swing hard between mainstream dual-channel desktops and high-bandwidth workstation or unified-memory setups.
- Community reports around similar Qwen3.5 local runs already vary widely, from roughly 10 tok/s in RAM-heavy multi-GPU rigs to the mid-30s on fast unified-memory hardware, which makes the ask for exact configs completely reasonable.
- For buyers, RAM topology matters almost as much as the GPU itself; if you want this model to feel responsive, memory bandwidth is the spec to watch.
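The bandwidth argument above can be made concrete with a back-of-envelope estimate. The sketch below is not from the thread. It assumes decode speed is limited purely by how fast the active weights can be streamed from host RAM, that a 4-bit-class quant averages about 4.5 bits per weight, and that memory reaches its theoretical peak bandwidth. The function names, the `ram_fraction` parameter, and the two example memory configurations are illustrative choices, and real throughput will typically land below these figures.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers: float, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s.

    channels x MT/s x bytes per transfer (8 bytes for a 64-bit channel).
    """
    return channels * mega_transfers * bus_bytes / 1000.0


def decode_estimate_tok_s(
    active_params_b: float,
    bits_per_weight: float,
    ram_bandwidth_gbs: float,
    ram_fraction: float = 1.0,
) -> float:
    """Rough bandwidth-bound decode rate in tokens per second.

    Assumption: every token reads each active weight once, and only the
    share held in host RAM (ram_fraction) costs meaningful time. Reads
    from VRAM, KV-cache traffic, and compute are ignored.
    """
    if not 0.0 < ram_fraction <= 1.0:
        raise ValueError("ram_fraction must be in (0, 1]")
    gb_per_token = active_params_b * bits_per_weight / 8.0 * ram_fraction
    return ram_bandwidth_gbs / gb_per_token


if __name__ == "__main__":
    ACTIVE_PARAMS_B = 17.0   # 17B active parameters, per the model name
    BITS_PER_WEIGHT = 4.5    # assumed average for a 4-bit-class quant

    # Illustrative memory configurations, not taken from the thread.
    configs = {
        "dual-channel DDR5-5600": peak_bandwidth_gbs(2, 5600),
        "8-channel DDR5-4800": peak_bandwidth_gbs(8, 4800),
    }
    for name, bandwidth in configs.items():
        rate = decode_estimate_tok_s(ACTIVE_PARAMS_B, BITS_PER_WEIGHT, bandwidth)
        print(f"{name}: {bandwidth:.1f} GB/s peak, ~{rate:.1f} tok/s")
```

Under these assumptions the dual-channel desktop comes out near 9 tok/s and the 8-channel workstation near 32 tok/s. Lowering `ram_fraction` models keeping part of the active weights in VRAM, which raises the estimate proportionally.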
// TAGS
qwen3.5-397b-a17b · llm · inference · gpu · open-weights · benchmark
DISCOVERED
23d ago
2026-03-19
PUBLISHED
23d ago
2026-03-19
RELEVANCE
8/10
AUTHOR
Leading-Month5590