YOU ARE VIEWING ONE ITEM FROM THE AICRIER FEED

Qwen3.5 397B probes pooled VRAM, RAM speeds

AICrier tracks AI developer news across Product Hunt, GitHub, Hacker News, YouTube, X, arXiv, and more. This page keeps the article you opened front and center while giving you a path into the live feed.

// WHAT AICRIER DOES

7+

TRACKED FEEDS

24/7

SCRAPED FEED

Short summaries, external links, screenshots, relevance scoring, tags, and featured picks for AI builders.

Qwen3.5 397B probes pooled VRAM, RAM speeds
OPEN LINK ↗
// 82d agoBENCHMARK RESULT

Qwen3.5 397B probes pooled VRAM, RAM speeds

Redditors are trying to pin down real tok/s for Qwen3.5-397B-A17B in hybrid VRAM+system RAM setups, because Unsloth’s 25+ tok/s claim depends heavily on CPU, channel count, and memory speed. The thread is less about theory than a practical buying guide for anyone considering this MoE model locally.

// ANALYSIS

The interesting question isn’t whether the model runs; it’s whether it stays fast once most of the weight shard spills out of VRAM and onto host memory.

  • Qwen3.5-397B-A17B is a 397B-total, 17B-active MoE model, so hybrid offloading shifts the bottleneck from GPU compute to memory bandwidth.
  • Unsloth’s headline number looks plausible only on very fast memory systems; throughput should swing hard between mainstream dual-channel desktops and high-bandwidth workstation or unified-memory setups.
  • Community reports around similar Qwen3.5 local runs already vary widely, from roughly 10 tok/s in RAM-heavy multi-GPU rigs to the mid-30s on fast unified-memory hardware, which makes the ask for exact configs completely reasonable.
  • For buyers, RAM topology matters almost as much as the GPU itself; if you want this model to feel responsive, memory bandwidth is the spec to watch.
// TAGS
qwen3.5-397b-a17bllminferencegpuopen-weightsbenchmark

DISCOVERED

82d ago

2026-03-19

PUBLISHED

82d ago

2026-03-19

RELEVANCE

8/ 10

AUTHOR

Leading-Month5590