OPEN_SOURCE ↗
REDDIT // 7h ago · BENCHMARK RESULT
LM Studio CPU Threads Peak at Five
A Reddit benchmark suggests LM Studio’s CPU thread pool has a clear sweet spot when MoE expert weights are pushed onto CPU. On the tested Ryzen 9 3900X setup, throughput topped out around five threads, with higher counts likely hitting memory-bandwidth limits instead of adding useful compute.
// ANALYSIS
This is a useful reminder that local LLM performance tuning is often bottlenecked by memory, not raw core count. Once you start mixing GPU offload with CPU-resident MoE layers, “more threads” can become counterproductive fast.
- The post tests `qwen3.6-35b-a3b@MXFP4` with all GPU layers offloaded and 16 forced CPU layers, so the result is specific but practical
- The drop-off above five threads lines up with the common RAM-bandwidth ceiling on consumer systems, especially older DDR4 platforms
- The finding matters for LM Studio users because its MoE CPU-offload feature makes this tuning path easy to hit in real workloads
- The discussion also reinforces that prompt processing and token generation behave differently, so one thread setting may not fit every phase
- For developers serving local models, this is a reminder to benchmark thread pools per machine instead of assuming physical core count is the right target
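The per-machine benchmarking advice above can be sketched as a small thread sweep. This is a minimal, hypothetical harness, not anything from the post or LM Studio itself: `run_generation` stands in for whatever inference call exposes a thread setting (e.g. a llama.cpp-style binding), and the measured numbers below are invented for illustration.

```python
import time

def sweep_threads(run_generation, thread_counts, n_tokens=128):
    """Measure throughput (tokens/sec) at each candidate thread count.

    `run_generation(n_threads, n_tokens)` is a stand-in for a real
    inference call; swap in your own binding or API client.
    """
    results = {}
    for n in thread_counts:
        start = time.perf_counter()
        run_generation(n_threads=n, n_tokens=n_tokens)
        elapsed = time.perf_counter() - start
        results[n] = n_tokens / elapsed
    return results

def best_thread_count(results):
    """Pick the thread count with the highest measured throughput."""
    return max(results, key=results.get)

# Illustrative (made-up) numbers shaped like the post's finding:
# throughput peaks near five threads, then memory bandwidth bites.
measured = {3: 8.4, 4: 9.1, 5: 9.8, 6: 9.2, 8: 8.5, 12: 7.9}
print(best_thread_count(measured))  # → 5
```

The point is that the argmax is an empirical property of the machine (RAM bandwidth, cache, offload split), so the sweep should be rerun per model and per host rather than hard-coding the physical core count.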
// TAGS
lm-studio · llm · benchmark · inference · gpu
DISCOVERED
7h ago
2026-04-18
PUBLISHED
8h ago
2026-04-18
RELEVANCE
8/10
AUTHOR
bonobomaster