OPEN_SOURCE
REDDIT · 11d ago · BENCHMARK RESULT
Qwen3.5 9B hits 20 tps locally
A Reddit user reports that a Qwen3.5 9B reasoning-distilled model runs at 20 tokens per second on an RX 580 laptop with just 8GB of RAM plus swap. The post is less a formal benchmark than a proof that aggressive distillation and quantization can make strong local inference surprisingly accessible.
// ANALYSIS
The interesting part here is not “state of the art” hype but the hardware economics: a 9B reasoning model is usable on commodity, heavily constrained gear when the stack is tuned hard enough.
- Shows how far local inference has moved with smaller, distilled reasoning models and practical quantization
- The reported throughput is meaningful for hobbyist agents, MCP experiments, and offline workflows, even if it is not a controlled benchmark
- The setup highlights the tradeoffs: PCIe x4, USB-attached NVMe, and system swap all signal a fragile but functional performance envelope
- Useful signal for low-VRAM users deciding whether 9B-class models are the sweet spot for local reasoning
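The throughput claim above can be sanity-checked with a back-of-envelope sketch. The figures below (a flat 4 bits per weight and ~256 GB/s of memory bandwidth for an RX 580) are illustrative assumptions, not numbers from the post:

```python
# Rough estimate of a quantized 9B model's weight footprint and the
# memory-bandwidth ceiling on decode throughput. Assumptions: 4 bits/weight
# (approximate for 4-bit GGUF quants) and ~256 GB/s VRAM bandwidth.

def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (ignores KV cache and runtime overhead)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

def decode_tps_ceiling(bandwidth_gb_s: float, size_gb: float) -> float:
    """Upper bound on tokens/sec: decoding each token reads all weights once."""
    return bandwidth_gb_s / size_gb

size_q4 = model_size_gb(9, 4.0)             # ~4.5 GB of weights at 4 bits/weight
ceiling = decode_tps_ceiling(256, size_q4)  # bandwidth-bound upper limit

print(f"4-bit 9B weights: {size_q4:.1f} GB")
print(f"decode ceiling:  ~{ceiling:.0f} tok/s")
```

Under these assumptions the bandwidth-bound ceiling is around 57 tok/s, so the reported 20 tok/s is plausible once PCIe x4, USB-attached storage, and swap overhead eat into the theoretical limit.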
// TAGS
llm · reasoning · inference · gpu · self-hosted · open-source · qwen3.5-9b-gemini-3.1-pro-reasoning-distill-gguf
DISCOVERED
11d ago
2026-03-31
PUBLISHED
12d ago
2026-03-31
RELEVANCE
6/10
AUTHOR
ItzYaBoiGoogle