OPEN_SOURCE ↗
REDDIT · REDDIT// 9d agoBENCHMARK RESULT
Qwen3.5-27B hits 311 tokens/second prefill on M2 Ultra
New performance benchmarks for Qwen3.5-27B using Unsloth's Dynamic (UD) quants show exceptional prefill performance on Apple Silicon hardware. Running on a Mac Studio M2 Ultra with 64GB of unified memory, the dense hybrid model achieved over 311 tokens/second prefill speed using Q8 quantization, demonstrating that high-precision local inference is increasingly viable for large-scale context windows on consumer-pro hardware.
// ANALYSIS
Qwen3.5-27B is proving to be a top-tier "dense" alternative to MoE models, offering superior consistency and reasoning density for local deployment.
- –The hybrid Gated DeltaNet architecture enables massive 262K context scaling without the typical memory or performance degradation seen in pure transformer models.
- –Unsloth's UD (Dynamic) quants use importance-matrix weighting to preserve precision in critical layers, making the Q8 and Q4 versions highly competitive for complex agentic workflows.
- –The 27B parameter size is the "sweet spot" for 64GB systems, allowing for high context headroom (KV cache) even at high quantization levels.
- –M2 Ultra's 800 GB/s bandwidth remains the gold standard for local LLM performance, outclassing most standard PC setups for document processing and RAG.
// TAGS
qwen3.5-27bllmlocal-llmbenchmarkunslothopen-weightsinference
DISCOVERED
9d ago
2026-04-03
PUBLISHED
9d ago
2026-04-03
RELEVANCE
8/ 10
AUTHOR
channingao