Qwen3.5-27B hits 311 tokens/second prefill on M2 Ultra
New performance benchmarks for Qwen3.5-27B using Unsloth's Dynamic (UD) quants show exceptional prefill performance on Apple Silicon hardware. Running on a Mac Studio M2 Ultra with 64GB of unified memory, the dense hybrid model achieved over 311 tokens/second prefill speed using Q8 quantization, demonstrating that high-precision local inference is increasingly viable for large-scale context windows on consumer-pro hardware.
Qwen3.5-27B is proving to be a top-tier "dense" alternative to MoE models, offering superior consistency and reasoning density for local deployment.
- –The hybrid Gated DeltaNet architecture enables massive 262K context scaling without the typical memory or performance degradation seen in pure transformer models.
- –Unsloth's UD (Dynamic) quants use importance-matrix weighting to preserve precision in critical layers, making the Q8 and Q4 versions highly competitive for complex agentic workflows.
- –The 27B parameter size is the "sweet spot" for 64GB systems, allowing for high context headroom (KV cache) even at high quantization levels.
- –M2 Ultra's 800 GB/s bandwidth remains the gold standard for local LLM performance, outclassing most standard PC setups for document processing and RAG.
DISCOVERED
54d ago
2026-04-03
PUBLISHED
54d ago
2026-04-03
RELEVANCE
AUTHOR
channingao