OPEN_SOURCE ↗
REDDIT · 9d ago · BENCHMARK RESULT

Qwen3.5-27B hits 311 tokens/second prefill on M2 Ultra

New performance benchmarks for Qwen3.5-27B using Unsloth's Dynamic (UD) quants show exceptional prefill performance on Apple Silicon. Running on a Mac Studio M2 Ultra with 64GB of unified memory, the dense hybrid model achieved over 311 tokens/second prefill speed at Q8 quantization, demonstrating that high-precision local inference over large context windows is increasingly viable on prosumer hardware.
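Some back-of-envelope arithmetic shows why the reported numbers are plausible on a 64GB machine. The figures below are rough assumptions, not measurements: a ~27B parameter count and ~8.5 effective bits/weight for GGUF-style Q8_0 storage (scales included), combined with the reported 311 tok/s prefill rate:

```python
# Rough sanity check on the benchmark claim (assumed figures, not measured).
PARAMS = 27e9           # assumed parameter count for Qwen3.5-27B
BITS_PER_WEIGHT = 8.5   # approximate effective storage cost of GGUF Q8_0
PREFILL_TPS = 311       # prefill speed reported in the benchmark

# Weight footprint at Q8: should leave ample headroom in 64 GB.
weights_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9
print(f"Q8 weights: ~{weights_gb:.0f} GB")  # ~29 GB, well under 64 GB

# Wall-clock time to ingest long prompts at the reported prefill rate:
for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7}-token prompt -> ~{ctx / PREFILL_TPS:.0f} s prefill")
```

Even a 131K-token prompt ingests in roughly seven minutes at this rate, which is what makes long-context document processing practical rather than merely possible.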

// ANALYSIS

Qwen3.5-27B is proving to be a top-tier "dense" alternative to MoE models, offering superior consistency and reasoning density for local deployment.

  • The hybrid Gated DeltaNet architecture enables massive 262K context scaling without the typical memory or performance degradation seen in pure transformer models.
  • Unsloth's UD (Dynamic) quants use importance-matrix weighting to preserve precision in critical layers, making the Q8 and Q4 versions highly competitive for complex agentic workflows.
  • The 27B parameter size is the "sweet spot" for 64GB systems, allowing for high context headroom (KV cache) even at high quantization levels.
  • M2 Ultra's 800 GB/s bandwidth remains the gold standard for local LLM performance, outclassing most standard PC setups for document processing and RAG.
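The bandwidth point can be made concrete with a standard roofline-style estimate: dense-model decode is memory-bandwidth-bound, since each generated token must stream essentially all weight bytes from memory, so peak bandwidth divided by weight size gives an upper bound on decode tokens/second. The bits-per-weight figures below are approximations for GGUF-style quants, not numbers from the benchmark:

```python
# Bandwidth-bound decode ceiling for a dense model (rough assumptions):
# max tok/s ~= usable memory bandwidth / bytes of weights read per token.
BANDWIDTH_GBS = 800   # M2 Ultra peak unified-memory bandwidth
PARAMS = 27e9         # assumed parameter count

for label, bits in (("Q8", 8.5), ("Q4", 4.5)):
    weight_gb = PARAMS * bits / 8 / 1e9
    ceiling = BANDWIDTH_GBS / weight_gb
    print(f"{label}: ~{weight_gb:.1f} GB weights -> at most ~{ceiling:.0f} tok/s decode")
```

Real decode throughput lands below this ceiling (attention, KV-cache reads, and sustained-vs-peak bandwidth all cost something), but it explains why high-bandwidth unified memory, rather than raw compute, is the decisive factor for local inference at this model size.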
// TAGS
qwen3.5-27b · llm · local-llm · benchmark · unsloth · open-weights · inference

DISCOVERED

2026-04-03

PUBLISHED

2026-04-03

RELEVANCE

8 / 10

AUTHOR

channingao