Qwen3.5-27B hits 311 tokens/second prefill on M2 Ultra

// BENCHMARK RESULT (55d ago)


A new benchmark of Qwen3.5-27B using Unsloth's Dynamic (UD) quants reports strong prefill performance on Apple Silicon. Running on a Mac Studio M2 Ultra with 64GB of unified memory, the dense hybrid model reached just over 311 tokens/second of prefill at Q8 quantization, suggesting that high-precision local inference over long contexts is increasingly practical on prosumer hardware.
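To put the headline number in perspective, the sketch below converts a prefill rate into wall-clock time to ingest a prompt. It assumes the 311 tokens/second figure holds constant at every prompt length, which is an idealization: real prefill throughput usually falls as the context grows. The function name and the prompt sizes are illustrative choices, not part of the benchmark.

```python
def prefill_seconds(prompt_tokens: int, tokens_per_second: float = 311.0) -> float:
    """Estimate time to ingest a prompt at a constant prefill rate.

    Assumption: throughput does not change with prompt length.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return prompt_tokens / tokens_per_second


if __name__ == "__main__":
    # Illustrative prompt sizes: a short chat, a long document, the full 262K window.
    for n in (4_096, 32_768, 262_144):
        print(f"{n:>7} tokens -> {prefill_seconds(n):7.1f} s")
```

Under that assumption a 32K-token prompt takes a little under two minutes before the first output token, so prefill speed matters most for document processing and RAG workloads.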

// ANALYSIS

Qwen3.5-27B is shaping up as a strong dense alternative to MoE models for local deployment: every parameter is active on every token, which tends to give more consistent output quality and reasoning.

– The hybrid Gated DeltaNet architecture supports a 262K-token context with less of the memory growth and slowdown typically seen in pure transformer models.
– Unsloth's UD (Dynamic) quants use importance-matrix weighting to preserve precision in critical layers, which keeps the Q8 and Q4 versions competitive for complex agentic workflows.
– The 27B parameter size is a "sweet spot" for 64GB systems, leaving substantial headroom for long contexts (KV cache) even with higher-precision quants such as Q8.
– M2 Ultra's 800 GB/s memory bandwidth remains among the fastest available for local LLM work, ahead of most standard PC setups for document processing and RAG.
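The "sweet spot" claim can be checked with rough arithmetic. The sketch below estimates weight memory from parameter count and bits per weight, then subtracts it from total unified memory. The bits-per-weight values are the nominal figures for the plain GGUF `Q8_0` and `Q4_0` block formats (8.5 and 4.5), used here as assumptions: UD quants mix precisions per layer, so their real file sizes will differ. The remainder is an upper bound, since macOS and the inference runtime also need memory.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def headroom_gb(total_gb: float, params: float, bits_per_weight: float) -> float:
    """Memory left for KV cache, activations, and the OS after loading weights."""
    return total_gb - weight_gb(params, bits_per_weight)


if __name__ == "__main__":
    PARAMS = 27e9       # 27B parameters, from the model name
    TOTAL_GB = 64.0     # unified memory on the benchmarked Mac Studio
    # Assumed nominal bits per weight for plain GGUF formats, not UD-specific.
    for label, bpw in (("Q8_0", 8.5), ("Q4_0", 4.5)):
        print(f"{label}: weights ~{weight_gb(PARAMS, bpw):.1f} GB, "
              f"headroom ~{headroom_gb(TOTAL_GB, PARAMS, bpw):.1f} GB")
```

On these assumptions the Q8 weights occupy a bit under half of the 64GB, which is consistent with the bullet's point that a long context still fits alongside a high-precision quant.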

// TAGS

qwen3.5-27b, llm, local-llm, benchmark, unsloth, open-weights, inference

DISCOVERED

55d ago

2026-04-03

PUBLISHED

55d ago

2026-04-03

RELEVANCE

8/10

AUTHOR

channingao
