OPEN_SOURCE · REDDIT · BENCHMARK RESULT

Qwen 3.5 MoE tops Gemma 4 M5 benchmarks

Performance benchmarks on the MacBook M5 (128GB RAM) using the oMLX framework show that Qwen 3.5 MoE remains the throughput leader for local agentic workloads, despite Gemma 4's gains in responsiveness. The results highlight the M5's new Neural Accelerator, which delivers up to 4x faster prompt processing, and the efficacy of oMLX's tiered KV caching in cutting latency for long-context, multi-turn interactions.
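A back-of-envelope calculation makes the caching claim concrete. Using the prefill throughput reported below (~2,850 tok/s) and the sub-2-second SSD restore figure, the gap between recomputing a long context and restoring it grows quickly with context length:

```python
# Back-of-envelope check of why cached context restoration matters.
# Figures from the article: ~2,850 tok/s prompt prefill for Qwen 3.5
# MoE on the M5, and <2 s to restore a cached prefix from SSD.
PREFILL_TOK_S = 2850      # reported prompt-processing throughput
RESTORE_S = 2.0           # reported worst-case SSD restore time

def prefill_seconds(context_tokens: int, tok_s: float = PREFILL_TOK_S) -> float:
    """Time to recompute a prompt of `context_tokens` from scratch."""
    return context_tokens / tok_s

for ctx in (8_000, 32_000, 128_000):
    cold = prefill_seconds(ctx)
    print(f"{ctx:>7} tokens: cold prefill {cold:5.1f}s vs cached restore <{RESTORE_S:.0f}s")
```

At 128k tokens the cold prefill alone runs to roughly 45 seconds even at the M5's accelerated rate, which is why a <2s restore changes the feel of multi-turn agent sessions.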

// ANALYSIS

The M5 Max and oMLX are turning local Macs into viable high-performance inference servers, with MoE architectures clearly winning on Apple Silicon.

  • Qwen 3.5 MoE (35B-A3B) is the current performance champion, achieving 92.2 tok/s for generation and nearly 2,850 tok/s for prompt processing.
  • oMLX's tiered KV caching leverages SSD storage to restore context prefixes in under 2 seconds, a massive improvement over the 60+ second prefill times seen in standard MLX implementations.
  • The M5's Neural Accelerator specifically boosts the prefill stage, making dense models more responsive but not yet competitive with MoE throughput.
  • While Gemma 4 is more memory-efficient and responsive for "edge" tasks, it lags behind Qwen in sustained batching and serving performance for heavy developer workloads.
  • SSD-based context persistence is becoming the new baseline for "agentic" local LLM tools like Claude Code and Cursor.
// TAGS
omlx · mlx · apple-silicon · llm · benchmark · gemma-4 · qwen3-5 · edge-ai

DISCOVERED

2026-04-03

PUBLISHED

2026-04-02

RELEVANCE

8/10

AUTHOR

onil_gova