Qwen 3.5 27B tops Gemma 4 on MI50
OPEN_SOURCE
REDDIT · 6d ago · BENCHMARK RESULT


New benchmarks on dual AMD MI50 GPUs show a significant throughput advantage for Qwen 3.5 27B, which reaches 39.08 tok/s versus Gemma 4 31B’s 18.77 tok/s. Run on a specialized vLLM fork, the results highlight the effectiveness of Qwen’s 5-token speculative decoding for local inference on legacy hardware.

// ANALYSIS

Qwen 3.5’s Multi-Token Prediction (MTP) architecture is a game-changer for throughput, though the "reasoning tax" remains a factor.

  • Qwen 3.5 27B achieves nearly double the output throughput of Gemma 4 31B, bolstered by a 5-token speculative decoding step with an 89.7% acceptance rate.
  • Gemma 4 31B shows substantially higher Time to First Token (43s vs 25s), suggesting less efficient prompt processing on the specific vllm-gfx906-mobydick stack.
  • The specialized "mobydick" fork is the key enabler for legacy gfx906 cards, using hybrid FP32/FP16 activations to bypass the lack of native BF16 support.
  • Despite the speed, Qwen’s integrated reasoning can lead to high token counts per request; for simple tasks, Gemma’s lower verbosity may still result in faster wall-clock completion.
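The throughput gap is consistent with the reported speculative-decoding figures. As a rough sanity check, if we assume the 89.7% acceptance rate applies per drafted token and that drafting stops at the first rejection (the standard speculative-decoding model; the post itself only reports the two headline numbers), the expected tokens emitted per target-model forward pass follow a geometric series:

```python
def expected_tokens_per_step(p: float, k: int) -> float:
    """Expected tokens emitted per target-model forward pass in
    speculative decoding with k draft tokens and per-token
    acceptance probability p.

    Accepting i drafted tokens requires i consecutive acceptances,
    and the target model always contributes one token of its own,
    which collapses to the geometric-series closed form.
    """
    return (1 - p ** (k + 1)) / (1 - p)

# Figures reported in the benchmark: 5-token drafts, 89.7% acceptance.
speedup = expected_tokens_per_step(0.897, 5)
print(f"{speedup:.2f} tokens per target forward pass")  # ≈ 4.65
```

Under these assumptions each target-model pass yields roughly 4.65 tokens instead of 1, which is more than enough headroom to explain the ~2x measured throughput once drafting overhead and memory bandwidth limits on gfx906 are factored in.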
// TAGS
qwen-3.5 · gemma-4 · llm · benchmark · inference · gpu · open-weights

DISCOVERED

2026-04-06 (6d ago)

PUBLISHED

2026-04-05 (6d ago)

RELEVANCE

8/10

AUTHOR

ai-infos