OPEN_SOURCE · REDDIT · 3h ago · BENCHMARK RESULT

GGUF Quants Arena Tops MMLU Subset Scores

This Reddit post shares MMLU subset results for several GGUF quantizations under a fixed llama.cpp setup: 8192 context, seed 42, and flash attention on. The best scores are tightly clustered at the top, with Qwen3.5-27B variants leading and only small gaps separating adjacent quants.
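
For readers who want to approximate the setup, here is a minimal sketch of the fixed settings (8192 context, seed 42, flash attention on) using llama-cpp-python with a greedy single-letter MMLU-style prompt. This is not the poster's harness; the model filename and the toy question are placeholders, and `flash_attn` assumes a llama-cpp-python build that exposes that option.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3.5-27B-UD-Q4_K_XL.gguf",  # placeholder path, not from the post
    n_ctx=8192,       # fixed context length from the post
    seed=42,          # fixed seed from the post
    flash_attn=True,  # flash attention on, as in the post
    n_gpu_layers=-1,  # offload all layers; adjust to your hardware
    verbose=False,
)

def score_item(question: str, choices: list[str], answer: str) -> bool:
    """Greedy single-letter answer, MMLU-style; returns True if correct."""
    letters = "ABCD"
    prompt = (
        f"Question: {question}\n"
        + "\n".join(f"{letters[i]}. {c}" for i, c in enumerate(choices))
        + "\nAnswer:"
    )
    out = llm(prompt, max_tokens=1, temperature=0.0)
    return out["choices"][0]["text"].strip().upper() == answer

# Toy item for illustration only:
print(score_item("2 + 2 = ?", ["3", "4", "5", "6"], "B"))
```

Real MMLU harnesses typically score answer-option log-likelihoods rather than sampling one token, so treat this as a shape of the setup, not a drop-in evaluator.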

// ANALYSIS

The main signal here is that quantization choice matters, but not always in the way people expect: the highest-scoring Q5 quant barely edges the Q4 quant, and a few alternative 27B variants stay close enough to make deployment tradeoffs more about memory and speed than raw accuracy.

  • Qwen3.5-27B-UD-Q5_K_XL and Q4_K_XL are effectively tied, so the cheaper quant may be the better default when runtime constraints matter (see the toy selection sketch after this list)
  • The distilled 27B reasoning model lands close behind, suggesting distillation can preserve benchmark strength surprisingly well
  • Smaller and more aggressive quants fall off steadily, which is consistent with MMLU punishing over-compression more than casual chat workloads do
  • The post is most useful as a practical llama.cpp tuning reference, not as a model-launch announcement
  • The single error noted on gemma-4-31B-it hints that benchmark hygiene still matters as much as the score table itself
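
To make the "memory over raw accuracy" tradeoff concrete, here is a toy selection rule: among quants whose score sits within a small margin of the best, take the smallest file. All numbers below are invented placeholders for illustration, not scores from the post, and `pick_quant` is a hypothetical helper.

```python
def pick_quant(results: dict[str, tuple[float, float]], margin: float = 0.5) -> str:
    """results maps quant name -> (mmlu_score_pct, file_size_gb).
    Returns the smallest quant scoring within `margin` points of the best."""
    best = max(score for score, _ in results.values())
    eligible = {q: size for q, (score, size) in results.items() if best - score <= margin}
    return min(eligible, key=eligible.get)

# Placeholder numbers, NOT from the post:
results = {
    "Q5_K_XL": (78.1, 19.0),
    "Q4_K_XL": (77.9, 16.0),
    "Q3_K_M":  (74.5, 13.0),
}
print(pick_quant(results))  # -> "Q4_K_XL": near-tied score, several GB less memory
```
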
// TAGS
llm · benchmark · inference · gpu · gguf-quants-arena

DISCOVERED: 3h ago (2026-04-17)

PUBLISHED: 18h ago (2026-04-16)

RELEVANCE: 8/10

AUTHOR: qwen_next_gguf_when