OPEN_SOURCE · REDDIT · 3h ago · BENCHMARK RESULT

GGUF Quants Arena Tops MMLU Subset Scores

This Reddit post shares MMLU subset results for several GGUF quantizations under a fixed llama.cpp setup: 8192 context, seed 42, and flash attention on. The best scores are tightly clustered at the top, with Qwen3.5-27B variants leading and only small gaps separating adjacent quants.
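
For readers who want to approximate the setup, here is a minimal sketch of the fixed settings (8192 context, seed 42, flash attention on) using llama-cpp-python with a greedy single-letter MMLU-style prompt. This is not the poster's harness; the model filename and the toy question are placeholders, and `flash_attn` assumes a llama-cpp-python build that exposes that option.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3.5-27B-UD-Q4_K_XL.gguf",  # placeholder path, not from the post
    n_ctx=8192,       # fixed context length from the post
    seed=42,          # fixed seed from the post
    flash_attn=True,  # flash attention on, as in the post
    n_gpu_layers=-1,  # offload all layers; adjust to your hardware
    verbose=False,
)

def score_item(question: str, choices: list[str], answer: str) -> bool:
    """Greedy single-letter answer, MMLU-style; returns True if correct."""
    letters = "ABCD"
    prompt = (
        f"Question: {question}\n"
        + "\n".join(f"{letters[i]}. {c}" for i, c in enumerate(choices))
        + "\nAnswer:"
    )
    out = llm(prompt, max_tokens=1, temperature=0.0)
    return out["choices"][0]["text"].strip().upper() == answer

# Toy item for illustration only:
print(score_item("2 + 2 = ?", ["3", "4", "5", "6"], "B"))
```

Real MMLU harnesses typically score answer-option log-likelihoods rather than sampling one token, so treat this as a shape of the setup, not a drop-in evaluator.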

// ANALYSIS

The main signal here is that quantization choice matters, but not always in the way people expect: the highest-scoring Q5 quant barely edges the Q4 quant, and a few alternative 27B variants stay close enough to make deployment tradeoffs more about memory and speed than raw accuracy.

  • Qwen3.5-27B-UD-Q5_K_XL and Q4_K_XL are effectively tied, so the cheaper quant may be the better default when runtime constraints matter (see the toy selection sketch after this list)
  • The distilled 27B reasoning model lands close behind, suggesting distillation can preserve benchmark strength surprisingly well
  • Smaller and more aggressive quants fall off steadily, which is consistent with MMLU punishing over-compression more than casual chat workloads do
  • The post is most useful as a practical llama.cpp tuning reference, not as a model-launch announcement
  • The single error noted on gemma-4-31B-it hints that benchmark hygiene still matters as much as the score table itself
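
To make the "memory over raw accuracy" tradeoff concrete, here is a toy selection rule: among quants whose score sits within a small margin of the best, take the smallest file. All numbers below are invented placeholders for illustration, not scores from the post, and `pick_quant` is a hypothetical helper.

```python
def pick_quant(results: dict[str, tuple[float, float]], margin: float = 0.5) -> str:
    """results maps quant name -> (mmlu_score_pct, file_size_gb).
    Returns the smallest quant scoring within `margin` points of the best."""
    best = max(score for score, _ in results.values())
    eligible = {q: size for q, (score, size) in results.items() if best - score <= margin}
    return min(eligible, key=eligible.get)

# Placeholder numbers, NOT from the post:
results = {
    "Q5_K_XL": (78.1, 19.0),
    "Q4_K_XL": (77.9, 16.0),
    "Q3_K_M":  (74.5, 13.0),
}
print(pick_quant(results))  # -> "Q4_K_XL": near-tied score, several GB less memory
```
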
// TAGS
llm · benchmark · inference · gpu · gguf-quants-arena

DISCOVERED: 3h ago (2026-04-17)

PUBLISHED: 18h ago (2026-04-16)

RELEVANCE: 8/10

AUTHOR: qwen_next_gguf_when