Qwen3.5 quants show mixed Strix Halo results
OPEN_SOURCE
REDDIT · 32d ago · BENCHMARK RESULT

A Reddit benchmark compares Qwen3.5-35B-A3B and 122B-A10B GGUF quants on AMD Strix Halo with llama.cpp, finding that newer ROCm builds improve throughput, but that Unsloth’s UD-XL dynamic quants still feel slower and less stable than a comparable Bartowski quant in real coding tasks. The post is less about raw model quality than about the gap between quant benchmark claims and day-to-day local inference behavior.

// ANALYSIS

This is the kind of benchmark AI developers actually care about: not just tokens per second, but whether a quant stays coherent under coding workloads. The big takeaway is that aggressive dynamic quantization can win on paper and still lose badly on usability.

  • The author reports clear ROCm speed gains moving from llama.cpp b8204 to b8248, while Vulkan improvements look much smaller
  • Unsloth’s own Qwen3.5 docs note that UD-XL variants are slower, and this user’s results reinforce that tradeoff on Strix Halo hardware
  • In a coding test, the reported UD-XL 122B run needed roughly 29.5K tokens to finish a single HTML task, versus about 18.7K for a Bartowski Q5_K_L quant, which also required fewer corrections
  • The most interesting claim is logic drift, not speed: the post says the dynamic quants lose track in longer sessions and start proposing odd solutions other quants do not
  • It is still a single-user Reddit benchmark, but it is a useful warning that local LLM buyers should validate task stability, not just benchmark charts and compression ratios
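The kind of check the post advocates can be made concrete by logging tokens-to-completion per quant on the same task and comparing against a baseline. A minimal sketch, using the single-run figures reported in the post (the `token_overhead` helper is hypothetical, not part of llama.cpp):

```python
# Compare tokens-to-completion across quants for the same coding task.
# Figures are the single run reported in the post; a real validation
# would average several runs per quant and per task.

def token_overhead(runs: dict[str, int], baseline: str) -> dict[str, float]:
    """Tokens used relative to a baseline quant (1.0 = same as baseline)."""
    base = runs[baseline]
    return {quant: tokens / base for quant, tokens in runs.items()}

runs = {
    "bartowski-Q5_K_L": 18_700,  # tokens to finish the HTML task
    "unsloth-UD-XL":    29_500,
}

overhead = token_overhead(runs, baseline="bartowski-Q5_K_L")
print(f"{overhead['unsloth-UD-XL']:.2f}")  # ~1.58x the baseline's token budget
```

On these numbers the UD-XL run burns roughly 58% more tokens for the same task, which is the usability gap the post argues raw tokens-per-second charts hide.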
// TAGS
qwen3-5 · llm · benchmark · inference · open-weights

DISCOVERED

32d ago

2026-03-10

PUBLISHED

33d ago

2026-03-09

RELEVANCE

8/10

AUTHOR

Educational_Sun_8813