Qwen3.5 quants show mixed Strix Halo results
A Reddit benchmark compares Qwen3.5-35B-A3B and 122B-A10B GGUF quants on AMD Strix Halo with llama.cpp, finding that newer ROCm builds improve throughput but Unsloth’s UD-XL dynamic quants still feel slower and less stable than a comparable Bartowski quant in real coding tasks. The post is less about raw model quality than the gap between quant benchmark claims and day-to-day local inference behavior.
This is the kind of benchmark AI developers actually care about: not just tokens per second, but whether a quant stays coherent under coding workloads. The big takeaway is that aggressive dynamic quantization can win on paper and still lose badly on usability.
- The author reports clear ROCm speed gains moving from llama.cpp b8204 to b8248, while Vulkan improvements look much smaller
- Unsloth’s own Qwen3.5 docs note that UD-XL variants are slower, and this user’s results reinforce that tradeoff on Strix Halo hardware
- In a coding test, the reported UD-XL 122B run needed roughly 29.5K tokens to finish a single HTML task, versus about 18.7K for a Bartowski Q5_K_L quant with fewer corrections
- The most interesting claim is logic drift, not speed: the post says the dynamic quants lose track in longer sessions and start proposing odd solutions other quants do not
- It is still a single-user Reddit benchmark, but it is a useful warning that local LLM buyers should validate task stability, not just benchmark charts and compression ratios
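To put the token-count comparison in perspective, a quick back-of-the-envelope calculation turns the two reported totals into a relative overhead. The only inputs are the figures from the post (29.5K vs. 18.7K tokens); the variable names are just illustrative labels, not anything from the benchmark itself.

```python
# Token totals reported in the Reddit benchmark for the same HTML task
ud_xl_tokens = 29_500      # Unsloth UD-XL 122B quant
bartowski_tokens = 18_700  # Bartowski Q5_K_L quant

# Relative overhead of the UD-XL run over the Bartowski run
overhead = ud_xl_tokens / bartowski_tokens - 1
print(f"UD-XL used {overhead:.0%} more tokens for the same task")
```

Roughly 58% more tokens for one task: even if per-token throughput were identical, the UD-XL session would take substantially longer wall-clock time, which matches the author's "slower in practice" impression.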
DISCOVERED: 2026-03-10
PUBLISHED: 2026-03-09
AUTHOR: Educational_Sun_8813