Qwen3.6-27B Tests Strix Halo 128GB Limits
The post is a request for real-world experience running Qwen3.6-27B on Strix Halo systems with 128GB of memory, especially under very long context lengths near 256K. The author is looking for practical throughput, memory pressure, and usability reports rather than benchmark claims, and notes they would otherwise test on Runpod if the hardware were available there.
This is a strong signal that the model is interesting specifically because it sits in the local self-hosting sweet spot; the open question is whether long-context usage is practical on consumer hardware.
- The model’s appeal is density: a 27B dense checkpoint is small enough to be locally relevant, but still capable enough to attract serious workloads.
- The hard part is not just loading weights; 256K context stresses the KV cache and memory bandwidth, which is what Strix Halo users will care about most.
- This is less about raw benchmark bragging and more about sustained interactive performance under long prompts, tool use, and iterative coding.
- The discussion suggests buyers want evidence from actual owners before committing time or money to a platform-specific setup.
- Likely outcome: workable for short or moderate contexts, but 256K on 128GB will depend heavily on quantization, runtime, and how much headroom the rest of the system leaves.
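To make the memory-pressure point concrete, here is a rough back-of-the-envelope sketch of how weights and KV cache add up at 256K context. The layer count, KV-head count, and head dimension below are hypothetical placeholders, not the published Qwen3.6-27B configuration, and the formula assumes standard grouped-query attention on every layer; a model with hybrid or linear-attention layers would need less cache.

```python
# Rough memory estimate for a dense model at long context.
# All architecture numbers are HYPOTHETICAL placeholders for illustration;
# substitute the values from the model's actual config before trusting the result.

GIB = 2**30


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem):
    """Keys + values for every layer and every token (hence the factor of 2)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def weight_bytes(n_params, bits_per_weight):
    """Approximate weight footprint, ignoring quantization metadata overhead."""
    return n_params * bits_per_weight // 8


if __name__ == "__main__":
    context_len = 256 * 1024          # 256K tokens
    n_params = 27_000_000_000         # 27B dense parameters (from the post title)

    # Assumed architecture (placeholders, not the real config).
    n_layers, n_kv_heads, head_dim = 64, 8, 128

    for label, kv_bytes_per_elem in [("16-bit KV", 2), ("8-bit KV", 1)]:
        kv = kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, kv_bytes_per_elem)
        w = weight_bytes(n_params, 4)  # assume 4-bit weight quantization
        print(f"{label}: weights {w / GIB:.1f} GiB, "
              f"KV cache {kv / GIB:.1f} GiB, total {(w + kv) / GIB:.1f} GiB")
```

With these placeholder values the 16-bit KV cache alone comes to 64 GiB at 256K tokens, several times the footprint of 4-bit weights. That is why the choice of KV-cache precision and runtime matters as much as weight quantization on a 128GB machine that also has to host the OS and other processes.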
DISCOVERED
45d ago
2026-04-27
PUBLISHED
45d ago
2026-04-27
AUTHOR
boutell