OPEN_SOURCE
REDDIT · 32d ago · BENCHMARK RESULT
Qwen 3.5 122B hits 120K context
A LocalLLaMA user reports fitting a quantized Qwen 3.5 122B build across two AMD Instinct MI50 GPUs and pushing context length to 120,000 tokens. The post claims roughly 136 tokens/sec prompt processing and 18 tokens/sec generation on ROCm, making it a notable community datapoint for long-context local inference on older AMD hardware.
// ANALYSIS
This is exactly the kind of benchmark that keeps local inference interesting: not a flashy new release, but proof that aggressive quantization and open-weight models keep stretching cheap secondhand hardware farther than expected.
- The headline result is less about raw model quality than feasibility: 120K context on dual MI50s is a strong signal for budget-minded local setups.
- Prompt processing at ~136 t/s is solid for long-context experimentation, even if decode at ~18 t/s still limits interactive use.
- The post reinforces how much mileage the open Qwen ecosystem, GGUF quantization, and llama.cpp-style tooling are getting out of non-NVIDIA hardware.
- Because this is a single community benchmark, developers should treat it as a reproducibility lead, not a definitive performance baseline across workloads.
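The feasibility claim is easy to sanity-check with back-of-envelope math: KV cache memory grows linearly with context length, and prefill time follows directly from the reported prompt-processing rate. A minimal sketch, where the layer count, KV-head count, and head dimension are illustrative assumptions rather than published Qwen 3.5 122B specs (real values would come from the GGUF metadata):

```python
# Back-of-envelope feasibility check for long-context local inference.
# Model shape parameters below are ASSUMED for illustration, not actual
# Qwen 3.5 122B specs; substitute real values from the model's metadata.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: 2 tensors (K and V) per layer, each
    n_kv_heads * head_dim * ctx_len elements at the given dtype width."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

def prefill_seconds(ctx_len: int, prompt_tps: float) -> float:
    """Time to ingest a full prompt at a given prompt-processing rate."""
    return ctx_len / prompt_tps

# Hypothetical GQA-style config (assumption) at the post's 120K context.
gib = kv_cache_bytes(n_layers=88, n_kv_heads=8, head_dim=128,
                     ctx_len=120_000) / 2**30
print(f"KV cache @ fp16: ~{gib:.1f} GiB")
print(f"Full 120K prefill @ 136 t/s: ~{prefill_seconds(120_000, 136) / 60:.0f} min")
```

Under these assumed dimensions the fp16 KV cache alone lands around 40 GiB, and a full 120K-token prefill at the reported 136 t/s takes roughly 15 minutes, which is why the result reads as a long-context feasibility datapoint rather than an interactive-use benchmark.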
// TAGS
qwen-3.5 · llm · inference · benchmark · open-weights
DISCOVERED: 2026-03-10 (32d ago)
PUBLISHED: 2026-03-06 (36d ago)
RELEVANCE: 8/10
AUTHOR: thejacer