Qwen 3.5 122B hits 120K context
OPEN_SOURCE

REDDIT · 32d ago · BENCHMARK RESULT


A LocalLLaMA user reports fitting a quantized Qwen 3.5 122B build onto two AMD MI50 GPUs and pushing context length to 120,000 tokens. The post claims roughly 136 tokens/sec prompt processing and 18 tokens/sec generation on ROCm, making it a notable community datapoint for long-context local inference on older AMD hardware.

// ANALYSIS

This is exactly the kind of benchmark that keeps local inference interesting: not a flashy new release, but proof that aggressive quantization and open-weight models keep stretching cheap secondhand hardware farther than expected.

  • The headline result is less about raw model quality than feasibility: 120K context on dual MI50s is a strong signal for budget-minded local setups.
  • Prompt processing at ~136 t/s is solid for long-context experimentation, even if decode at ~18 t/s still limits interactive use.
  • The post reinforces how much mileage the open Qwen ecosystem, GGUF quantization, and llama.cpp-style tooling are getting out of non-NVIDIA hardware.
  • Because this is a single community benchmark, developers should treat it as a reproducibility lead, not a definitive performance baseline across workloads.
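To put the reported numbers in practical terms, here is a back-of-envelope latency estimate based solely on the two throughput figures from the post (~136 t/s prompt processing, ~18 t/s generation). This is a rough model that ignores warm-up, KV-cache effects, and throughput degradation at long context, so treat it as an upper-bound intuition, not a measurement.

```python
# Rough turnaround-time estimate for the reported dual-MI50 setup.
# Throughput figures are taken from the Reddit post; everything else
# (the function and scenario below) is illustrative.
PROMPT_TPS = 136.0   # reported prompt-processing speed, tokens/sec
DECODE_TPS = 18.0    # reported generation speed, tokens/sec

def turnaround_seconds(prompt_tokens: int, output_tokens: int) -> float:
    """Estimated wall-clock time to ingest a prompt and generate a reply,
    assuming constant throughput at both stages."""
    return prompt_tokens / PROMPT_TPS + output_tokens / DECODE_TPS

# Hypothetical worst case: fill the full 120K-token window, then ask
# for a 500-token answer.
total = turnaround_seconds(120_000, 500)
print(f"{total / 60:.1f} min")  # roughly 15 minutes
```

The arithmetic makes the tradeoff in the bullets concrete: prompt ingestion dominates (about 15 minutes for a full window), which is workable for batch-style long-document tasks but confirms that interactive use at full context is impractical at these speeds.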
// TAGS
qwen-3.5 · llm · inference · benchmark · open-weights

DISCOVERED

32d ago

2026-03-10

PUBLISHED

36d ago

2026-03-06

RELEVANCE

8 / 10

AUTHOR

thejacer