OPEN_SOURCE
REDDIT · 32d ago · BENCHMARK RESULT
Qwen 3.5 122B hits 120K context
A LocalLLaMA user reports fitting a quantized Qwen 3.5 122B build across two AMD Instinct MI50 GPUs and pushing context length to 120,000 tokens. The post claims roughly 136 tokens/sec prompt processing and 18 tokens/sec generation on ROCm, making it a notable community datapoint for long-context local inference on older AMD hardware.
// ANALYSIS
This is exactly the kind of benchmark that keeps local inference interesting: not a flashy new release, but proof that aggressive quantization and open-weight models keep stretching cheap secondhand hardware farther than expected.
- The headline result is less about raw model quality than feasibility: 120K context on dual MI50s is a strong signal for budget-minded local setups.
- Prompt processing at ~136 t/s is solid for long-context experimentation, even if decode at ~18 t/s still limits interactive use.
- The post reinforces how much mileage the open Qwen ecosystem, GGUF quantization, and llama.cpp-style tooling are getting out of non-NVIDIA hardware.
- Because this is a single community benchmark, developers should treat it as a reproducibility lead, not a definitive performance baseline across workloads.
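The feasibility claim is easy to sanity-check with back-of-envelope math: KV cache memory grows linearly with context length, and prefill time follows directly from the reported prompt-processing rate. A minimal sketch, where the layer count, KV-head count, and head dimension are illustrative assumptions rather than published Qwen 3.5 122B specs (real values would come from the GGUF metadata):

```python
# Back-of-envelope feasibility check for long-context local inference.
# Model shape parameters below are ASSUMED for illustration, not actual
# Qwen 3.5 122B specs; substitute real values from the model's metadata.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: 2 tensors (K and V) per layer, each
    n_kv_heads * head_dim * ctx_len elements at the given dtype width."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

def prefill_seconds(ctx_len: int, prompt_tps: float) -> float:
    """Time to ingest a full prompt at a given prompt-processing rate."""
    return ctx_len / prompt_tps

# Hypothetical GQA-style config (assumption) at the post's 120K context.
gib = kv_cache_bytes(n_layers=88, n_kv_heads=8, head_dim=128,
                     ctx_len=120_000) / 2**30
print(f"KV cache @ fp16: ~{gib:.1f} GiB")
print(f"Full 120K prefill @ 136 t/s: ~{prefill_seconds(120_000, 136) / 60:.0f} min")
```

Under these assumed dimensions the fp16 KV cache alone lands around 40 GiB, and a full 120K-token prefill at the reported 136 t/s takes roughly 15 minutes, which is why the result reads as a long-context feasibility datapoint rather than an interactive-use benchmark.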
// TAGS
qwen-3.5 · llm · inference · benchmark · open-weights
DISCOVERED: 2026-03-10 (32d ago)
PUBLISHED: 2026-03-06 (36d ago)
RELEVANCE: 8/10
AUTHOR: thejacer