Framework Desktop runs 122B long-context LLMs
A Reddit benchmark run on Framework Desktop with AMD's Ryzen AI Max+ 395 and 128GB of unified memory tests Qwen 3.5, GPT-OSS, and Qwen Coder Next across context windows up to 250K tokens. The standout result is not raw peak speed but that a compact desktop can still run heavily quantized 35B and 122B-class models locally at usable speeds far beyond the short-context benchmarks most hobbyist posts focus on.
This is the kind of local AI benchmark developers actually need: long-context decay curves on real hardware instead of cherry-picked single-point token rates. It strengthens the case for Framework Desktop as one of the most interesting open local-LLM boxes, while also showing that software maturity and context length still dominate the experience once you move past headline specs.
- Qwen 3.5 35B A3B in Q6_K_L stays relatively strong, posting about 27.8 t/s at 100K context and 19.5 t/s at 250K, which is impressive for a single compact machine.
- The bigger 122B Qwen 3.5 variants remain technically usable but clearly hit the long-context wall, sliding from roughly 18-21 t/s near 5K context to around 8-10 t/s by 250K.
- GPT-OSS-20B and GPT-OSS-120B look especially practical on this hardware, suggesting Strix Halo is more than a curiosity for local inference workloads.
- Community testing around Framework Desktop has already shown backend and ROCm version choices can swing results dramatically, so these numbers are useful as a March 2026 snapshot rather than a final ceiling.
- Framework's own pitch is that the Desktop can run serious local models on-device; this post shows enthusiasts are already pushing that claim well past Llama-70B-style talking points into 100K+ context experiments.
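To get a feel for what the decay figures above imply at context lengths the post doesn't report, here is a minimal sketch that linearly interpolates between two of the measured Qwen 3.5 35B points (27.8 t/s at 100K, 19.5 t/s at 250K). The linear model is purely an assumption for rough estimation; real decode-speed curves are not linear, and the function name is hypothetical.

```python
# Rough estimate of decode speed at an unreported context length, assuming
# linear decay between two points reported in the post:
#   ~27.8 t/s at 100K context, ~19.5 t/s at 250K (Qwen 3.5 35B A3B, Q6_K_L).
# Linear interpolation is a simplifying assumption, not a measured curve.

def interp_tps(ctx: int,
               p1: tuple = (100_000, 27.8),
               p2: tuple = (250_000, 19.5)) -> float:
    """Linearly interpolate tokens/sec between two (context, t/s) points."""
    (x1, y1), (x2, y2) = p1, p2
    return y1 + (y2 - y1) * (ctx - x1) / (x2 - x1)

# Midpoint estimate at 175K context:
print(round(interp_tps(175_000), 2))  # → 23.65
```

This kind of back-of-the-envelope check is mostly useful for sanity-testing whether a planned workload (say, a 150K-token codebase prompt) would still land in a usable tokens/sec range on this hardware.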
DISCOVERED: 2026-03-11
PUBLISHED: 2026-03-10
AUTHOR: Anarchaotic