Framework Desktop runs 122B long-context LLMs

// 77d ago · BENCHMARK RESULT

A Reddit benchmark run on Framework Desktop with AMD's Ryzen AI Max+ 395 and 128GB of unified memory tests Qwen 3.5, GPT-OSS, and Qwen Coder Next across context windows up to 250K tokens. The standout result is not raw peak speed but that a compact desktop can still run heavily quantized 35B- and 122B-class models locally at usable speeds, at context lengths far beyond the short-context runs most hobbyist posts focus on.
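
As a rough sanity check on why quantized models of this size can fit in 128GB of unified memory, the weight footprint can be estimated from parameter count and bits per weight. This is a minimal sketch: the function name and the bits-per-weight values (4.5 and 6.5) are illustrative assumptions, not figures from the benchmark post, and the estimate ignores KV cache, runtime overhead, and memory reserved for the OS.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes).

    Covers weights only: KV cache and runtime overhead are NOT included.
    """
    return params_billions * bits_per_weight / 8


UNIFIED_MEMORY_GB = 128  # from the article

# Bits-per-weight values below are assumed for illustration only.
examples = {
    "122B-class @ ~4.5 bpw (assumed)": weight_footprint_gb(122, 4.5),
    "35B-class @ ~6.5 bpw (assumed)": weight_footprint_gb(35, 6.5),
}

for label, size_gb in examples.items():
    headroom = UNIFIED_MEMORY_GB - size_gb
    print(f"{label}: ~{size_gb:.1f} GB weights, ~{headroom:.1f} GB left over")
```

Whatever is left after the weights has to cover the KV cache, which grows with context length, so the real headroom at 250K tokens is smaller than this estimate suggests.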

// ANALYSIS

This is the kind of local AI benchmark developers actually need: long-context decay curves on real hardware instead of cherry-picked single-point token rates. It strengthens the case for Framework Desktop as one of the most interesting open local-LLM boxes, while also showing that software maturity and context length still dominate the experience once you move past headline specs.

  • Qwen 3.5 35B A3B in Q6_K_L stays relatively strong, posting about 27.8 t/s at 100K context and 19.5 t/s at 250K, which is impressive for a single compact machine.
  • The bigger 122B Qwen 3.5 variants remain technically usable but clearly hit the long-context wall, sliding from roughly 18-21 t/s near 5K context to around 8-10 t/s by 250K.
  • GPT-OSS-20B and GPT-OSS-120B look especially practical on this hardware, suggesting Strix Halo is more than a curiosity for local inference workloads.
  • Community testing around Framework Desktop has already shown backend and ROCm version choices can swing results dramatically, so these numbers are useful as a March 2026 snapshot rather than a final ceiling.
  • Framework's own pitch is that the Desktop can run serious local models on-device; this post shows enthusiasts are already pushing that claim well past Llama-70B-style talking points into 100K+ context experiments.
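
The long-context slowdown in the bullets above can be put in relative terms with a few lines of arithmetic. This is a small sketch using only the throughput figures quoted above; the helper names are hypothetical, and the per-reply timing assumes the quoted t/s values are generation speeds, which the summary does not state explicitly.

```python
def relative_drop(start_tps: float, end_tps: float) -> float:
    """Fraction of throughput lost going from start_tps to end_tps."""
    return (start_tps - end_tps) / start_tps


def seconds_for(tokens: int, tps: float) -> float:
    """Time to emit `tokens` tokens at a steady rate of `tps` tokens/second."""
    return tokens / tps


# Qwen 3.5 35B A3B (Q6_K_L): 27.8 t/s at 100K context, 19.5 t/s at 250K.
drop_35b = relative_drop(27.8, 19.5)

# Qwen 3.5 122B variants: roughly 18-21 t/s near 5K, around 8-10 t/s by 250K.
drop_122b_smallest = relative_drop(18.0, 10.0)  # slowest start, fastest end
drop_122b_largest = relative_drop(21.0, 8.0)    # fastest start, slowest end

print(f"35B, 100K -> 250K: {drop_35b:.1%} slower")
print(f"122B, 5K -> 250K: {drop_122b_smallest:.1%} to {drop_122b_largest:.1%} slower")

# Assumption: the figures are generation speeds. Time for a 500-token reply:
print(f"35B at 250K context: {seconds_for(500, 19.5):.1f} s per 500 tokens")
```

On these numbers the 35B model loses about 30% of its throughput between 100K and 250K context, while the 122B variants lose roughly 44% to 62% between 5K and 250K.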
// TAGS
framework-desktop, llm, inference, benchmark, gpu

DISCOVERED

77d ago

2026-03-11

PUBLISHED

78d ago

2026-03-10

RELEVANCE

8/10

AUTHOR

Anarchaotic