OPEN_SOURCE
REDDIT // BENCHMARK RESULT
Qwen3.5-122B-A10B hits 30 tps on M1 Ultra
A Reddit user ran Qwen3.5-122B-A10B in Unsloth Studio with a llama.cpp backend on an M1 Ultra, feeding it a 22K-token TurboQuant paper. They reported 396 tps prompt processing and 30.5 tps token generation, which is a useful signal that this giant MoE model is locally usable on Apple Silicon with the right quantization.
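To put the reported figures in perspective, here is a back-of-the-envelope latency calculation using the numbers from the post (396 tps prefill, 30.5 tps decode, 22K-token prompt); the 500-token response length is an assumption for illustration:

```python
# Rough interactive-latency estimate from the reported M1 Ultra numbers.
PREFILL_TPS = 396.0       # prompt processing speed (from the post)
DECODE_TPS = 30.5         # token generation speed (from the post)
PROMPT_TOKENS = 22_000    # the TurboQuant paper fed as context
RESPONSE_TOKENS = 500     # hypothetical answer length

time_to_first_token = PROMPT_TOKENS / PREFILL_TPS   # ~55.6 s
generation_time = RESPONSE_TOKENS / DECODE_TPS      # ~16.4 s

print(f"prefill: {time_to_first_token:.1f} s")
print(f"decode:  {generation_time:.1f} s")
print(f"total:   {time_to_first_token + generation_time:.1f} s")
```

In other words, even with "healthy" prefill, a full 22K-token paper still costs roughly a minute before the first output token, which is the practical cost of long-context use at these speeds.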
// ANALYSIS
This reads less like a one-off curiosity and more like evidence that Qwen3.5’s 10B-active MoE design can make very large models practical on consumer Macs. The real story is the decode speed: 30.5 tps is not “desktop toy” territory for a 122B model.
- The 396 tps prompt speed suggests prefill is very healthy, even with a 22K-token context, so long-context workflows may be more usable than people expect.
- The 30.5 tps generation rate is the number that matters for interactive use, and it's respectable for a local 122B-class model on Apple Silicon.
- llama.cpp via Unsloth Studio is a solid baseline, but MLX is the obvious next comparison point for Mac owners chasing more throughput.
- This is still a single user report, so quant choice, sampler settings, and context length could move the numbers a lot.
- The broader takeaway: Qwen3.5-122B-A10B is no longer automatically "too big for Mac," but it remains highly setup-sensitive.
// TAGS
qwen3-5-122b-a10b · llm · benchmark · inference · self-hosted · open-weights
DISCOVERED
2026-04-01
PUBLISHED
2026-04-01
RELEVANCE
8/10
AUTHOR
One_Key_8127