OPEN_SOURCE
REDDIT · 4h ago · MODEL RELEASE
Qwen3.5-122B hits performance ceiling on Apple Silicon
A LocalLLaMA user reports a consistent ~10 tok/s for the Qwen3.5-122B-A10B MoE model on high-end M4 Max and M1 Ultra hardware. Despite exhaustive llama.cpp configuration tweaks, memory bandwidth remains the primary bottleneck for this 122B-parameter model.
// ANALYSIS
Qwen3.5-122B-A10B is the new heavyweight champion for local inference, but it demands specific software stacks to shine.
- ~10 tok/s on llama.cpp is the expected floor for a model of this scale; MLX is required to hit the 40+ tok/s ceiling on M4 Max.
- Performance degradation at 50k+ context points to KV cache overhead and memory pressure, common in MoE models with large context windows.
- 128GB of Unified Memory is the minimum requirement for 4-bit quants; any higher precision or context quickly triggers memory swap.
- Users seeking interactive coding speeds should pivot to the 27B dense variant or prioritize the MLX framework over traditional GGUF backends.
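The bandwidth math behind these numbers can be sketched as a roofline estimate. All figures below (bandwidth, active parameter count, KV-cache geometry) are illustrative assumptions, not confirmed specs of Qwen3.5-122B-A10B or of Apple's hardware:

```python
# Back-of-envelope roofline for a memory-bandwidth-bound MoE decode step.
# All numeric figures are illustrative assumptions, not confirmed specs.

def weights_gb(params_b: float, bits: int) -> float:
    """Approximate weight footprint in GB at a given quantization width."""
    return params_b * 1e9 * bits / 8 / 1e9

def roofline_tok_s(active_params_b: float, bits: int, bandwidth_gb_s: float) -> float:
    """Upper bound on decode tok/s: every active weight is read once per token."""
    bytes_per_token = active_params_b * 1e9 * bits / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                tokens: int, bytes_per_elem: int = 2) -> float:
    """fp16 KV cache size: two tensors (K and V) per layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens / 1e9

# 122B total parameters at 4-bit: ~61 GB of weights alone, which is why
# 128GB of Unified Memory is a comfortable floor rather than a luxury.
print(f"4-bit weights: {weights_gb(122, 4):.0f} GB")

# ~10B active parameters per token at 4-bit on an M4 Max (assuming ~546 GB/s):
# the theoretical ceiling is ~109 tok/s, so the observed 10-40 tok/s range
# reflects framework overhead, expert routing, and imperfect bandwidth use.
print(f"roofline ceiling: {roofline_tok_s(10, 4, 546):.0f} tok/s")

# Hypothetical geometry (60 layers, 8 KV heads, head_dim 128) at 50k context:
# the fp16 cache adds ~12 GB on top of the weights, consistent with the
# degradation and swap pressure reported at long contexts.
print(f"KV cache @ 50k ctx: {kv_cache_gb(60, 8, 128, 50_000):.1f} GB")
```

The gap between the ~109 tok/s bandwidth ceiling and the ~10 tok/s reported on llama.cpp is what makes switching to a better-optimized backend like MLX worthwhile before reaching for hardware upgrades.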
// TAGS
qwen3.5-122b-a10b · llm · inference · benchmark · open-weights
DISCOVERED
2026-04-15
PUBLISHED
2026-04-14
RELEVANCE
8/10
AUTHOR
lots_of_apples