Qwen3.5-35B-A3B tops 63 t/s on M2 Ultra
OPEN_SOURCE ↗
REDDIT // 1h ago · BENCHMARK RESULT

A Reddit benchmark on a Mac Studio M2 Ultra with 64GB shows Qwen3.5-35B-A3B at Q8_K_XL hitting 1,734 t/s prefill at 10,240 tokens, 1,552 t/s prefill at 16,384 tokens, and 63 t/s generation, averaged over three runs. It is a narrow local-inference datapoint, but it suggests the model is comfortably usable on high-memory Apple Silicon.
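As a rough sanity check on what those throughput figures mean for interactivity, the implied time-to-first-token and reply latency follow directly from the reported numbers (a back-of-the-envelope sketch; the 200-token reply length is an assumed example, not part of the benchmark):

```python
# Latency implied by the reported benchmark numbers.
PREFILL_TPS_16K = 1552   # t/s prefill at 16,384-token context (reported)
GEN_TPS = 63             # t/s generation (reported)

prompt_tokens = 16_384
reply_tokens = 200       # assumed reply length, for illustration only

ttft = prompt_tokens / PREFILL_TPS_16K   # time to first token after a full-context prompt
reply_time = reply_tokens / GEN_TPS      # time to stream the reply

print(f"TTFT at 16K context: {ttft:.1f} s")       # ~10.6 s
print(f"200-token reply:     {reply_time:.1f} s") # ~3.2 s
```

In other words, even a maxed-out 16K prompt starts answering in about ten seconds, which is where the high prefill numbers matter more than the raw generation rate.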

// ANALYSIS

This is a strong showing for a 35B-class MoE model on consumer-ish hardware, especially if you care about interactive local use more than leaderboard bragging rights.

  • The active-parameter design lets a large model fit and run fast enough on 64GB unified memory without immediately collapsing into tiny quants.
  • Prefill stays high even at 16K context, which matters more for long prompts and codebases than the raw generate number.
  • Q8_K_XL looks like a sensible sweet spot here: enough fidelity to keep the model interesting, without the memory hit of heavier formats.
  • Treat it as a hardware/backend benchmark, not a universal model ranking; no task suite, prompts, or quality scoring were reported.
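The memory-fit point above can be sketched with a quick estimate. Assuming roughly 8.5 effective bits per weight for a Q8_K_XL-style quant (an approximation, not a spec figure), the weights alone come in well under the machine's 64 GB of unified memory:

```python
# Rough weight-memory estimate for a 35B-parameter model at ~8.5 bits/weight.
# The 8.5 bits/weight figure is an assumed effective rate for Q8_K_XL-style
# quants; real usage also needs room for the KV cache and runtime overhead.
params = 35e9
bits_per_weight = 8.5

weights_gb = params * bits_per_weight / 8 / 1e9
print(f"~{weights_gb:.0f} GB of weights")  # ~37 GB, leaving headroom on 64 GB
```

That headroom, plus only ~3B active parameters per token, is why the model runs interactively here instead of forcing a drop to much smaller quants.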
// TAGS
llm · benchmark · inference · open-source · qwen3-5-35b-a3b

DISCOVERED

1h ago

2026-04-17

PUBLISHED

3h ago

2026-04-17

RELEVANCE

8/10

AUTHOR

channingao