OPEN_SOURCE ↗
REDDIT // 5d ago · BENCHMARK RESULT
Qwen3.5-35B Matches Q4 on MI50s
On dual AMD MI50s, a community benchmark reports Qwen3.5-35B-A3B at Q8_0 hitting 55 tok/s generation and 1100 tok/s prefill, nearly matching a Q4_K_XL run. The post suggests that older AMD hardware and software overhead are flattening the speedup normally expected from heavier quantization.
// ANALYSIS
This reads less like a surprise model win and more like a reminder that local inference performance is often limited by kernels, memory movement, and device topology, not just bit width.
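A rough roofline estimate makes the point concrete. Decode on a modern LLM is usually memory-bound: each generated token must stream all active weights from HBM. The numbers below are assumptions, not from the post: ~3e9 active parameters for the A3B MoE, ~1 TB/s HBM2 bandwidth per MI50, and effective bytes per weight of ~1.06 for Q8_0 versus ~0.56 for 4-bit K-quants (including scale metadata).

```python
# Roofline sketch: ideal decode speed if the only cost per token is
# streaming the active weights from HBM exactly once (memory-bound).
ACTIVE_PARAMS = 3e9          # assumed active params per token (A3B MoE)
BANDWIDTH_BYTES_S = 1.0e12   # assumed MI50 HBM2 bandwidth (~1 TB/s)

def ideal_decode_tok_s(bytes_per_weight: float) -> float:
    """Bandwidth-bound upper limit on generation speed (tok/s)."""
    bytes_per_token = ACTIVE_PARAMS * bytes_per_weight
    return BANDWIDTH_BYTES_S / bytes_per_token

q8 = ideal_decode_tok_s(1.06)   # Q8_0, ~8.5 bits/weight effective
q4 = ideal_decode_tok_s(0.56)   # Q4_K-class, ~4.5 bits/weight effective

print(f"ideal Q8_0 decode:  {q8:.0f} tok/s")
print(f"ideal Q4_K decode:  {q4:.0f} tok/s")
print(f"ideal Q4/Q8 speedup: {q4 / q8:.2f}x")
```

Under these assumptions the bandwidth ceiling sits in the hundreds of tok/s for either quant, far above the observed 55 tok/s, and the ideal ~1.9x quantization speedup should be clearly visible. That neither shows up is consistent with kernel launch overhead, dequantization paths, and inter-GPU traffic dominating, not raw memory bandwidth.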
- Q8_0 keeping pace with Q4_K_XL on generation suggests the bottleneck is not purely arithmetic throughput.
- The prefill jump on two GPUs shows where parallelism still matters: prompt processing benefits far more than token-by-token decoding.
- MI50-era AMD cards are exactly where inference stacks tend to be least polished, so quantization gains can get swallowed by software inefficiency.
- For local model runners, this is a useful reminder to benchmark `prefill` and `decode` separately before choosing a quant level.
- Qwen3.5-35B-A3B still looks attractive for multi-GPU local deployment, especially if you can afford a higher-quality quant without losing real-world speed.
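One way to act on that benchmark-separately advice, sketched with llama.cpp's `llama-bench` tool; the model filenames and layer count here are placeholders, and column names may vary across builds:

```shell
# llama-bench reports prefill (pp) and decode (tg) as separate rows:
# -p sets prompt tokens, -n sets generated tokens, -ngl offloads layers.
# Run once per quant file and compare the pp512 and tg128 rows.
llama-bench -m qwen3.5-35b-a3b-Q8_0.gguf   -ngl 99 -p 512 -n 128
llama-bench -m qwen3.5-35b-a3b-Q4_K_XL.gguf -ngl 99 -p 512 -n 128
```

If the tg128 rows match across quants while pp512 diverges, you are seeing the same overhead-bound decode behavior the post describes, and the higher-quality quant costs you little.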
// TAGS
qwen3.5-35b-a3b · llm · benchmark · gpu · inference · open-source
DISCOVERED
5d ago
2026-04-06
PUBLISHED
5d ago
2026-04-06
RELEVANCE
8/10
AUTHOR
Far-Low-4705