Qwen3.6 MTP tops 249 t/s on RTX 5090M

// BENCHMARK RESULT

A Reddit benchmark post reports that the unsloth Qwen3.6-35B-A3B-MTP-GGUF UD-Q3_K_XL quant reaches 249.30 tokens/s on a laptop-class RTX 5090M, running llama.cpp master with MTP drafting and spec-draft-n-max 3. The author compares it with the dense 27B variant on the same hardware, which reaches 74.28 tokens/s, and includes a context-length sweep plus VRAM estimates for contexts up to 262K tokens.

// ANALYSIS

This is mainly a benchmark result rather than a model launch. The throughput gap over the dense 27B model is plausibly explained by MoE sparsity combined with speculative decoding: the reported 86.6% draft acceptance at n_max=3 means most drafted tokens survive verification, so each verification pass yields several tokens. The context sweep suggests throughput holds up through 262K tokens of context, with only a small drop at the full native length.
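As a rough sanity check on these figures, the sketch below computes the reported throughput ratio and a simple estimate of tokens emitted per verification pass. It rests on assumptions the post does not confirm: that 86.6% is a per-token acceptance probability, that acceptances are independent, and that drafting stops at the first rejected token. The function names are hypothetical. The two results measure different things (one compares MoE with dense, the other draft efficiency), so they are not expected to match.

```python
def throughput_ratio(fast_tps: float, slow_tps: float) -> float:
    """Ratio of two measured decode speeds in tokens/s."""
    return fast_tps / slow_tps


def expected_tokens_per_step(accept_rate: float, n_max: int) -> float:
    """Expected tokens per verification pass under a simplified model.

    Assumption (not from the post): each drafted token is accepted
    independently with probability accept_rate, and the first rejection
    ends the run. The pass always yields one token from the target
    model, so the expectation is 1 + a + a^2 + ... + a^n_max.
    """
    return sum(accept_rate ** k for k in range(n_max + 1))


# Figures reported in the benchmark post.
ratio = throughput_ratio(249.30, 74.28)
tokens = expected_tokens_per_step(0.866, 3)

print(f"MoE+MTP vs dense throughput ratio: {ratio:.2f}")   # 3.36
print(f"Estimated tokens per verification pass: {tokens:.2f}")  # 3.27
```

If 86.6% is instead an average over all drafted tokens rather than a per-token probability, the per-pass estimate would differ, so treat the second number as illustrative only.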

// TAGS

qwen, qwen36, mtp, moe, llamacpp, quantization, benchmark, local-first, inference, nvidia

DISCOVERED

2026-05-23

PUBLISHED

2026-05-23

RELEVANCE

8/10

AUTHOR

aurelienams
