Xiaomi MiMo-V2.5-Pro-UltraSpeed tops 1,000 TPS

// 1h ago | OPENSOURCE RELEASE

Xiaomi's MiMo team and TileRT have released MiMo-V2.5-Pro-UltraSpeed, achieving decoding speeds over 1,000 tokens per second on a trillion-parameter MoE model using a single 8-GPU node. This ultra-fast serving mode is enabled by DFlash speculative decoding and MXFP4 quantization, and the model checkpoints are open-sourced on Hugging Face.
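To see why 4-bit expert weights matter for fitting a trillion-parameter MoE on one node, here is a rough back-of-envelope sketch. It assumes MXFP4 stores 4-bit elements with one 8-bit shared scale per 32-element block (about 4.25 bits per weight) and that all non-expert weights are FP8. The function name and the `expert_fraction` parameter are illustrative; the actual expert/attention split of MiMo-V2.5-Pro-UltraSpeed is not stated in the summary above.

```python
def weight_footprint_gb(total_params: float,
                        expert_fraction: float,
                        expert_bits: float = 4.25,
                        other_bits: float = 8.0) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    expert_bits defaults to 4 + 8/32 = 4.25, i.e. 4-bit elements plus one
    8-bit scale shared by each 32-element block (assumed MXFP4 layout).
    other_bits defaults to 8 for FP8 non-expert weights.
    """
    if not 0.0 <= expert_fraction <= 1.0:
        raise ValueError("expert_fraction must be between 0 and 1")
    expert_params = total_params * expert_fraction
    other_params = total_params - expert_params
    total_bits = expert_params * expert_bits + other_params * other_bits
    return total_bits / 8 / 1e9


# Bounds for a 1-trillion-parameter model: everything in MXFP4 vs. everything in FP8.
all_mxfp4 = weight_footprint_gb(1e12, expert_fraction=1.0)
all_fp8 = weight_footprint_gb(1e12, expert_fraction=0.0)

# Hypothetical split: 95% of parameters in experts (an assumption, not a reported figure).
mixed = weight_footprint_gb(1e12, expert_fraction=0.95)
print(f"all MXFP4: {all_mxfp4:.2f} GB, all FP8: {all_fp8:.2f} GB, mixed: {mixed:.2f} GB")
```

The estimate covers weights only; KV cache, activations, and draft-model state would add to it, so it is a lower bound to compare against the node's aggregate GPU memory.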

// ANALYSIS

While Xiaomi's 1,000+ TPS on a trillion-parameter model is a stellar engineering feat, claims that this is the first useful speculative decoding method deployed on a quasi-frontier model are overstated.

* Speeds exceeding 1,000 TPS on a single 8-GPU node demonstrate that massive models can be served efficiently without specialized supercomputers.

* DFlash's block-level masked parallel prediction offers a significant throughput improvement over traditional autoregressive draft-then-verify loops.

* Selective quantization (MXFP4 for experts, FP8 for attention) strikes a critical balance between reducing memory bandwidth bottlenecks and maintaining reasoning capabilities.

* The release of Hugging Face checkpoints invites open-source validation of these performance claims in real-world scenarios.

* Veteran AI practitioners note that multi-token prediction and speculative decoding have been used in production for nearly two years, which makes the "first useful" claim historically inaccurate.
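For readers unfamiliar with the baseline that the DFlash bullet compares against, the following is a minimal sketch of a traditional greedy draft-then-verify loop. It is not DFlash and not Xiaomi's or TileRT's code. The toy "models" are plain functions over integer tokens, and every name here (`speculative_step`, `generate`, `toy_target`, `toy_draft`) is hypothetical.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def speculative_step(prefix: List[Token], draft_next: NextToken,
                     target_next: NextToken, k: int) -> List[Token]:
    """One greedy draft-then-verify round; returns the tokens to append."""
    # 1) The cheap draft model proposes k tokens autoregressively.
    ctx = list(prefix)
    proposal = []
    for _ in range(k):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2) The target model checks each proposed position. A real server scores
    #    all k positions in one batched forward pass; this toy loops instead.
    ctx = list(prefix)
    accepted = []
    for tok in proposal:
        expected = target_next(ctx)
        if expected != tok:
            accepted.append(expected)  # first mismatch: keep the target's token
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3) Every draft token matched, so the target's next token comes for free.
    accepted.append(target_next(ctx))
    return accepted


def generate(prefix: List[Token], draft_next: NextToken, target_next: NextToken,
             k: int, max_new: int) -> Tuple[List[Token], int]:
    """Generate max_new tokens; also return how many verify rounds were used."""
    out = list(prefix)
    rounds = 0
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft_next, target_next, k))
        rounds += 1
    return out[:len(prefix) + max_new], rounds


def toy_target(ctx: List[Token]) -> Token:
    """Toy target model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    """Toy draft model: agrees with the target except after token 2."""
    return 9 if ctx[-1] == 2 else (ctx[-1] + 1) % 10


tokens, rounds = generate([0], toy_draft, toy_target, k=4, max_new=12)
```

With greedy verification the output is identical to decoding with the target model alone; the gain is that each verify round can emit several tokens. Per the bullet above, DFlash replaces the sequential drafting in step 1 with block-level masked parallel prediction.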

// TAGS
speculative-decoding, moe, llm-inference, quantization, open-source, xiaomi

DISCOVERED: 2026-06-08 (1h ago)

PUBLISHED: 2026-06-08 (1h ago)

RELEVANCE: 8/10

AUTHOR: jeremyphoward