Robin details LongCat-2.0 training on Ascend
Zhihu contributor Robin shared a technical analysis of Meituan's 1.6-trillion-parameter LongCat-2.0 MoE model, detailing how it was trained on a 50,000-card Huawei Ascend cluster. The write-up highlights how custom sparse attention and the Mixture-of-Experts architecture made both training and inference of the model feasible on domestic Chinese hardware.
Pre-training frontier-class models no longer requires NVIDIA-based infrastructure. Meituan's adaptation of LongCat-2.0 to Huawei Ascend chips shows that China's domestic compute ecosystem can support trillion-parameter scaling when the model architecture is designed with the hardware in mind.
* Mixture-of-Experts (MoE) routing means the 1.6-trillion-parameter model activates only 48 billion parameters (about 3%) per token, which keeps per-token training compute manageable.
* Custom sparse attention mechanisms (such as LongCat Sparse Attention), in which each query attends to a subset of keys rather than the whole sequence, enable a native 1-million-token context window without exceeding memory budgets.
* Scaling training across 50,000 Huawei Ascend AI ASICs demonstrates the growing maturity of China's domestic hardware compilers and interconnects.
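The sparse-activation point in the list above can be illustrated with a minimal top-k routing sketch. The source does not describe LongCat-2.0's router, its expert count, or its value of k, so everything here is an assumption: a generic gate that picks the k highest-scoring experts and softmax-normalizes their scores, with hypothetical helper names `top_k_route`, `moe_forward`, and `active_fraction`. Only the selected experts run for a given token, which is why the active parameter count can sit far below the total. In real MoE models, attention and embedding weights are always active, so the active share is not simply k divided by the number of experts.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[List[float]], List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Hypothetical gate: return (expert_index, weight) for the k best-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x: List[float], router_logits: Sequence[float],
                experts: Sequence[Expert], k: int) -> List[float]:
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


def active_fraction(active_params: int, total_params: int) -> float:
    """Share of all weights that participate in one token's forward pass."""
    return active_params / total_params


if __name__ == "__main__":
    # Four hypothetical experts that simply scale their input.
    experts = [lambda v, c=c: [c * vi for vi in v] for c in (1.0, 2.0, 3.0, 4.0)]
    logits = [0.1, 2.0, -1.0, 1.5]  # hypothetical router scores for one token
    print(top_k_route(logits, k=2))
    print(moe_forward([1.0, 1.0], logits, experts, k=2))
    # Figures from the write-up: 48B active out of 1.6T total.
    print(active_fraction(48_000_000_000, 1_600_000_000_000))
```

With the figures reported for LongCat-2.0, `active_fraction(48_000_000_000, 1_600_000_000_000)` evaluates to about 0.03, matching the roughly 3% of weights that participate per token.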
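The long-context point in the list above can be made concrete by counting query-key pairs. The source does not say how LongCat Sparse Attention chooses which keys each query sees, so this sketch swaps in a plain causal sliding-window pattern purely as an assumption; the function names and the 4,096-token window are hypothetical. It counts scored pairs, a proxy for attention compute (and for score memory in implementations that materialize scores), not a full memory model of the real system.

```python
def dense_causal_pairs(n: int) -> int:
    """Pairs scored by full causal attention: query i sees keys 0..i."""
    return n * (n + 1) // 2


def sliding_window_pairs(n: int, window: int) -> int:
    """Pairs scored when each query sees at most `window` most recent keys (itself included)."""
    if window >= n:
        return dense_causal_pairs(n)
    ramp = window * (window + 1) // 2  # queries 0..window-1 see 1..window keys
    return ramp + (n - window) * window  # every later query sees exactly `window` keys


def brute_force_pairs(n: int, window: int) -> int:
    """Reference count used to check the closed form above."""
    return sum(min(i + 1, window) for i in range(n))


if __name__ == "__main__":
    n, window = 1 << 20, 4096  # ~1M tokens; window size is hypothetical
    print(dense_causal_pairs(n), sliding_window_pairs(n, window))
    print(dense_causal_pairs(n) / sliding_window_pairs(n, window))
```

For a 2^20-token sequence and a 4,096-token window, the dense count is roughly 128 times the windowed one (about n / (2 * window)): dense work grows quadratically with context length, while the windowed pattern grows linearly.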
Discovered: 2026-07-01
Published: 2026-07-01
Author: ZhihuFrontier