Robin details LongCat-2.0 training on Ascend

// 2h ago · MODEL RELEASE

Zhihu contributor Robin shared a technical analysis of Meituan’s 1.6-trillion-parameter LongCat-2.0 MoE model, detailing its training on a 50,000-card Huawei Ascend cluster. The write-up highlights how custom sparse attention and Mixture-of-Experts enabled training and inference of the model on domestic Chinese hardware.

// ANALYSIS

Meituan's adaptation to Huawei Ascend chips suggests that pre-training frontier-class models is no longer exclusive to NVIDIA-based infrastructure: with targeted architectural design, domestic compute ecosystems can support trillion-parameter scaling.

* Mixture-of-Experts (MoE) routing lets the 1.6-trillion-parameter model activate only 48 billion parameters per token (about 3% of the total), which keeps per-token training compute manageable.

* Custom sparse attention mechanisms (such as LongCat Sparse Attention) are key to enabling a native 1-million-token context window without exceeding memory budgets.

* Scaling training across 50,000 Ascend accelerators points to the growing maturity of the domestic compiler stack and interconnects.

// TAGS

longcat-2.0, meituan, moe, domestic-compute, open-source, ascend-910, llm

DISCOVERED: 2h ago (2026-07-01)

PUBLISHED: 2h ago (2026-07-01)

RELEVANCE: 8/10

AUTHOR: ZhihuFrontier
