OPEN_SOURCE
REDDIT // 1h ago · RESEARCH PAPER
MoE models top dense models with 7x compute leverage
Ant Group researchers introduce Efficiency Leverage (EL), a new metric showing that MoE models like Ling-mini-beta (0.85B active parameters) match a 6.1B-parameter dense model with roughly 7x less compute. The study establishes unified scaling laws showing that MoE's efficiency advantage increases as training compute scales.
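A quick back-of-the-envelope check of the 7x figure, assuming the common C ≈ 6·N·D approximation for training FLOPs and the same 1T-token budget for both models (both assumptions are ours, not stated in the card):

```python
# Sketch: efficiency leverage as the ratio of training compute a dense model
# needs versus an MoE model to reach the same loss.
# Assumes the standard C ~= 6 * N * D FLOPs rule of thumb (not from the paper)
# and an identical 1T-token budget for both models.

TOKENS = 1e12  # 1T training tokens


def train_flops(params: float, tokens: float = TOKENS) -> float:
    """Approximate training FLOPs: ~6 FLOPs per parameter per token."""
    return 6 * params * tokens


dense_flops = train_flops(6.1e9)   # 6.1B dense parameters
moe_flops = train_flops(0.85e9)    # 0.85B *active* MoE parameters

leverage = dense_flops / moe_flops
print(f"Efficiency leverage ≈ {leverage:.1f}x")  # ≈ 7.2x, consistent with the headline
```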
// ANALYSIS
MoE isn't just about parameter count; it's a fundamental compute leverage play that gets stronger as models grow.
- Efficiency Leverage (EL) quantified as a predictable power law driven by expert activation and compute budget (see the toy sketch after this list).
- Empirical testing on 1T tokens shows 0.85B active MoE parameters matching 6.1B dense parameters.
- Optimal expert granularity identified at 8-12 experts, providing a blueprint for future model architectures.
- Data hunger remains the "tax" for MoE, requiring more tokens than dense counterparts for optimal compute efficiency.
- Unified scaling law suggests we haven't hit the ceiling on MoE efficiency gains yet.
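The power-law claim in the first bullet can be sketched as a toy function. The functional form, constant, and exponents below are placeholders for illustration only, not the paper's fitted coefficients; the sketch only shows the claimed directions, namely that EL rises as the compute budget grows and as a smaller fraction of parameters is activated per token.

```python
# Toy power-law sketch of Efficiency Leverage (EL). The constant and exponents
# are placeholders, NOT the paper's fitted values; they only illustrate the
# claimed trends: EL falls as the activation ratio rises, and grows with compute.

def efficiency_leverage(activation_ratio: float, compute_flops: float,
                        a: float = 1.0, alpha: float = 0.5, gamma: float = 0.05) -> float:
    """Hypothetical form: EL = a * activation_ratio**(-alpha) * compute**gamma."""
    return a * activation_ratio ** (-alpha) * compute_flops ** gamma


# Compare EL across compute budgets at a fixed (hypothetical) 1/32 activation ratio.
base = efficiency_leverage(activation_ratio=1 / 32, compute_flops=1e21)
for c in (1e21, 1e22, 1e23):
    rel = efficiency_leverage(1 / 32, c) / base
    print(f"compute={c:.0e} FLOPs -> EL is {rel:.2f}x its value at 1e21 FLOPs")
# The ratio trends upward with compute, matching the claim that MoE's advantage
# keeps growing as training budgets scale.
```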
// TAGS
llm · moe · research · scaling-laws · ant-group · ling-mini-beta
DISCOVERED
1h ago
2026-04-28
PUBLISHED
2h ago
2026-04-28
RELEVANCE
10/10
AUTHOR
Different_Fix_2217