OPEN_SOURCE
REDDIT · 4h ago · PRODUCT UPDATE
APEX widens MoE quants, adds I-Nano
APEX, the MoE-aware mixed-precision quantization project, has expanded from one Qwen 3.5 showcase to 30+ models across Qwen, MiniMax, Mistral, Nemotron, Gemma, and community merges. The new I-Nano tier pushes compression further for sparse MoEs by cutting expert precision aggressively while keeping shared layers higher precision.
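A minimal sketch of what routing-aware precision assignment could look like. This is not APEX's actual code: the module names, the `.experts.` naming convention, the bit budgets, and the `assign_precision` helper are all illustrative assumptions; the only idea taken from the source is "routed experts get aggressive low-bit precision, shared layers stay higher".

```python
# Hypothetical sketch of MoE-aware mixed-precision assignment.
# Assumes HuggingFace-style tensor names where routed experts live under ".experts.".
from dataclasses import dataclass

@dataclass
class QuantPlan:
    module: str
    bits: float  # target weight precision for this tensor group


def assign_precision(module_names, expert_bits=2.0, shared_bits=8.0):
    """Give routed-expert weights an aggressive low-bit budget while keeping
    shared layers (attention, embeddings, router, shared experts) higher."""
    plan = []
    for name in module_names:
        is_routed_expert = ".experts." in name and "shared" not in name
        bits = expert_bits if is_routed_expert else shared_bits
        plan.append(QuantPlan(module=name, bits=bits))
    return plan


# Example with a few hypothetical MoE tensor names
names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.experts.17.down_proj",
    "model.layers.0.mlp.shared_expert.down_proj",
]
for p in assign_precision(names):
    print(p.module, "->", p.bits, "bits")
```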
// ANALYSIS
APEX is starting to look like a real quantization strategy for MoE models, not just a one-off experiment on Qwen 3.5. The interesting part is less the model count than the repeated claim that routing-aware precision preserves long-context and code quality better than uniform quants.
- The expansion across major MoE families suggests the approach is generalizing beyond one architecture
- I-Nano is only plausible on MoEs because sparse expert activation can absorb extreme compression better than dense models
- The reported memory savings are real but uneven; denser shared experts reduce the win, so architecture still matters
- The strongest value proposition is practical deployment: fitting 30-70B-class MoEs onto a single consumer GPU without falling back to blunt uniform quantization (see the rough arithmetic after this list)
- The evidence is promising but still rests mostly on self-reported, repo-owned benchmarks, so independent replication will matter
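To make the single-GPU claim concrete, a back-of-the-envelope weight-only footprint under the kind of expert/shared split described above. The 50B parameter count, the 90% expert share, and the 2-bit/8-bit budgets are assumptions for illustration, not APEX benchmark figures, and the estimate ignores KV cache and activations.

```python
# Illustrative weight-only VRAM estimate for a sparse MoE under mixed precision.
def weight_footprint_gb(total_params_b, expert_frac, expert_bits, shared_bits):
    """Approximate weight memory in GB for a given expert/shared bit split."""
    params = total_params_b * 1e9
    avg_bits = expert_frac * expert_bits + (1 - expert_frac) * shared_bits
    return params * avg_bits / 8 / 1e9


# Hypothetical 50B-parameter MoE where ~90% of weights sit in routed experts
mixed = weight_footprint_gb(50, expert_frac=0.9, expert_bits=2.0, shared_bits=8.0)
uniform4 = weight_footprint_gb(50, expert_frac=0.9, expert_bits=4.0, shared_bits=4.0)
print(f"mixed-precision: ~{mixed:.1f} GB, uniform 4-bit: ~{uniform4:.1f} GB")
# -> roughly 16 GB vs 25 GB under these assumed numbers, which is the gap that
#    decides whether a model fits a 24 GB consumer GPU
```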
// TAGS
quantization · moe · long-context · llm · open-source · inference · apex-quant
DISCOVERED
4h ago
2026-05-04
PUBLISHED
7h ago
2026-05-04
RELEVANCE
9/10
AUTHOR
mudler_it