llama.cpp lands MiMo-V2.5 text support
AesSedai’s PR brings Xiaomi’s MiMo-V2.5 into llama.cpp, starting with text inference support for the new 310B sparse-MoE model. It’s an early compatibility step for a model that promises 1M context and multimodal ambitions, even though audio, video, and full modality parity are still out of scope.
llama.cpp keeps turning “supported in llama.cpp” into the practical launch pad for new open-weight models. MiMo-V2.5 is a useful stress test for the runtime, but the current patch is mostly about making the text path usable before the rest of the stack catches up.
- MiMo-V2.5 is enormous on paper, so the real value here is quantized/local experimentation, not casual laptop inference
- Text-only support matters because it lets downstream GGUF builds and tooling move before full multimodal support lands
- The model’s 1M context and sparse-MoE design make it interesting for long-horizon workflows, but also harder to serve cleanly
- This is the kind of compatibility work that keeps llama.cpp relevant as the default runtime for new open-weight architectures
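For anyone wanting to try the text path once the PR merges, the usual llama.cpp pipeline applies: convert the Hugging Face checkpoint to GGUF, quantize it down to something a local machine can hold, and run it with the CLI. A minimal sketch, assuming a built llama.cpp checkout and a hypothetical local model directory (the paths and the quant choice are illustrative, not from the PR):

```shell
# Convert the HF checkpoint to a full-precision GGUF
# (convert_hf_to_gguf.py ships with llama.cpp)
python convert_hf_to_gguf.py ./MiMo-V2.5 --outfile mimo-v2.5-f16.gguf

# Quantize; a 310B sparse-MoE model is only practical locally
# at aggressive quants like Q4_K_M or below
./llama-quantize mimo-v2.5-f16.gguf mimo-v2.5-q4_k_m.gguf Q4_K_M

# Run text inference; -c sets the context window, well short of
# the advertised 1M for a first smoke test
./llama-cli -m mimo-v2.5-q4_k_m.gguf -c 8192 -p "Hello, MiMo."
```

Even quantized, a model this size will need substantial RAM or multi-GPU offload, which is why the bullets above frame this as experimentation rather than casual laptop inference.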
DISCOVERED: 2026-05-07 · PUBLISHED: 2026-05-07 · AUTHOR: jacek2023