MiniMax launches ultra-fast M3 model
MiniMax has announced MiniMax M3, a model built on a new architecture with a 1-million-token context window, native video input support, and claimed decoding speeds up to 15.6x faster. It is priced at $0.30 per million input tokens and $1.20 per million output tokens, positioning it as a low-cost, efficient multimodal option.
This is a shot across the bow for frontier LLM providers, and it suggests that the race for long-context models is shifting from raw capability toward optimized inference speed at very low prices. If the performance and speed claims hold up under real-world workloads, long-context agentic workflows and real-time video analysis become far more practical and affordable.
- **Strong Price-to-Performance Ratio**: At $0.30/1M input tokens and $1.20/1M output tokens, MiniMax M3 is aggressively priced and undercuts many existing long-context offerings.
- **Architectural Breakthrough**: The claimed 15.6x faster decoding at a full 1M-token context suggests a highly efficient sparse-attention implementation that addresses key latency bottlenecks.
- **Native Multimodality**: Native video input alongside large text contexts opens up new opportunities for real-world video processing, summarization, and interactive agents.
- **Pressure on Competitors**: A speed and cost disruption of this size would push other model providers to prioritize inference optimization and price cuts.
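To put the quoted prices in concrete terms, here is a minimal sketch of the per-request arithmetic. It assumes flat list pricing at the two rates above, with no caching discounts or tiered rates, and the token counts are hypothetical; `request_cost` is an illustrative helper, not part of any MiniMax SDK.

```python
# Prices quoted in the announcement, in USD per one million tokens.
INPUT_PRICE_PER_M = 0.30
OUTPUT_PRICE_PER_M = 1.20


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the quoted list prices.

    Assumption: flat per-token pricing with no caching or volume discounts.
    """
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical workload: fill the full 1M-token window and get a 4,000-token answer.
full_context = request_cost(1_000_000, 4_000)
print(f"${full_context:.4f}")  # prints $0.3048
```

Under these assumptions, a request that fills the entire context window costs roughly 30 cents, nearly all of it input. An agent that re-sends a large context on every step would multiply that figure by the number of steps.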
Discovered: 2026-06-01
Published: 2026-06-01
Author: bridgemindai