MiniMax open-sources MSA sparse attention kernel

// 45d ago | OPENSOURCE RELEASE

MiniMax has open-sourced MiniMax Sparse Attention (MSA), a blockwise sparse attention kernel designed to handle million-token context windows efficiently. By combining a two-branch architecture with a co-designed GPU execution path, MSA reduces per-token compute by 28.4×, achieving a 14.2× prefill speedup and 7.6× decoding speedup on H800 GPUs.

// ANALYSIS

MiniMax's MSA suggests that pairing algorithmic sparsity with hardware-level co-design is what makes million-token context models deployable without prohibitive hardware costs. Its decoupled indexing runs a low-dimensional projection and Top-k selection in a lightweight branch, which avoids the quadratic cost of dense attention. On the hardware side, exp-free Top-k selection and KV-outer sparse attention are designed to keep Tensor Core utilization high. Finally, MSA is integrated into the MiniMax-M3 multimodal model, which suggests the kernel has been exercised in a full-scale model and is ready for production use.
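To make the two-branch idea concrete, here is a minimal pure-Python sketch of blockwise sparse attention for a single query. It is not MSA's actual code or API: the function names (`select_blocks`, `sparse_attention`), the max-pooling rule for block scores, and the random projection are all assumptions chosen for illustration, and nothing here models the GPU execution path or the KV-outer loop order. It does illustrate one plausible reading of "exp-free" selection: because softmax is monotonic, ranking raw logits yields the same Top-k as ranking softmax probabilities, so the indexing branch never needs to call `exp`.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project(vec, proj):
    """Map a d-dim vector to r dims using an (r x d) projection matrix."""
    return [dot(row, vec) for row in proj]


def select_blocks(q, keys, proj, block_size, top_k):
    """Indexing branch (hypothetical): score key blocks in low-dim space, keep top_k.

    Scores are raw dot products. No exp/softmax is applied, since softmax is
    monotonic and would not change the ranking.
    """
    q_low = project(q, proj)
    scores = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        # Assumed pooling rule: a block's score is its best low-dim logit.
        # A real kernel would cache projected keys instead of recomputing them.
        scores.append(max(dot(q_low, project(k, proj)) for k in block))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])


def sparse_attention(q, keys, values, blocks, block_size):
    """Main branch: exact softmax attention restricted to the selected blocks."""
    idx = [i for b in blocks
           for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(q))
    logits = [dot(q, keys[i]) * scale for i in idx]
    m = max(logits)  # subtract the max logit for numerical stability
    weights = [math.exp(x - m) for x in logits]
    total = sum(weights)
    d_v = len(values[0])
    return [sum(w * values[i][j] for w, i in zip(weights, idx)) / total
            for j in range(d_v)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, d, r, block_size, top_k = 64, 16, 4, 8, 2  # illustrative sizes only
    keys = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    values = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    q = [rng.gauss(0, 1) for _ in range(d)]
    proj = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(r)]
    blocks = select_blocks(q, keys, proj, block_size, top_k)
    out = sparse_attention(q, keys, values, blocks, block_size)
    print("selected blocks:", blocks, "output dim:", len(out))
```

In this sketch the full-width attention step touches only `top_k * block_size` keys per query instead of all `n`, while the indexing branch works in `r` dimensions rather than `d`. The speedup figures quoted above come from MiniMax's GPU kernel, not from anything this toy version can reproduce.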

// TAGS

minimax-sparse-attention, sparse-attention, long-context, cuda, llm-inference, minimax, llm

DISCOVERED

2026-06-14 (45d ago)

PUBLISHED

2026-06-14 (45d ago)

RELEVANCE

8/10

AUTHOR

Github Awesome
