OPEN_SOURCE
REDDIT · 25d ago · MODEL RELEASE
Mamba-3 debuts as inference-first SSM
Together AI and collaborators released Mamba-3, a new state space model that flips the Mamba family from training-first design toward inference efficiency. The release pairs a paper, benchmark gains, and open-sourced kernels aimed at making linear models more practical at decode time.
// ANALYSIS
This is a credible push to make state space models matter for deployment, not just training curves. Mamba-3 reads like the first serious attempt to optimize a linear architecture around real-world inference bottlenecks instead of treating them as an afterthought.
- The core changes are meaningful: more expressive recurrence, complex-valued state tracking, and a MIMO variant that improves quality without adding decode latency.
- Together’s own charts show Mamba-3 SISO matching or beating Mamba-2 on prefill+decode latency at 1.5B scale, including against a Transformer baseline.
- The open-sourced kernels are a big deal for adoption; architecture papers are nice, but usable Triton/TileLang/CuTe code is what lets the community test the claims.
- The paper still concedes the classic SSM tradeoff: fixed-state models remain weaker than Transformers on some retrieval-heavy tasks, which is why hybrid stacks still look likely.
- Net: Mamba-3 feels less like “Mamba, but nicer” and more like a category thesis for inference-heavy AI systems.
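To make the decode-time argument concrete, here is a minimal sketch of why fixed-state recurrences are cheap at inference: each new token touches only a constant-size state, unlike attention's growing KV cache. This is an illustrative diagonal linear-SSM step with a complex-valued state in the spirit of the ideas above, not Mamba-3's actual parameterization; all names and shapes are assumptions.

```python
import numpy as np

def decode_step(h, x, a, B, C):
    """One decode step of a toy diagonal linear SSM (illustrative, not Mamba-3's).

    h : (N,) complex state carried between tokens
    x : scalar input for this token
    a : (N,) complex per-mode decay/rotation (|a| < 1 for stability)
    B : (N,) input projection
    C : (N,) output projection
    """
    h = a * h + B * x          # constant-size state update: O(N) per token
    y = np.real(C @ h)         # readout collapses the complex state to a scalar
    return h, y

# Usage: decode a short sequence one token at a time; the state never grows,
# which is the core inference advantage over a KV cache.
N = 4
rng = np.random.default_rng(0)
a = 0.9 * np.exp(1j * rng.uniform(0, np.pi, N))  # decaying, rotating modes
B = rng.standard_normal(N)
C = rng.standard_normal(N)

h = np.zeros(N, dtype=complex)
for x in [1.0, 0.5, -0.3]:
    h, y = decode_step(h, x, a, B, C)
```

The complex `a` lets each mode oscillate as well as decay, which is one way a linear recurrence can track richer state than a purely real diagonal one.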
// TAGS
mamba-3 · llm · inference · research · benchmark
DISCOVERED
2026-03-18
PUBLISHED
2026-03-18
RELEVANCE
9/10
AUTHOR
incarnadine72