OPEN_SOURCE
PH · PRODUCT_HUNT // 31d ago // MODEL RELEASE
Olmo Hybrid tests hybrid scaling thesis
Ai2 has released Olmo Hybrid, a fully open 7B model that interleaves transformer attention with Gated DeltaNet linear-RNN layers in a 3:1 pattern (three DeltaNet layers for every attention layer). The pitch is simple but important: match Olmo 3’s capability with far fewer training tokens, then stretch further on long-context workloads.
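To make the 3:1 layout concrete, below is a minimal PyTorch sketch of such a hybrid stack. This is an illustration under stated assumptions, not Ai2's implementation: the block classes, gate parameterizations, layer placement within each group of four, and the per-step Python loop (real linear-RNN layers use fused, chunked kernels) are all hypothetical.

import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    # Standard causal softmax attention, standing in for the transformer layers.
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        T = x.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h, _ = self.attn(x, x, x, attn_mask=causal, need_weights=False)
        return self.norm(x + h)


class GatedDeltaNetBlock(nn.Module):
    # Simplified gated delta rule: a matrix state S is decayed by a learned
    # gate a_t, partially erased along k_t, then written with v_t k_t^T:
    #   S_t = a_t * S_{t-1} (I - b_t k_t k_t^T) + b_t v_t k_t^T
    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.a = nn.Linear(d_model, 1)  # decay gate
        self.b = nn.Linear(d_model, 1)  # write strength
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        B, T, D = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)
        k = nn.functional.normalize(k, dim=-1)
        a, b = torch.sigmoid(self.a(x)), torch.sigmoid(self.b(x))
        S = x.new_zeros(B, D, D)  # recurrent matrix state, fixed size
        out = []
        for t in range(T):  # O(T) scan; no T x T attention map
            kt = k[:, t].unsqueeze(-1)                   # (B, D, 1)
            vt = v[:, t].unsqueeze(-1)                   # (B, D, 1)
            at, bt = a[:, t, :, None], b[:, t, :, None]  # (B, 1, 1)
            S = at * S
            S = S - bt * (S @ kt) @ kt.transpose(1, 2)
            S = S + bt * vt @ kt.transpose(1, 2)
            out.append((S @ q[:, t].unsqueeze(-1)).squeeze(-1))
        return self.norm(x + torch.stack(out, dim=1))


def build_hybrid(d_model: int = 512, n_heads: int = 8, n_layers: int = 8):
    # 3:1 interleave: three linear-RNN blocks, then one attention block.
    # Whether attention closes or opens each group of four is a guess here.
    blocks = [
        AttentionBlock(d_model, n_heads) if (i + 1) % 4 == 0
        else GatedDeltaNetBlock(d_model)
        for i in range(n_layers)
    ]
    return nn.Sequential(*blocks)


model = build_hybrid()
x = torch.randn(2, 16, 512)
print(model(x).shape)  # torch.Size([2, 16, 512])

The usual rationale for this pattern: the linear-RNN blocks carry a fixed-size state, so their cost grows linearly with sequence length, while the periodic attention layers preserve exact token-level retrieval over the full context.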
// ANALYSIS
Olmo Hybrid matters less as a one-off 7B drop and more as evidence that hybrid architectures are becoming a credible path beyond pure transformers. Ai2 is making a strong open-model case that a better architectural prior can buy real data efficiency, not just benchmark theater.
- Ai2 says Olmo Hybrid reaches Olmo 3-level MMLU accuracy with 49% fewer training tokens, a serious claim if it holds up beyond Ai2's internal comparisons
- The model swaps 75% of attention mixing for Gated DeltaNet (the 3:1 layer ratio above) while keeping training throughput comparable to Olmo 3, suggesting the gains are architectural rather than a speed-for-quality trade
- Long-context results look especially strong: at 64k context with DRoPE, Ai2 reports 85.0 on RULER versus 70.9 for Olmo 3 7B with YaRN (see the positional-scaling sketch after this list)
- The release is fully open and ships with a technical report, which makes it more useful to researchers and open-model builders than yet another black-box architecture tease
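On the DRoPE-versus-YaRN bullet above: both are positional-extension schemes, and Ai2's technical report is the place to look for DRoPE's specifics. As a generic reference point only, here is the basic RoPE position-interpolation trick that YaRN refines; this is not DRoPE, and the 4k-to-64k scale factor and function names are chosen purely for illustration.

import torch


def rope_frequencies(head_dim: int, base: float = 10000.0) -> torch.Tensor:
    # Inverse frequencies for rotary position embeddings, one per channel pair.
    return 1.0 / base ** (torch.arange(0, head_dim, 2).float() / head_dim)


def rope_angles(seq_len: int, head_dim: int, scale: float = 1.0) -> torch.Tensor:
    # Rotation angle per (position, frequency). With scale > 1, positions are
    # compressed by 1/scale so seq_len tokens fit the range seen in training.
    pos = torch.arange(seq_len).float() / scale
    return torch.outer(pos, rope_frequencies(head_dim))


def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    # Rotate consecutive channel pairs of x (seq, head_dim) by the angles.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    rot = torch.empty_like(x)
    rot[..., 0::2] = x1 * cos - x2 * sin
    rot[..., 1::2] = x1 * sin + x2 * cos
    return rot


# Stretching a model trained at 4k context to 64k => scale factor 16.
q = torch.randn(65536, 64)
q_rot = apply_rope(q, rope_angles(65536, 64, scale=16.0))
print(q_rot.shape)  # torch.Size([65536, 64])

With scale=16.0, 64k positions are compressed into the rotary range a 4k-trained model saw during training; YaRN's refinement is to interpolate per frequency band rather than uniformly and to add an attention-temperature correction.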
// TAGS
olmo-hybrid · llm · open-weights · research · benchmark
DISCOVERED
2026-03-11 (31d ago)
PUBLISHED
2026-03-07 (36d ago)
RELEVANCE
9/10
AUTHOR
[REDACTED]