Kwai-Keye drops 30B multimodal MoE with DSA attention

// 2h ago · MODEL RELEASE

Kuaishou's Keye team released Keye-VL-2.0-30B-A3B, a 30B-parameter multimodal mixture-of-experts (MoE) model that integrates DeepSeek Sparse Attention (DSA). According to the release, the architecture bounds KV cache growth, enabling 256K-token context windows for multi-hour video analysis on consumer hardware.
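To see why cache growth matters at this context length, here is a rough back-of-the-envelope sketch of how a conventional dense-attention KV cache scales with token count. The layer count, KV-head count, and head dimension are hypothetical placeholders picked for round numbers, not Keye-VL-2.0's actual configuration, which the post does not give.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for every token under dense attention."""
    # The leading 2 accounts for storing both a key and a value per head per layer.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return num_tokens * per_token


# Hypothetical configuration, chosen only to make the arithmetic concrete.
HYPOTHETICAL = dict(num_layers=48, num_kv_heads=4, head_dim=128)

for tokens in (32_768, 131_072, 262_144):
    gib = kv_cache_bytes(tokens, **HYPOTHETICAL) / 2**30
    print(f"{tokens:>7} tokens -> {gib:.1f} GiB")
```

Under these assumed numbers the cache grows from 3 GiB at 32K tokens to 24 GiB at 256K tokens in 16-bit precision, which is the kind of linear growth a long-video model has to contain.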

// ANALYSIS

Bringing DeepSeek Sparse Attention into a multimodal architecture targets the memory and compute blowup that makes long-video reasoning prohibitively expensive under dense attention.
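The general idea behind top-k sparse attention can be sketched in a few lines: each query attends only to the k positions that a separate scorer ranks highest, instead of to the whole context. This is an illustrative sketch, not Keye's or DeepSeek's implementation. The function name and the `index_scores` argument are hypothetical; `index_scores` stands in for whatever learned scorer picks the tokens.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def topk_sparse_attention(query, keys, values, index_scores, k):
    """Attend to only the k keys with the highest index scores (hypothetical helper)."""
    k = min(k, len(keys))
    # Pick the k best-scoring positions, then restore their original order.
    ranked = sorted(range(len(keys)), key=lambda i: index_scores[i], reverse=True)
    selected = sorted(ranked[:k])
    scale = 1.0 / math.sqrt(len(query))
    # Softmax is computed over the selected positions only.
    weights = softmax([dot(query, keys[i]) * scale for i in selected])
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, selected)) for d in range(dim)]
    return out, selected


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [5.0, 5.0]]
out, selected = topk_sparse_attention(query, keys, values, [0.9, 0.1, 0.8, 0.2], k=2)
print(selected, out)
```

With k fixed, the per-query attention work depends on k rather than on the full context length, which is why this family of methods is attractive for very long inputs.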

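  • DSA has each query attend to a selected subset of earlier tokens instead of the full context, curbing the attention cost and cache pressure that normally plague long-context vision models
  • The MoE architecture activates only about 3B of its 30B parameters per token, which cuts per-token compute for local inference, although all 30B weights still have to be held in memory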
  • Early benchmarks suggest it matches Gemini 1.5 Flash on temporal grounding and outperforms larger open-weight models like Qwen3-VL-235B
  • The model introduces the first agent capabilities in the Keye series, supporting visual self-correction and tool use
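
The sparse-activation point in the list above can be illustrated with a minimal top-k router: a gate scores every expert, but only the highest-scoring few actually run for a given token. This is a generic sketch of MoE routing, not Keye's code. The expert count, the `top_k` value, and the function names are hypothetical, since the post does not state the model's routing configuration.

```python
import math


def route_token(router_logits, top_k):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Normalize gate weights over the selected experts only.
    m = max(router_logits[i] for i in ranked)
    exps = [math.exp(router_logits[i] - m) for i in ranked]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(ranked, exps)]


def moe_forward(x, experts, router_logits, top_k):
    """Weighted sum of the outputs of the selected experts; the rest never run."""
    out = [0.0] * len(x)
    for idx, weight in route_token(router_logits, top_k):
        y = experts[idx](x)
        out = [o + weight * v for o, v in zip(out, y)]
    return out


# Four toy experts that scale their input by 1, 2, 3, and 4.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
print(moe_forward([1.0, 2.0], experts, router_logits=[0.0, 2.0, 2.0, -1.0], top_k=2))
```

Only two of the four toy experts execute here, which mirrors how a model with 30B total parameters can spend roughly 3B parameters of compute on each token.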
// TAGS
keye-vl-2.0-30b-a3b, llm, multimodal, moe, long-context, open-weights, vision

DISCOVERED: 2h ago (2026-05-26)
PUBLISHED: 5h ago (2026-05-26)
RELEVANCE: 9/10
AUTHOR: External_Mood4719