OPEN_SOURCE ↗
// MODEL RELEASE · 4h ago
SubQ ships 12M-token context model
Subquadratic’s SubQ is a new LLM and API built on fully subquadratic sparse attention, with a research result demonstrating a 12-million-token context and a private beta starting today. The company says the architecture cuts attention compute dramatically and also powers SubQ Code, a CLI coding agent for repo-scale workflows.
// ANALYSIS
This is a serious architecture bet, not just a bigger context-window marketing stunt. If the benchmark claims hold up outside Subquadratic’s own testing, SubQ could make long-context agents less dependent on brittle retrieval pipelines.
- The 52x FlashAttention speedup claim matters more than the raw 12M-token number, because long context only becomes useful when latency and cost stay sane.
- SubQ looks strongest for codebases, research corpora, and long-running agent state, where preserving full context is more valuable than narrow chat.
- The benchmark story is still partly self-reported and internally framed, so independent replication will decide whether this is a real frontier leap or a well-tuned demo.
- SubQ Code is the product angle to watch: if one model can handle whole repositories in a single pass, it could simplify agent orchestration and reduce retrieval plumbing.
- The $29M seed gives the company runway, but the hard part is proving that sparse attention can stay reliable as customers push it into production workloads.
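To see why subquadratic attention is the crux of the cost claim, here is a back-of-envelope sketch comparing dense attention, which scales as O(n²) in sequence length, against a sliding-window sparse pattern that scales as O(n·w). The window size and the pattern itself are illustrative assumptions, not SubQ's disclosed design:

```python
# Illustrative compute comparison: dense vs. local-window sparse attention.
# These are pairwise-score counts, not a model of SubQ's actual architecture.

def dense_attention_ops(n: int) -> int:
    """Every token attends to every token: n^2 score computations."""
    return n * n

def sparse_attention_ops(n: int, window: int) -> int:
    """Each token attends only to a local window of `window` tokens."""
    return n * min(n, window)

n = 12_000_000   # the 12M-token context from the announcement
window = 4_096   # assumed window size, purely for illustration

ratio = dense_attention_ops(n) / sparse_attention_ops(n, window)
print(f"dense / sparse score-compute ratio at 12M tokens: {ratio:,.0f}x")
```

At this scale the dense quadratic term dominates by thousands of x, which is why long-context claims hinge on the attention pattern rather than on raw hardware.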
// TAGS
subq · llm · long-context · inference · agent · coding-agent · api
DISCOVERED
2026-05-05
PUBLISHED
2026-05-05
RELEVANCE
10/10
AUTHOR
heynavtoor