OPEN_SOURCE ↗
// MODEL RELEASE · 4h ago
SubQ ships 12M-token context model
Subquadratic’s SubQ is a new LLM and API built on fully subquadratic sparse attention, with a research result demonstrating a 12-million-token context and a private beta starting today. The company says the architecture cuts attention compute dramatically and also powers SubQ Code, a CLI coding agent for repo-scale workflows.
// ANALYSIS
This is a serious architecture bet, not just a bigger context-window marketing stunt. If the benchmark claims hold up outside Subquadratic’s own testing, SubQ could make long-context agents less dependent on brittle retrieval pipelines.
- The 52x FlashAttention speedup claim matters more than the raw 12M-token number, because long context only becomes useful when latency and cost stay sane.
- SubQ looks strongest for codebases, research corpora, and long-running agent state, where preserving full context is more valuable than narrow chat.
- The benchmark story is still partly self-reported and internally framed, so independent replication will decide whether this is a real frontier leap or a well-tuned demo.
- SubQ Code is the product angle to watch: if one model can handle whole repositories in a single pass, it could simplify agent orchestration and reduce retrieval plumbing.
- The $29M seed gives the company runway, but the hard part is proving that sparse attention can stay reliable as customers push it into production workloads.
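To see why subquadratic attention is the crux of the cost claim, here is a back-of-envelope sketch comparing dense attention, which scales as O(n²) in sequence length, against a sliding-window sparse pattern that scales as O(n·w). The window size and the pattern itself are illustrative assumptions, not SubQ's disclosed design:

```python
# Illustrative compute comparison: dense vs. local-window sparse attention.
# These are pairwise-score counts, not a model of SubQ's actual architecture.

def dense_attention_ops(n: int) -> int:
    """Every token attends to every token: n^2 score computations."""
    return n * n

def sparse_attention_ops(n: int, window: int) -> int:
    """Each token attends only to a local window of `window` tokens."""
    return n * min(n, window)

n = 12_000_000   # the 12M-token context from the announcement
window = 4_096   # assumed window size, purely for illustration

ratio = dense_attention_ops(n) / sparse_attention_ops(n, window)
print(f"dense / sparse score-compute ratio at 12M tokens: {ratio:,.0f}x")
```

At this scale the dense quadratic term dominates by thousands of x, which is why long-context claims hinge on the attention pattern rather than on raw hardware.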
// TAGS
subq · llm · long-context · inference · agent · coding-agent · api
DISCOVERED
2026-05-05
PUBLISHED
2026-05-05
RELEVANCE
10/10
AUTHOR
heynavtoor