OPEN_SOURCE
YT · YOUTUBE // 5d ago // OPEN-SOURCE RELEASE
cuLA ships linear-attention CUDA kernels for Hopper, Blackwell
cuLA is a low-level open-source repository of high-performance CUDA kernels for linear attention variants, implemented in CuTe DSL and CUTLASS C++. The project targets NVIDIA Hopper and Blackwell GPUs, and is designed to slot into flash-linear-attention with a minimal import change. The repository is explicitly early-stage, but it already positions itself as a specialized kernel layer for KDA, GLA-style, and related linear-attention workloads.
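For context on what these kernels compute: linear attention replaces the softmax attention matrix with a kernel feature map, so the key/value summary can be accumulated once and reused across queries, dropping cost from O(L²·d) to O(L·d²). A minimal NumPy sketch of the standard non-causal form (this is the general technique, not cuLA's API; the `elu(x) + 1` feature map is one common choice, assumed here for illustration):

```python
import numpy as np

def linear_attention(Q, K, V):
    """Non-causal linear attention: O_i = phi(q_i) (sum_j phi(k_j) v_j^T) / Z_i.

    phi(x) = elu(x) + 1 is a common positive feature map (an assumption here,
    not necessarily what cuLA implements). The (d, d) key/value summary is
    built once, so cost is O(L * d^2) instead of softmax's O(L^2 * d).
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    Qf, Kf = phi(Q), phi(K)                  # feature-mapped queries/keys, (L, d)
    KV = Kf.T @ V                            # (d, d) summary of all keys/values
    Z = Qf @ Kf.sum(axis=0)                  # (L,) per-query normalizer
    return (Qf @ KV) / Z[:, None]

L, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((L, d)) for _ in range(3))
O = linear_attention(Q, K, V)
print(O.shape)  # (8, 4)
```

The fused `KV` summary is exactly the kind of reduction that benefits from hand-tuned CUDA kernels on tensor-core hardware.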
// ANALYSIS
Hot take: this is infrastructure-first work, not a user-facing app, but it matters if you care about squeezing real performance out of long-context attention on recent NVIDIA hardware.
- The technical angle is strong: CuTe DSL plus CUTLASS suggests the repo is built for hardware-aware kernel tuning rather than portability theater.
- The positioning is clear: cuLA is meant to complement flash-linear-attention, which lowers adoption friction for teams already in that ecosystem.
- The scope is narrow but credible: support for Hopper and Blackwell, with explicit mention of linear-attention variants like KDA and gating/delta-style methods.
- The main risk is maturity: the repo itself says it is early-stage and that APIs and kernels may still change.
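Why gating/delta-style variants need specialized kernels: their causal form is an inherently sequential state recurrence, which fast implementations rewrite as chunked scans. A reference-level sketch of a GLA-style gated recurrence (illustrative only; variable names and the `diag(a_t)` gating form are assumptions, not cuLA's interface):

```python
import numpy as np

def gated_linear_attention(Q, K, V, A):
    """Causal gated linear attention in naive recurrent form (GLA-style sketch).

    State update: S_t = diag(a_t) @ S_{t-1} + k_t v_t^T
    Readout:      o_t = S_t^T q_t
    a_t in (0, 1) is a per-key-dimension forget gate; a_t = 1 everywhere
    recovers plain (ungated) causal linear attention.
    """
    L, dk = Q.shape
    dv = V.shape[1]
    S = np.zeros((dk, dv))           # running key/value state
    O = np.empty((L, dv))
    for t in range(L):
        S = A[t][:, None] * S + np.outer(K[t], V[t])  # gated state update
        O[t] = S.T @ Q[t]                             # read out with query
    return O

L, dk, dv = 8, 4, 4
rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((L, dk)) for _ in range(3))
ones = np.ones((L, dk))              # no gating -> plain causal linear attention
O = gated_linear_attention(Q, K, V, ones)
```

The Python loop above is the correctness baseline; the point of a project like cuLA is to replace it with chunk-parallel kernels that keep tensor cores busy.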
// TAGS
cuda · linear-attention · cutlass · cute-dsl · nvidia · hopper · blackwell · open-source · gpu-kernels · attention
DISCOVERED
2026-04-06 (5d ago)
PUBLISHED
2026-04-06 (5d ago)
RELEVANCE
9/10
AUTHOR
Github Awesome