cuLA ships linear-attention CUDA kernels for Hopper, Blackwell
YT · YOUTUBE // 5d ago · OPEN-SOURCE RELEASE


cuLA is a low-level open-source repository of high-performance CUDA kernels for linear attention variants, implemented in CuTe DSL and CUTLASS C++. The project targets NVIDIA Hopper and Blackwell GPUs, and is designed to slot into flash-linear-attention with a minimal import change. The repository is explicitly early-stage, but it already positions itself as a specialized kernel layer for KDA, GLA-style, and related linear-attention workloads.
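To make the workload concrete: linear-attention kernels like these accelerate a recurrence that replaces the quadratic attention matrix with a running key-value state. The sketch below is an illustrative NumPy reference of that recurrence under common simplifying assumptions (single head, no normalization); it is not cuLA's API, and the function name `linear_attention` is ours.

```python
import numpy as np

def linear_attention(q, k, v):
    """Causal linear attention via the recurrent state form.

    Instead of materializing the (T, T) score matrix, keep a running
    (d_k, d_v) state S_t = S_{t-1} + k_t v_t^T and read out o_t = S_t^T q_t.
    Per-step memory is O(d_k * d_v), independent of sequence length T.
    """
    T, d_k = q.shape
    d_v = v.shape[1]
    S = np.zeros((d_k, d_v))
    out = np.empty((T, d_v))
    for t in range(T):
        S += np.outer(k[t], v[t])   # accumulate key-value outer products
        out[t] = S.T @ q[t]         # read the state out against the query
    return out

# Sanity check: the recurrence matches the quadratic causal form
# (q k^T masked to the lower triangle, no softmax) applied to v.
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 8, 4))
assert np.allclose(linear_attention(q, k, v), np.tril(q @ k.T) @ v)
```

The CUDA kernels' job is to compute this same recurrence in chunked, tensor-core-friendly form on Hopper and Blackwell; the loop above is only the mathematical specification.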

// ANALYSIS

Hot take: this is infrastructure-first work, not a user-facing app, but it matters if you care about squeezing real performance out of long-context attention on recent NVIDIA hardware.

  • The technical angle is strong: CuTe DSL plus CUTLASS suggests the repo is built for hardware-aware kernel tuning rather than portability theater.
  • The positioning is clear: cuLA is meant to complement flash-linear-attention, which lowers adoption friction for teams already in that ecosystem.
  • The scope is narrow but credible: support for Hopper and Blackwell, with explicit mention of linear-attention variants like KDA and gating/delta-style methods.
  • The main risk is maturity: the repo itself says it is early-stage and that APIs and kernels may still change.
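On the GLA-style and gating variants mentioned above: these extend the plain linear-attention recurrence with a data-dependent decay applied to the state before each update. A minimal NumPy sketch, assuming per-key-channel gates in (0, 1) and no normalization (again an illustrative reference, not cuLA's kernel interface):

```python
import numpy as np

def gated_linear_attention(q, k, v, g):
    """GLA-style recurrence: decay the state, then accumulate.

    g[t] holds per-key-channel gates in (0, 1); with g fixed at 1
    this collapses to plain (ungated) linear attention.
    """
    T, d_k = q.shape
    d_v = v.shape[1]
    S = np.zeros((d_k, d_v))
    out = np.empty((T, d_v))
    for t in range(T):
        S = g[t][:, None] * S + np.outer(k[t], v[t])  # gated decay, then update
        out[t] = S.T @ q[t]
    return out

# With all-ones gates the gated form reduces to the ungated causal form.
rng = np.random.default_rng(1)
q, k, v = rng.normal(size=(3, 8, 4))
ones = np.ones((8, 4))
plain = np.tril(q @ k.T) @ v
assert np.allclose(gated_linear_attention(q, k, v, ones), plain)
```

The multiplicative decay is what makes these variants hardware-sensitive: a naive loop like this serializes over time, so efficient kernels recast it into chunked matrix multiplications, which is exactly the kind of work a CuTe/CUTLASS layer targets.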
// TAGS
cuda · linear-attention · cutlass · cute-dsl · nvidia · hopper · blackwell · open-source · gpu-kernels · attention

DISCOVERED

2026-04-06 (5d ago)

PUBLISHED

2026-04-06 (5d ago)

RELEVANCE

9/10

AUTHOR

Github Awesome