fla-volta unlocks Gated DeltaNet on V100

// 65d ago · OPENSOURCE RELEASE

InMecha's fla-volta backports native CUDA kernels for Flash Linear Attention's Gated DeltaNet path so it can run on NVIDIA Volta V100 GPUs, where the stock Triton kernels hang on sm_70. The repo is aimed at HuggingFace Transformers users and positions itself as a research-grade compatibility layer for Qwen3.5-class models, with the README showing a modest tok/s lift and a bigger hardware-compatibility win.
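The compatibility problem described above comes down to a dispatch decision keyed on the GPU's compute capability. The sketch below illustrates that decision only; `pick_gdn_backend` and the backend labels are hypothetical names invented for this example, not part of fla-volta or Flash Linear Attention, and it assumes that architectures other than sm_70 keep using the stock Triton path.

```python
# Hypothetical helper: illustrates the sm_70 dispatch idea, not fla-volta's API.
def pick_gdn_backend(capability, has_native_kernels):
    """Pick a Gated DeltaNet backend from a (major, minor) compute capability."""
    major, minor = capability
    is_volta = (major, minor) == (7, 0)  # sm_70, the V100 architecture
    if not is_volta:
        # Assumption: the stock Triton kernels are fine off sm_70.
        return "triton"
    if has_native_kernels:
        # Handwritten CUDA kernels are available for this GPU.
        return "native_cuda"
    # Slower but safe path when no native kernels are built.
    return "pytorch_fallback"


print(pick_gdn_backend((7, 0), True))   # V100 with native kernels built
print(pick_gdn_backend((7, 0), False))  # V100 without them
print(pick_gdn_backend((8, 0), True))   # a non-Volta GPU
```

On a live system the capability tuple could be read with `torch.cuda.get_device_capability()`, which returns `(major, minor)` for the current device.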

// ANALYSIS

This is a rare backport that feels more like infrastructure preservation than product polish.

  • Replaces two FLA components with handwritten CUDA kernels, including a fused RMSNorm + SiLU gate and a fused recurrent Gated DeltaNet kernel adapted from llama.cpp
  • README benchmarks show 16.8 tok/s on a V100 for Qwen3.5-2B versus 11.5 tok/s with the PyTorch fallback (roughly a 1.46x speedup), but the authors say HuggingFace generation overhead caps end-to-end gains
  • The real value is keeping older V100 fleets useful for modern linear-attention models instead of waiting for upstream Triton support to catch up
  • It is explicitly research-only, needs CUDA 12.x plus low-level GPU/CUDA skills, and the maintainers are not promising active support
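
For readers unfamiliar with the two components named in the first bullet, here is a plain-Python reference of the math such kernels typically fuse. It is a sketch under stated assumptions, not the repo's CUDA code: the recurrent step follows the gated delta rule as it is usually written (decay the state by `alpha`, then write back a `beta`-scaled prediction error), for a single head with no batching, and the norm assumes the gate is applied as `rmsnorm(x) * weight * silu(gate)`. All function and parameter names are illustrative.

```python
import math


def gated_delta_step(S, q, k, v, alpha, beta):
    """One recurrent step; S is a d_v x d_k state stored as a list of rows."""
    # 1) Gate: decay the whole state by alpha.
    S = [[alpha * s for s in row] for row in S]
    # 2) What the decayed state currently predicts for key k.
    pred = [sum(s * kj for s, kj in zip(row, k)) for row in S]
    # 3) Delta rule: write the beta-scaled prediction error back along k.
    S = [[s + beta * (vi - pi) * kj for s, kj in zip(row, k)]
         for row, vi, pi in zip(S, v, pred)]
    # 4) Read out with the query.
    out = [sum(s * qj for s, qj in zip(row, q)) for row in S]
    return S, out


def rmsnorm_silu_gate(x, gate, weight, eps=1e-6):
    """RMSNorm on x, scaled by weight, then multiplied by SiLU(gate)."""
    rms = math.sqrt(sum(xi * xi for xi in x) / len(x) + eps)
    silu = [g / (1.0 + math.exp(-g)) for g in gate]
    return [xi / rms * wi * si for xi, wi, si in zip(x, weight, silu)]


# Tiny usage example: store value [3, 4] under key [1, 0], then read it back.
state = [[0.0, 0.0], [0.0, 0.0]]
state, out = gated_delta_step(state, q=[1.0, 0.0], k=[1.0, 0.0],
                              v=[3.0, 4.0], alpha=1.0, beta=1.0)
print(out)
```

A fused kernel computes these steps in one pass over the data to avoid intermediate memory traffic; the Python version above only shows what is being computed, not how it is scheduled on the GPU.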
// TAGS
fla-volta, gpu, inference, open-source, llm, self-hosted

DISCOVERED

65d ago

2026-03-24

PUBLISHED

65d ago

2026-03-23

RELEVANCE

8/10

AUTHOR

Sliouges