OPEN_SOURCE
REDDIT · 19d ago · OPEN-SOURCE RELEASE
fla-volta unlocks Gated DeltaNet on V100
InMecha's fla-volta backports native CUDA kernels for Flash Linear Attention's Gated DeltaNet path so it can run on NVIDIA Volta V100 GPUs, where the stock Triton kernels hang on sm_70. The repo is aimed at HuggingFace Transformers users and positions itself as a research-grade compatibility layer for Qwen3.5-class models, with the README showing a modest tok/s lift and a bigger hardware-compatibility win.
// ANALYSIS
This is a rare back-port that feels more like infrastructure preservation than product polish.
- Replaces two FLA components with handwritten CUDA kernels: a fused RMSNorm + SiLU gate and a fused recurrent Gated DeltaNet kernel adapted from llama.cpp
- README benchmarks show 16.8 tok/s on a V100 for Qwen3.5-2B versus 11.5 tok/s with the PyTorch fallback, though the authors note that HuggingFace generation overhead caps end-to-end gains
- The real value is keeping older V100 fleets useful for modern linear-attention models instead of waiting for upstream Triton support to catch up
- It is explicitly research-only, requires CUDA 12.x plus low-level GPU/CUDA skills, and the maintainers are not promising active support
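To ground what those two replaced components compute, here is a minimal pure-Python reference sketch, not the repo's CUDA kernels: the fused RMSNorm + SiLU gate as commonly used in FLA-style output normalization, and one step of a gated delta-rule state update in the form used by the Gated DeltaNet literature (state S maps keys to values; α is a decay gate, β a write strength). The exact parameterization in fla-volta is an assumption.

```python
import math

def rmsnorm_silu_gate(x, g, w, eps=1e-6):
    # RMSNorm(x) scaled by weight w, then gated elementwise by SiLU(g).
    # SiLU(g) = g * sigmoid(g). A real fused kernel does this in one pass
    # over the hidden dimension instead of three separate ops.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    y = [v / rms * wi for v, wi in zip(x, w)]
    return [yi * (gi / (1.0 + math.exp(-gi))) for yi, gi in zip(y, g)]

def gated_delta_step(S, q, k, v, alpha, beta):
    # One recurrent token step (assumed form):
    #   S_t = alpha * S_{t-1} (I - beta * k k^T) + beta * v k^T
    #   o_t = S_t q
    # S is a d_v x d_k matrix (list of lists); q, k are d_k vectors, v is d_v.
    d_v, d_k = len(S), len(S[0])
    Sk = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    S_new = [[alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j]
              for j in range(d_k)] for i in range(d_v)]
    o = [sum(S_new[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]
    return S_new, o
```

The per-token sequential dependence of `gated_delta_step` is exactly why the recurrent path needs a dedicated kernel: each step reads the state the previous step wrote, so a naive PyTorch loop pays full launch overhead per token.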
// TAGS
fla-volta · gpu · inference · open-source · llm · self-hosted
DISCOVERED
2026-03-24 (19d ago)
PUBLISHED
2026-03-23 (19d ago)
RELEVANCE
8/10
AUTHOR
Sliouges