OPEN_SOURCE
REDDIT // 21d ago // INFRASTRUCTURE
ik_llama.cpp hits 26x speedup on Qwen 3.5
A specialized fork of llama.cpp introduces fused CUDA kernels for Qwen 3.5's hybrid Gated DeltaNet architecture, achieving a 26x speedup in prompt evaluation and 3.5x in generation.
// ANALYSIS
Mainline llama.cpp's struggle with hybrid SSM architectures like Qwen 3.5 highlights a growing optimization gap as linear-time models gain traction.
- Fused GDN kernels reduce graph splits from 34 to 2, offloading recurrent computation entirely from the CPU to the GPU.
- A 26x jump in prompt processing (from 43 to 1,122 tok/sec) makes the 27B model viable for agentic coding even with mandatory re-processing.
- Qwen 3.5's hybrid architecture is technically superior for long context but requires specific low-level kernel support that mainline has yet to integrate.
- Pre-built Windows binaries with CUDA 12.8 and AVX512 VNNI are available via the Thireus fork as a drop-in replacement for llama-server.
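The throughput figures above can be sanity-checked with quick arithmetic. The token rates are from the post; the 32k context size is an illustrative assumption to show why prompt-eval speed dominates agentic workloads, where each turn re-processes the full context.

```python
# Numbers reported in the post (prompt evaluation, tok/sec).
prompt_tok_s_before = 43      # mainline llama.cpp
prompt_tok_s_after = 1122     # ik_llama.cpp fused GDN kernels

speedup = prompt_tok_s_after / prompt_tok_s_before
print(f"prompt-eval speedup: {speedup:.1f}x")   # ~26.1x

# Assumed context size for a single agent turn (illustrative only).
# With mandatory re-processing, the whole context is re-evaluated each turn,
# so prompt throughput sets the wall-clock latency floor.
context_tokens = 32_000
before_s = context_tokens / prompt_tok_s_before
after_s = context_tokens / prompt_tok_s_after
print(f"32k-token re-process: {before_s:.0f}s -> {after_s:.1f}s")
```

At the old rate a 32k-token re-process takes over twelve minutes; at the new rate it is under half a minute, which is the difference between unusable and interactive for coding agents.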
// TAGS
ik-llama-cpp · qwen · inference · gpu · open-source
DISCOVERED
2026-03-22
PUBLISHED
2026-03-22
RELEVANCE
8/10
AUTHOR
New-Inspection7034