llama.cpp adds PDL support for Blackwell inference boost

// PRODUCT UPDATE · 45d ago

llama.cpp now supports NVIDIA's Programmatic Dependent Launch (PDL) on the Hopper and Blackwell architectures. The feature is enabled by an opt-in compile flag and overlaps the scheduling of consecutive CUDA kernels to reduce GPU idle time, raising token generation throughput by up to 10%.

// ANALYSIS

This hardware-level synchronization feature is a practical win for local inference users and for production deployments running on recent NVIDIA GPUs.

– PDL allows a secondary kernel to begin scheduling before the primary kernel it depends on has finished, which shrinks the idle gaps between kernels and raises GPU utilization during memory-bound LLM token generation.
– The feature is currently opt-in and requires compiling with the `-D GGML_CUDA_PDL=ON` flag, so users who rely on default pre-built binaries will likely miss the performance gain.
– Prefill speeds remain largely unaffected, but the 5-10% generation speedup costs nothing beyond a rebuild, which makes it especially useful for large-batch inference or continuous serving.
– Users deploying on Blackwell should make sure they are using CUDA 12.8, as some earlier versions show regressions with specific kernels.
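A minimal sketch of what an opt-in build might look like, assuming the script is run from the root of a llama.cpp checkout with the CUDA toolkit installed. `GGML_CUDA=ON` is the usual switch for the CUDA backend and `GGML_CUDA_PDL=ON` is the flag named above; the `build-pdl` directory, the `RUN` variable and the two helper functions are illustrative names, not part of llama.cpp. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: configure and build llama.cpp with the opt-in PDL flag.
# Assumption: run from the root of a llama.cpp checkout, CUDA toolkit installed.
# BUILD_DIR, RUN and the helper functions below are hypothetical names.

BUILD_DIR="${BUILD_DIR:-build-pdl}"

# GGML_CUDA enables the CUDA backend; GGML_CUDA_PDL is the opt-in PDL flag.
CMAKE_FLAGS="-DGGML_CUDA=ON -DGGML_CUDA_PDL=ON"

# Print the configure command for the PDL-enabled build.
pdl_configure_cmd() {
    echo "cmake -B $BUILD_DIR $CMAKE_FLAGS"
}

# Print the build command for the same build directory.
pdl_build_cmd() {
    echo "cmake --build $BUILD_DIR --config Release"
}

if [ "${RUN:-0}" = "1" ]; then
    # Actually configure and build (CMAKE_FLAGS is intentionally word-split).
    cmake -B "$BUILD_DIR" $CMAKE_FLAGS && cmake --build "$BUILD_DIR" --config Release
else
    # Dry run: show the commands without executing them.
    pdl_configure_cmd
    pdl_build_cmd
fi
```

Running the script with `RUN=1` performs the build. To check whether the flag helps on a given GPU, one option is to compare the token generation figures reported by `llama-bench` for a build with the flag and a build without it, using the same model file.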

// TAGS

llama-cpp · inference · gpu · open-source · llm

DISCOVERED

45d ago

2026-05-23

PUBLISHED

45d ago

2026-05-22

RELEVANCE

8/10

AUTHOR

UncleRedz
