Llama.cpp fixes critical MTP server VRAM leak

// 45d ago · PRODUCT UPDATE

Llama.cpp release b9274 resolves a severe memory leak in the server component that affected users of Multi-Token Prediction (MTP) models. The fix ensures that speculative decoders and draft contexts are properly destroyed during sleep/resume cycles, preventing progressive GPU memory exhaustion.

// ANALYSIS

This is a critical stability patch for anyone running speculative decoding in production via the llama.cpp server.

– Prior to this fix, the server would repeatedly allocate new draft contexts without freeing old ones during idle sleep, inevitably leading to OOM crashes.
– The patch guarantees that `ctx_dft` and `model_dft` are explicitly freed in the `destroy()` function.
– It highlights the ongoing challenges of state management in complex local LLM inference setups, particularly when mixing speculative decoding with idle resource pausing.
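
The leak pattern described in the points above can be illustrated with a small, self-contained C++ sketch. This is not the actual llama.cpp server code: `DraftModel`, `DraftContext`, `Slot`, `destroy_leaky()`, and `live_after_cycles()` are hypothetical stand-ins, and a plain counter plays the role of VRAM held by live allocations. Only the field names `ctx_dft` and `model_dft` and the `destroy()` name come from the description of the patch.

```cpp
#include <cassert>

// Hypothetical stand-ins, NOT llama.cpp types. The counter models how many
// draft-side allocations (and therefore how much VRAM) are still alive.
static int g_live_allocs = 0;

struct DraftModel   { DraftModel()   { ++g_live_allocs; } ~DraftModel()   { --g_live_allocs; } };
struct DraftContext { DraftContext() { ++g_live_allocs; } ~DraftContext() { --g_live_allocs; } };

struct Slot {
    DraftModel   * model_dft = nullptr;
    DraftContext * ctx_dft   = nullptr;

    // Called on startup and on every resume from sleep.
    void load() {
        model_dft = new DraftModel();
        ctx_dft   = new DraftContext();
    }

    // Buggy variant (assumed shape of the bug): the handles are dropped
    // without being freed, so each sleep/resume cycle strands one model
    // and one context.
    void destroy_leaky() {
        ctx_dft   = nullptr;
        model_dft = nullptr;
    }

    // Fixed variant: free the context before the model it was created from,
    // then clear the handles so a second call is harmless.
    void destroy() {
        delete ctx_dft;
        ctx_dft = nullptr;
        delete model_dft;
        model_dft = nullptr;
    }
};

// Simulates `cycles` resume/sleep rounds and returns how many draft
// allocations were left alive by them.
int live_after_cycles(int cycles, bool free_draft) {
    assert(cycles >= 0);
    const int baseline = g_live_allocs;
    Slot slot;
    for (int i = 0; i < cycles; ++i) {
        slot.load();                 // resume: allocate draft model + context
        if (free_draft) {
            slot.destroy();          // sleep with the fix applied
        } else {
            slot.destroy_leaky();    // sleep without the fix
        }
    }
    return g_live_allocs - baseline;
}
```

In this toy model, `live_after_cycles(3, false)` leaves six allocations behind (two per cycle), while `live_after_cycles(3, true)` leaves none. The real server would release its handles through the library's own free functions rather than `delete`, but the growth pattern is the same: anything allocated on resume and not released on sleep accumulates until the GPU runs out of memory.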

// TAGS

llama.cpp, llm, inference, gpu, open-source, local-first

DISCOVERED

45d ago

2026-05-22

PUBLISHED

45d ago

2026-05-21

RELEVANCE

8/10

AUTHOR

Bulky-Priority6824
