llama.cpp spikes RAM at 131k context

// 83d ago · TUTORIAL

A user on r/LocalLLaMA ran `llama-server` with `n_ctx = 131072` on a 16GB CPU-only Linux Mint machine, and the process was killed when it tried to allocate a 16GB KV cache. The thread shows the usual trap: quantized weights may fit, but the KV cache can still exceed available RAM.

// ANALYSIS

This looks like a context-size footgun, not a broken GGUF. In llama.cpp, `-c`/`--ctx-size` directly drives KV cache allocation, so a 131k window can turn a small local setup into an OOM event.

– The log line `n_ctx = 131072` is the smoking gun, and the reported `CPU KV buffer size = 16384.00 MiB` matches that setting.
– Q4_K_M reduces model weight size, but it does not shrink KV cache memory by itself.
– `llama-server` is more sensitive than a one-shot CLI run because it reserves memory for serving multiple sequences and longer prompts.
– The most likely fix is to lower the context size or remove any lingering `-c 131072` from the launcher; llama.cpp docs and community explanations describe `--ctx-size` as the cache budget ([README](https://github.com/ggml-org/llama.cpp), [context-size discussion](https://github.com/ggerganov/llama.cpp/discussions/4130)).
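
The arithmetic behind the points above can be sketched in a few lines. This is a rough estimate, not llama.cpp's actual allocator: it uses the common approximation of one K and one V vector per layer per token, stored as 16-bit values. The thread does not name the model, so the shape below (32 layers, 8 KV heads, head dimension 128) is a hypothetical one chosen because it reproduces the reported 16384 MiB. The helper names `kv_cache_bytes` and `max_ctx_for_budget` are illustrative only.

```python
MIB = 1024 ** 2


def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Approximate KV cache size: a K and a V vector per layer per token."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem


def max_ctx_for_budget(budget_bytes, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Largest context length whose estimated KV cache fits in budget_bytes."""
    per_token = kv_cache_bytes(1, n_layers, n_kv_heads, head_dim, bytes_per_elem)
    return budget_bytes // per_token


# Hypothetical model shape (assumption: the thread does not name the model).
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128)

# Estimated KV cache at the context size seen in the log.
print(kv_cache_bytes(131072, **shape) / MIB)  # 16384.0

# Context length that would fit if 4096 MiB were set aside for the cache.
print(max_ctx_for_budget(4096 * MIB, **shape))  # 32768
```

Under these assumptions the cache costs 128 KiB per token regardless of how the weights are quantized, which is why dropping `-c` to a smaller value is the lever that actually moves the number.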

// TAGS

llm, inference, open-source, self-hosted, devtool, llama-cpp

DISCOVERED: 2026-03-19 (83d ago)

PUBLISHED: 2026-03-19 (83d ago)

RELEVANCE: 8/10

AUTHOR: Automatic_Finish8598
