OPEN_SOURCE
REDDIT // SECURITY INCIDENT
llama.cpp GGUF tampering exposes live steering risk
A new GitHub proof of concept shows that `llama-server` can have its behavior persistently altered at runtime if another process can write to the same mmap-backed GGUF file. By modifying Q6_K scale values in `output.weight`, the demo makes chosen tokens dominate output without ptrace, process injection, or a server restart.
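The mechanism is ordinary mmap semantics rather than anything exotic: with a shared file-backed mapping, writes that reach the file on disk also reach the pages the server is already reading. A minimal sketch of that propagation (plain Python on stand-in bytes, not the PoC's actual code; the real target is Q6_K scale bytes inside `output.weight`):

```python
import mmap
import os
import tempfile

# Stand-in for a GGUF model file; any bytes demonstrate the mechanism.
fd, path = tempfile.mkstemp()
os.write(fd, b"original-scales")
os.close(fd)

# "Server": maps the file read-only, as llama.cpp does by default.
model = open(path, "rb")
weights = mmap.mmap(model.fileno(), 0, access=mmap.ACCESS_READ)
before = bytes(weights[:8])  # b"original"

# "Attacker": any other process with write access to the same path.
with open(path, "r+b") as f:
    f.write(b"tampered")

# The live mapping reflects the on-disk change -- no restart, no ptrace.
after = bytes(weights[:8])  # b"tampered"

weights.close()
model.close()
os.remove(path)
```

The same coherence between `write()` and an existing shared mapping is what lets the PoC steer a running `llama-server` without touching the process itself.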
// ANALYSIS
This is a sharp reminder that local LLM security is an infrastructure problem, not just a prompt-security problem.
- The attack targets shared, writable model volumes, which are common in self-hosted and containerized inference setups
- Because `llama.cpp` reads mmap-backed GGUF weights from disk-backed pages, file changes can propagate into live serving behavior immediately
- The PoC is narrow rather than universal: it depends on default mmap behavior, writable access to the model file, and a compatible tensor/quantization layout
- The mitigation advice is practical and high-signal for operators: mount model directories read-only, isolate serving permissions, avoid shared writable paths, and use `--no-mmap` where the threat model warrants it
- For AI infra teams, this looks less like a novelty hack and more like a real integrity gap in local inference deployments
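One way to operationalize the read-only advice is a startup guard that refuses to serve from a writable model file. A hypothetical sketch (`assert_model_readonly` is not a llama.cpp feature; real deployments would enforce this at the volume-mount or container level, with `--no-mmap` as a further backstop):

```python
import os
import stat

def assert_model_readonly(path: str) -> None:
    """Refuse to start if the model file carries any write permission bit.

    Hypothetical guard illustrating the mitigation; it complements, not
    replaces, a read-only volume mount.
    """
    mode = os.stat(path).st_mode
    if mode & (stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH):
        raise PermissionError(
            f"{path} is writable; mount the model volume read-only"
        )
```

A deployment script might `os.chmod(model_path, 0o444)` (or mount the directory `ro`) before launch so the check passes.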
// TAGS
llm-inference-tampering · llama-cpp · llm · inference · open-source · safety
DISCOVERED
2026-03-09
PUBLISHED
2026-03-09
RELEVANCE
8/10
AUTHOR
Acanthisitta-Sea