llama.cpp mmap path enables live tampering
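A new proof of concept shows that a running llama-server can start reading modified GGUF weights mid-inference when the model file is memory-mapped and another process still has write access to it. A file-backed mapping reads straight from the kernel's page cache rather than a private copy, so bytes written by that other process appear in the server's view of the weights without any reload. That turns shared volumes and weak file isolation into a real integrity risk for local LLM deployments, with no server restart and no code injection required.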
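The mechanism is easy to reproduce outside llama.cpp. The sketch below is an illustration built on Python's standard `mmap` module. It is not the PoC's code and not llama.cpp's loader, and the function name `mapped_view_sees_external_write` is hypothetical. It maps a stand-in file read-only, overwrites the first bytes through a separate writable handle, and then checks whether the already-open mapping sees the change.

```python
import mmap
import os
import tempfile


def mapped_view_sees_external_write() -> bool:
    """Map a file read-only, overwrite it through a second handle, and
    report whether the existing mapping observes the new bytes."""
    fd, path = tempfile.mkstemp()
    try:
        # Stand-in for a model file: 4 KiB of zero bytes.
        with os.fdopen(fd, "wb") as f:
            f.write(b"\x00" * 4096)

        # "Server" side: open read-only and map the whole file (length 0).
        with open(path, "rb") as reader:
            view = mmap.mmap(reader.fileno(), 0, access=mmap.ACCESS_READ)
            try:
                before = view[:5]
                # "Attacker" side: an independent handle with write access.
                with open(path, "r+b") as writer:
                    writer.write(b"Pwned")
                    writer.flush()  # hand the bytes to the kernel
                after = view[:5]  # same mapping, no remap or reload
            finally:
                view.close()
        return before == b"\x00" * 5 and after == b"Pwned"
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(mapped_view_sees_external_write())
```

On Linux this is expected to return True, because both handles go through the same page cache. A read-only mapping stops the reader from writing, but it does not stop the reader from seeing changes made by other writers.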
This is the kind of LLM security issue developers underestimate because nothing “crashes” and the server still looks healthy. It is less a model bug than an ops-layer failure mode where mmap, shared storage, and permissive mounts quietly become part of the attack surface.
- The PoC targets output.weight, the output projection tensor that turns hidden states into vocabulary logits, in a GGUF file. It shows that token logits can be biased live, forcing responses like “Pwned” from both the completion and chat endpoints.
- The attack needs only write access to the model artifact, not root, ptrace, or code injection, which makes sloppy Docker and local dev setups the main exposure.
- --no-mmap, read-only model mounts, dedicated serving users, and runtime integrity checks look a lot less optional after this.
- For teams shipping local copilots or on-prem inference, model files need to be treated like executable assets, not passive data blobs.
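A toy model shows why output.weight is such a high-leverage target. This is a hypothetical illustration, not the PoC's method: the three-token vocabulary, the 2-dimensional hidden state, and the edited values are all made up. Each row of the output projection produces one token's logit as a dot product with the hidden state, so rewriting a single row can change which token wins.

```python
def logits(weight: list[list[float]], hidden: list[float]) -> list[float]:
    """Output projection: one logit per vocabulary row, logit_t = W[t] . h."""
    return [sum(w * x for w, x in zip(row, hidden)) for row in weight]


def argmax(values: list[float]) -> int:
    """Index of the largest value (greedy decoding picks this token)."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical 3-token vocabulary and 2-dimensional hidden state.
vocab = ["Hello", "Sure", "Pwned"]
W = [
    [0.9, 0.1],  # row for "Hello"
    [0.4, 0.5],  # row for "Sure"
    [0.0, 0.1],  # row for "Pwned"
]
h = [1.0, 1.0]

clean_choice = vocab[argmax(logits(W, h))]

# Overwrite one row in place, as a writer to the mapped file could.
# For this hidden state the edited row now yields the largest logit.
W[2] = [5.0, 5.0]
tampered_choice = vocab[argmax(logits(W, h))]

print(clean_choice, tampered_choice)
```

In a real model the effect of an edited row depends on the hidden states it meets. The point here is only that output.weight feeds logits directly, with no later layer to dilute the change.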
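The following is a minimal pre-flight check in the spirit of the dedicated-serving-user and integrity-check mitigations. It assumes you pin a known-good SHA-256 digest for each model file. The function names and the pinning workflow are hypothetical and are not part of llama.cpp. The check flags group- or world-writable permissions, flags a file the serving user can write itself, and compares the digest. A one-time hash at startup does not catch writes that land after the file is mapped. A runtime variant would have to repeat the check or rely on read-only mounts and --no-mmap.

```python
import hashlib
import os
import stat


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so multi-gigabyte GGUF files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_model_artifact(path: str, expected_sha256: str) -> list[str]:
    """Return a list of problems; an empty list means every check passed.
    Hypothetical helper: the digest must be pinned out of band."""
    problems = []
    mode = os.stat(path).st_mode
    if mode & (stat.S_IWGRP | stat.S_IWOTH):
        problems.append("file is writable by group or others")
    # The serving process should not be able to modify its own weights.
    if os.access(path, os.W_OK):
        problems.append("file is writable by the current (serving) user")
    if sha256_file(path) != expected_sha256.lower():
        problems.append("SHA-256 digest does not match the pinned value")
    return problems
```

Note that `os.access` reports a file as writable for root regardless of its mode bits. That is one more reason to run llama-server as an unprivileged user.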
DISCOVERED
2026-03-11
PUBLISHED
2026-03-10
AUTHOR
Acanthisitta-Sea