OPEN_SOURCE
REDDIT // 19d ago · INFRASTRUCTURE
Llama.cpp WebUI mixes chat sessions
A Reddit user reports that llama.cpp's built-in WebUI can bleed one conversation into another when multiple prompts are active, and that a restart temporarily clears the state. That pattern points to frontend/session-state leakage rather than a model or VRAM problem.
// ANALYSIS
This looks like WebUI state leakage, not a bad quant or GPU problem. The built-in frontend is improving fast, but I'd treat it as a convenience layer, not a bulletproof session manager.
- llama.cpp's own guide says the new SvelteKit WebUI supports parallel conversations and remote users, so cross-chat bleed is a regression in session handling, not an expected backend limit: [guide](https://github.com/ggml-org/llama.cpp/discussions/16938)
- Nearby issues show the WebUI overriding server-side sampling defaults and cache reuse/prefix handling changing across versions, which makes contamination between turns plausible: [sampling defaults](https://github.com/ggml-org/llama.cpp/issues/16201), [cache reuse](https://github.com/ggml-org/llama.cpp/issues/15082)
- Backend slot/concurrency flags help throughput, but they do not guarantee hard conversation isolation.
- If you need a practical mitigation today, one tab per chat, a refresh/localStorage reset when state desyncs, or a separate frontend like Open WebUI is the safer play.
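If you route multiple conversations through one llama-server instance yourself, one way to reduce cross-chat contamination is to pin each conversation to a fixed server slot. A minimal sketch, assuming the server's `/completion` endpoint accepts the `id_slot` and `cache_prompt` fields (present in recent llama-server builds; verify against your version):

```python
import hashlib

def build_completion_request(conversation_id: str, prompt: str, n_slots: int) -> dict:
    """Build a llama-server /completion payload that pins this
    conversation to one slot, so its KV cache is never mixed
    with another chat's slot."""
    # Deterministic conversation -> slot mapping; md5 keeps the
    # mapping stable across processes, unlike Python's salted hash().
    digest = hashlib.md5(conversation_id.encode("utf-8")).hexdigest()
    slot = int(digest, 16) % n_slots
    return {
        "prompt": prompt,
        "id_slot": slot,        # pin to a specific slot (assumed field name)
        "cache_prompt": True,   # reuse only this slot's own prefix cache
    }

# The same conversation always lands on the same slot:
a = build_completion_request("chat-42", "Hello", n_slots=4)
b = build_completion_request("chat-42", "Follow-up", n_slots=4)
assert a["id_slot"] == b["id_slot"]
```

Pair this with `llama-server -np 4` (or `--parallel 4`) so the slots actually exist; if your build rejects `id_slot`, drop the field and fall back to one frontend tab per chat.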
// TAGS
llama-cpp · llm · inference · self-hosted · open-source · devtool · api
DISCOVERED
19d ago
2026-03-23
PUBLISHED
19d ago
2026-03-23
RELEVANCE
8/10
AUTHOR
Ok-Measurement-1575