Llama.cpp WebUI mixes chat sessions
OPEN_SOURCE
REDDIT // 19d ago // INFRASTRUCTURE


A Reddit user reports that llama.cpp's built-in WebUI can bleed one conversation into another when multiple prompts are active, and that a restart temporarily clears the state. That pattern points to frontend/session-state leakage rather than a model or VRAM problem.

// ANALYSIS

This looks like WebUI state leakage, not a bad quant or GPU problem. The built-in frontend is improving fast, but I'd treat it as a convenience layer, not a bulletproof session manager.

  • llama.cpp's own guide says the new SvelteKit WebUI supports parallel conversations and remote users, so cross-chat bleed is a regression in session handling, not an expected backend limit: [guide](https://github.com/ggml-org/llama.cpp/discussions/16938)
  • Nearby issues show the WebUI overriding server-side sampling defaults and cache reuse/prefix handling changing across versions, which makes contamination between turns plausible: [sampling defaults](https://github.com/ggml-org/llama.cpp/issues/16201), [cache reuse](https://github.com/ggml-org/llama.cpp/issues/15082)
  • Backend slot/concurrency flags help throughput, but they do not guarantee hard conversation isolation.
  • If you need a practical mitigation today, the safer play is one tab per chat, a hard refresh (or clearing localStorage) when state desyncs, or a separate frontend like Open WebUI.
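Another way to sidestep the WebUI's session state entirely is to talk to llama-server's OpenAI-compatible endpoint directly and keep each conversation's history on the client. Below is a minimal sketch of that pattern; the `/v1/chat/completions` path and port 8080 assume a default `llama-server` instance, and the `ChatSessions` class is a hypothetical helper, not part of llama.cpp:

```python
import json
import urllib.request

# Assumed default llama-server address; adjust to your deployment.
SERVER_URL = "http://127.0.0.1:8080/v1/chat/completions"

class ChatSessions:
    """Client-side conversation isolation: each chat id owns its own
    message list, and every request sends the full history, so the
    server never holds shared mutable session state between chats."""

    def __init__(self):
        self._histories = {}  # chat id -> list of {"role", "content"} dicts

    def history(self, chat_id):
        # Each chat id gets its own independent list.
        return self._histories.setdefault(chat_id, [])

    def ask(self, chat_id, prompt):
        messages = self.history(chat_id)
        messages.append({"role": "user", "content": prompt})
        body = json.dumps({"messages": messages}).encode()
        req = urllib.request.Request(
            SERVER_URL,
            data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            reply = json.load(resp)["choices"][0]["message"]["content"]
        messages.append({"role": "assistant", "content": reply})
        return reply
```

Because the server sees a complete, self-contained message list on every call, cross-chat contamination would require the client to mix up its own dictionaries, which is easy to verify in isolation.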
// TAGS
llama-cpp · llm · inference · self-hosted · open-source · devtool · api

DISCOVERED

19d ago

2026-03-23

PUBLISHED

19d ago

2026-03-23

RELEVANCE

8/10

AUTHOR

Ok-Measurement-1575