OPEN_SOURCE ↗
REDDIT // 7h ago // NEWS
LM Studio hits performance cliff on dense models
Users of LM Studio are reporting severe performance degradation when running latest-generation dense models such as Devstral-Small-2-24B and Gemma 4 26B on NVIDIA hardware. Sub-1 tok/s decode speeds are common once these models spill out of VRAM into system RAM, exposing critical VRAM-management and runtime issues in current software builds.
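The reported slowdown is consistent with decode becoming bandwidth-bound once weights leave VRAM. A rough sketch (all numbers are illustrative assumptions, not measurements from the report): a dense model reads every weight once per generated token, so tokens/s is bounded by memory bandwidth divided by model size.

```python
# Rough, bandwidth-bound estimate of decode speed for a dense model.
# Assumption (not from the article): generating one token reads every
# weight once, so tok/s <= memory bandwidth / model size in bytes.

def decode_tok_per_s(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on tokens/s when decoding is bandwidth-limited."""
    return bandwidth_bytes_per_s / model_bytes

# Hypothetical ~24B-parameter model at ~4.4 bits/weight (Q4-style quant).
model_bytes = 24e9 * 0.55  # ~13.2 GB of weights

print(f"all in VRAM  (~1000 GB/s): {decode_tok_per_s(model_bytes, 1000e9):6.1f} tok/s")
print(f"spilled to RAM (~60 GB/s): {decode_tok_per_s(model_bytes, 60e9):6.1f} tok/s")
```

Even this simple bound shows why dense architectures fall off a cliff on spill: every token pays for the full weight set at system-RAM bandwidth, whereas an MoE model only reads its active experts.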
// ANALYSIS
The local LLM community is hitting a "dense model tax" as model sizes and context windows outpace consumer VRAM.
- VRAM spillover into system RAM causes a much harsher performance cliff for dense architectures than for MoE equivalents
- Switching to the Vulkan runtime surprisingly outperforms CUDA for certain 2026-era models, even on high-end NVIDIA cards
- Massive 256K context windows consume memory needed for model weights, necessitating manual context limits
- Updating to v0.4.9+ is mandatory to handle the unique per-layer embedding architectures of the newest models
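The context-window bullet above is a simple linear-growth problem: KV-cache size scales with context length and competes directly with weights for VRAM. A minimal sketch, using hypothetical architecture numbers (not the specs of any model named here):

```python
# Why a 256K context window eats VRAM: the KV cache grows linearly with
# context length. All architecture parameters below are hypothetical
# placeholders, not specs of Devstral or Gemma.

def kv_cache_gib(ctx_len: int, n_layers: int = 48, n_kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """FP16 K+V cache size in GiB for a given context length."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # K and V
    return ctx_len * per_token / 1024**3

for ctx in (8_192, 32_768, 262_144):
    print(f"{ctx:>7} tokens -> {kv_cache_gib(ctx):5.1f} GiB of KV cache")
```

Under these assumptions the jump from 8K to 256K context multiplies cache size 32x, which is why capping the context manually frees enough VRAM to keep weights on the GPU.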
// TAGS
lm-studio · llm · gpu · nvidia · mistral · gemma · local-ai · inference
DISCOVERED
7h ago
2026-04-19
PUBLISHED
10h ago
2026-04-19
RELEVANCE
7 / 10
AUTHOR
HowdyCapybara