LM Studio hits performance cliff on dense models
OPEN_SOURCE
REDDIT // 7h ago · NEWS


Users of LM Studio are reporting severe performance degradation when running latest-generation dense models such as Devstral-Small-2-24B and Gemma 4 26B on NVIDIA hardware. Sub-1 token/s speeds are common once these models spill out of VRAM into system RAM, exposing VRAM-management and runtime issues in current software builds.
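The spillover problem comes down to simple arithmetic: quantized weights alone can nearly fill a consumer card before the runtime's buffers and cache are counted. The sketch below illustrates that arithmetic with assumed, illustrative figures (the bits-per-weight value and VRAM size are not measurements from the reports):

```python
# Back-of-envelope check: will a quantized dense model's weights fit in VRAM?
# All figures are illustrative assumptions, not measured values.

def weight_gib(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB for a dense model."""
    return params_b * 1e9 * bits_per_weight / 8 / 2**30

# A 24B dense model at a ~4.5 bits/weight quantization (a common assumption)
model_gib = weight_gib(24, 4.5)
print(f"~{model_gib:.1f} GiB of weights")  # ~12.6 GiB

# On a 16 GiB card, runtime buffers and the KV cache can push the total past
# available VRAM, at which point layers spill to system RAM and token
# throughput collapses, since every forward pass touches every dense weight.
```

Unlike an MoE model, which activates only a fraction of its weights per token, a dense model reads all of them on every step, so any spilled layer is paid for on every single token.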

// ANALYSIS

The local LLM community is hitting a "dense model tax" as model sizes and context windows outpace consumer VRAM.

  • VRAM spillage into system RAM causes a much harsher performance cliff for dense architectures compared to MoE equivalents
  • Switching to the Vulkan runtime surprisingly outperforms CUDA for certain 2026-era models on high-end NVIDIA cards
  • Massive 256K context windows consume memory that the model weights need, so users must cap context length manually
  • Software updates to v0.4.9+ are mandatory to handle the unique per-layer embedding architectures of the newest models
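The context-window bullet above is worth quantifying: KV-cache memory grows linearly with context length and can dwarf the weights themselves at 256K tokens. A rough sizing sketch, using assumed layer and head counts rather than any specific model's real configuration:

```python
# Rough KV-cache sizing for a dense transformer at long context.
# Layer/head counts below are illustrative assumptions, not any
# particular model's actual architecture.

def kv_cache_gib(ctx_tokens: int, layers: int = 48, kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_val: int = 2) -> float:
    """GiB needed to hold K and V for every token at every layer (fp16)."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_val  # K + V
    return ctx_tokens * per_token / 2**30

print(f"{kv_cache_gib(256 * 1024):.0f} GiB at 256K context")  # 48 GiB
print(f"{kv_cache_gib(8 * 1024):.1f} GiB at 8K context")      # 1.5 GiB
```

Under these assumptions, dropping the context limit from 256K to 8K frees tens of GiB, which is why manually capping context is the first mitigation users reach for.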
// TAGS
lm-studio · llm · gpu · nvidia · mistral · gemma · local-ai · inference

DISCOVERED

7h ago

2026-04-19

PUBLISHED

10h ago

2026-04-19

RELEVANCE

7/10

AUTHOR

HowdyCapybara