LM Studio hits performance cliff on dense models
LM Studio users report severe slowdowns when running latest-generation dense models such as Devstral-Small-2-24B and Gemma 4 26B on NVIDIA hardware. Speeds below 1 token/s are common once model weights spill from VRAM into system RAM, exposing VRAM management and runtime problems in current software builds.
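A rough way to see why spilling hurts dense models so much is a bandwidth-only estimate: during decoding, each generated token has to read every active weight once, so the time per token is roughly the bytes read divided by the bandwidth of wherever those bytes live. The sketch below is a back-of-the-envelope model, not a benchmark. The function name, the 14 GB and 4 GB weight sizes, and the 1000 GB/s and 50 GB/s bandwidth figures are all illustrative assumptions, and real speeds can be lower because of transfer and runtime overhead.

```python
def decode_tokens_per_second(active_weight_gb, vram_fraction, vram_bw_gbps, ram_bw_gbps):
    """Bandwidth-only upper bound on decode speed.

    active_weight_gb: weights read per token (all of them for a dense model,
                      only the routed experts for an MoE model).
    vram_fraction:    share of those weights resident in VRAM (0.0 to 1.0).
    vram_bw_gbps / ram_bw_gbps: assumed memory bandwidths in GB/s.
    """
    gb_in_vram = active_weight_gb * vram_fraction
    gb_in_ram = active_weight_gb * (1.0 - vram_fraction)
    seconds_per_token = gb_in_vram / vram_bw_gbps + gb_in_ram / ram_bw_gbps
    return 1.0 / seconds_per_token


if __name__ == "__main__":
    # Hypothetical numbers: ~14 GB of quantized dense weights,
    # ~4 GB of active weights per token for an MoE model.
    for label, gb in [("dense", 14.0), ("MoE (active only)", 4.0)]:
        for frac in (1.0, 0.75, 0.5):
            tps = decode_tokens_per_second(gb, frac, 1000.0, 50.0)
            print(f"{label:18s} {frac:4.0%} in VRAM: ~{tps:.1f} tok/s ceiling")
```

Under these assumed figures the slow memory dominates as soon as a sizable share of the weights leaves VRAM, and the dense model pays that cost on every weight for every token.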
The local LLM community is hitting a "dense model tax" as model sizes and context windows outpace consumer VRAM.
- VRAM spillage into system RAM causes a much harsher performance cliff for dense architectures compared to MoE equivalents
- Switching to the Vulkan runtime surprisingly outperforms CUDA for certain 2026-era models on high-end NVIDIA cards
- Massive 256K context windows consume critical memory needed for model weights, necessitating manual context limits
- Software updates to v0.4.9+ are mandatory to handle the unique per-layer embedding architectures of the newest models
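To make the context-window point concrete, the KV cache size can be estimated with the standard formula for full attention: 2 (keys and values) × layers × KV heads × head dimension × context length × bytes per element. The sketch below uses a hypothetical configuration (40 layers, 8 KV heads, head dimension 128, 16-bit cache), not the real settings of any model named above. Models that use sliding-window attention or a quantized cache would need less.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Estimate KV cache size, assuming full attention in every layer."""
    # Factor of 2 covers the separate key and value tensors.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


if __name__ == "__main__":
    GIB = 2 ** 30
    # Hypothetical model shape, chosen only for illustration.
    layers, kv_heads, head_dim = 40, 8, 128
    for ctx in (8_192, 32_768, 262_144):
        size = kv_cache_bytes(layers, kv_heads, head_dim, ctx)
        print(f"context {ctx:>7,d} tokens: ~{size / GIB:.2f} GiB of KV cache")
```

Comparing the result against the card's free VRAM after the weights are loaded gives a starting point for choosing a manual context limit.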
DISCOVERED
2026-04-19
PUBLISHED
2026-04-19
AUTHOR
HowdyCapybara
