Gemma 3, Mistral Small 3.2 lead VRAM choices
Reddit’s r/LocalLLaMA community identifies Google’s Gemma 3 (27B) and Mistral Small 3.2 (24B) as the premier choices for creative writing on 32GB VRAM setups, balancing narrative flair with high-fidelity local execution.
The 24B–30B parameter range has emerged as the definitive "sweet spot" for dual-GPU 32GB setups, allowing for high-precision quants with massive context windows.
- –Gemma 3 (27B) is praised for superior instruction following in complex storytelling, fitting comfortably at Q5/Q6 quants with room for its full 128k context.
- –Mistral Small 3.2 (24B) remains the "prose king" for many, offering a more human-like narrative flow that avoids the clinical tone typical of larger logic-focused models.
- –While the newer 100B+ MoE models (Llama 4 Scout, Mistral Small 4) can fit via extreme 2-bit quantization, the 27B tier provides a superior speed-to-intelligence ratio for real-time conversation.
- –Multimodal support in both models allows authors to ground story generations in visual references or character art directly within local frontends like SillyTavern.
- –Community consensus emphasizes avoiding CPU offloading; these models run entirely in VRAM, ensuring the sub-100ms latency required for fluid creative "jamming."
DISCOVERED
45d ago
2026-04-15
PUBLISHED
45d ago
2026-04-15
RELEVANCE
AUTHOR
VolggaWax