OPEN_SOURCE
REDDIT · 23d ago · OPEN SOURCE RELEASE
Qwen3-VL fits 12GB local rigs
A Reddit user is looking for a permissive local multimodal model that can handle both general and NSFW prompts, run on a 12GB GPU, and do image understanding. They point to Qwen3-VL-8B-Instruct-GGUF in oobabooga, but the bigger issue seems to be runtime support and model-loading setup rather than raw VRAM alone.
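A back-of-envelope budget makes the "12GB is enough" claim concrete. This sketch uses the quant sizes from the model card quoted below; the vision-projector, KV-cache, and runtime-overhead figures are rough assumptions for illustration, not measured values.

```python
# Rough check: does a quantized Qwen3-VL-8B fit in 12 GB of VRAM?
# Weight sizes come from the official GGUF card; the other figures
# (mmproj, KV cache, overhead) are assumed for illustration only.

QUANT_SIZES_GB = {"Q4_K_M": 5.03, "Q8_0": 8.71}  # weights only, per the GGUF card

def fits_in_vram(quant: str, vram_gb: float = 12.0,
                 mmproj_gb: float = 0.6,     # assumed vision-projector size
                 kv_cache_gb: float = 1.5,   # assumed for a moderate context length
                 overhead_gb: float = 0.5) -> bool:
    """True if weights + vision projector + KV cache + runtime overhead fit."""
    total = QUANT_SIZES_GB[quant] + mmproj_gb + kv_cache_gb + overhead_gb
    return total <= vram_gb

for quant in QUANT_SIZES_GB:
    print(quant, fits_in_vram(quant))
```

Under these assumptions both quants fit on a 12GB card, while Q8_0 would be tight or impossible on an 8GB card.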
// ANALYSIS
The real story here is that “NSFW-capable” is less about a magic model and more about choosing an open-weight vision stack that fits your hardware and frontend. Qwen3-VL looks like a plausible fit for a 12GB card, but image generation is still a separate model class.
- The official GGUF card lists Q4_K_M at about 5.03 GB and Q8_0 at about 8.71 GB, so a 12GB GPU is workable with quantization.
- Qwen3-VL is image-text-to-text, not an image generator; if the user wants generation too, they'll need a separate diffusion or text-to-image model.
- The thread's symptom looks like integration friction: the model card says to use the latest llama.cpp stack and to load the vision `mmproj` file correctly.
- For "NSFW + usual stuff," the checkpoint's policy and the UI's filters matter as much as model size; open weights help, but they do not guarantee uncensored behavior.
- In practical terms, this is a local-inference stack question more than a single-model question.
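The integration point above can be sketched as a command line. This assumes llama.cpp's multimodal CLI (`llama-mtmd-cli`) from a recent build; the GGUF file names are illustrative stand-ins for whatever the model card actually ships.

```shell
# Sketch: run a quantized Qwen3-VL-8B with llama.cpp's multimodal CLI.
# A recent llama.cpp build is required, and the --mmproj vision projector
# must be loaded alongside the main model or image input will not work.
# File names below are illustrative, not exact card filenames.
llama-mtmd-cli \
  -m Qwen3-VL-8B-Instruct-Q4_K_M.gguf \
  --mmproj mmproj-Qwen3-VL-8B-Instruct-f16.gguf \
  --image photo.jpg \
  -p "Describe this image." \
  -ngl 99
```

Forgetting `--mmproj` (or pairing it with a mismatched main model) is the most common way this setup fails while plain text chat still appears to work.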
// TAGS
qwen3-vl · multimodal · llm · open-source · open-weights · self-hosted · inference · gpu
DISCOVERED
2026-03-19 (23d ago)
PUBLISHED
2026-03-19 (24d ago)
RELEVANCE
7/10
AUTHOR
yakasantera1