OPEN_SOURCE
REDDIT · 19d ago · INFRASTRUCTURE
Cydonia 24B v4.3 hits 16GB ceiling
A LocalLLaMA user with an RTX 5060 Ti 16GB asks whether Cydonia 24B v4.3 Q4_K_M is still the right RP setup in KoboldCpp. The thread frames 16GB as enough for a 24B quant, but tight enough that offload-friendly alternatives like Qwen3.5 9B, 27B, or 35B become the real comparison.
// ANALYSIS
This is the quintessential local-LLM compromise: 16GB VRAM buys you choice, not freedom. For RP, the real decision is whether you want a faster 9B model or a bigger MoE/27B setup that leans on DDR5 and accepts some offload.
- Cydonia-24B-v4.3 Q4_K_M sits around 14.3GB as a GGUF, so it fits but leaves very little headroom once KV cache and runtime overhead enter the picture.
- Qwen3.5 9B is the speed-first answer if you care more about tokens per second than raw model size.
- Qwen3.5 27B Q3_K_S and Qwen3.5 35B A3B quants are the quality-first stretch options when RAM offload is acceptable.
- KoboldCpp is a good fit for this kind of tuning because the offload, context, and GPU-layer knobs are easy to reason about.
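The headroom point above can be sanity-checked with quick arithmetic. The sketch below estimates total VRAM as model file size plus KV cache plus runtime overhead; the layer count, KV head count, head dimension, and overhead figure are illustrative assumptions for a GQA-style 24B model, not confirmed specs for Cydonia-24B-v4.3.

```python
# Rough VRAM budget check for a ~14.3 GiB Q4_K_M GGUF on a 16 GiB card.
# Architecture numbers are assumptions (GQA layout typical of 24B-class
# models): 40 layers, 8 KV heads, head_dim 128, fp16 KV cache.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context, bytes_per_elem=2):
    """K and V tensors per layer, per token, at fp16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_elem

GIB = 1024 ** 3
model_file_gib = 14.3   # Q4_K_M size cited in the thread
vram_gib = 16.0         # RTX 5060 Ti 16GB
overhead_gib = 0.8      # assumed CUDA context + compute buffers

for ctx in (4096, 8192, 16384):
    kv_gib = kv_cache_bytes(40, 8, 128, ctx) / GIB
    total = model_file_gib + kv_gib + overhead_gib
    verdict = "fits" if total <= vram_gib else "needs offload"
    print(f"ctx={ctx:>5}: kv~{kv_gib:.2f} GiB, total~{total:.2f} GiB -> {verdict}")
```

Under these assumptions the model fits fully on-GPU only at short contexts (~4K); pushing context higher is exactly where KoboldCpp's GPU-layer knob starts trading layers to system RAM.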
// TAGS
cydonia-24b-v4.3 · koboldcpp · llm · inference · gpu · self-hosted · open-weights
DISCOVERED
19d ago
2026-03-23
PUBLISHED
19d ago
2026-03-23
RELEVANCE
7/10
AUTHOR
Foxy-The-Pirata