Cydonia 24B v4.3 hits 16GB ceiling
OPEN_SOURCE
REDDIT · 19d ago · INFRASTRUCTURE

A LocalLLaMA user with an RTX 5060 Ti 16GB asks whether Cydonia 24B v4.3 Q4_K_M is still the right RP setup in KoboldCpp. The thread frames 16GB as enough to fit a 24B quant, but tight enough that offload-friendly Qwen3.5 alternatives (9B, 27B, or 35B) become the real comparison.

// ANALYSIS

This is the quintessential local-LLM compromise: 16GB VRAM buys you choice, not freedom. For RP, the real decision is whether you want a faster 9B model or a bigger MoE/27B setup that leans on DDR5 and accepts some offload.

  • Cydonia-24B-v4.3 Q4_K_M sits around 14.3GB as a GGUF, so it fits but leaves very little headroom once KV cache and runtime overhead enter the picture.
  • Qwen3.5 9B is the speed-first answer if you care more about tokens per second than raw model size.
  • Qwen3.5 27B Q3_K_S and Qwen3.5 35B A3B quants are the quality-first stretch options when RAM offload is acceptable.
  • KoboldCpp is a good fit for this kind of tuning because the offload, context, and GPU-layer knobs are easy to reason about.
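The headroom concern in the first bullet can be sanity-checked with back-of-envelope arithmetic: file size plus KV cache against total VRAM. The sketch below assumes a Mistral-Small-style 24B architecture (40 layers, 8 KV heads, head dimension 128, fp16 cache); these numbers are illustrative assumptions, not figures from the thread, so check your model card before trusting the result.

```python
# Rough VRAM budget for a 24B Q4_K_M GGUF on a 16GB card.
# Architecture numbers are assumptions for a Mistral-Small-style 24B model.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context: int, bytes_per_elem: int = 2) -> int:
    """Bytes for the K and V caches at a given context length (fp16 by default)."""
    return 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_elem

MODEL_FILE_GB = 14.3   # Q4_K_M GGUF size cited in the thread
VRAM_GB = 16.0         # RTX 5060 Ti

kv_gb = kv_cache_bytes(40, 8, 128, 16384) / 1e9   # 16k context
total_gb = MODEL_FILE_GB + kv_gb

print(f"KV cache: {kv_gb:.2f} GB")    # ~2.68 GB at fp16
print(f"Total:    {total_gb:.2f} GB") # already over 16GB before runtime overhead
print("fits:", total_gb < VRAM_GB)
```

Under these assumptions a full-GPU 16k-context run does not fit, which is exactly why the thread's trade-off is between quantizing the KV cache, shrinking context, or offloading layers to system RAM.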
// TAGS
cydonia-24b-v4.3 · koboldcpp · llm · inference · gpu · self-hosted · open-weights

DISCOVERED

2026-03-23 (19d ago)

PUBLISHED

2026-03-23 (19d ago)

RELEVANCE

7/10

AUTHOR

Foxy-The-Pirata