OPEN_SOURCE
REDDIT · 19d ago · INFRASTRUCTURE
Cydonia 24B v4.3 hits 16GB ceiling
A LocalLLaMA user with an RTX 5060 Ti 16GB asks whether Cydonia 24B v4.3 Q4_K_M is still the right RP setup in KoboldCpp. The thread frames 16GB as enough for a 24B quant, but tight enough that offload-friendly alternatives like Qwen3.5 9B, 27B, or 35B become the real comparison.
// ANALYSIS
This is the quintessential local-LLM compromise: 16GB VRAM buys you choice, not freedom. For RP, the real decision is whether you want a faster 9B model or a bigger MoE/27B setup that leans on DDR5 and accepts some offload.
- Cydonia-24B-v4.3 Q4_K_M sits around 14.3GB as a GGUF, so it fits but leaves very little headroom once KV cache and runtime overhead enter the picture.
- Qwen3.5 9B is the speed-first answer if you care more about tokens per second than raw model size.
- Qwen3.5 27B Q3_K_S and Qwen3.5 35B A3B quants are the quality-first stretch options when RAM offload is acceptable.
- KoboldCpp is a good fit for this kind of tuning because the offload, context, and GPU-layer knobs are easy to reason about.
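The headroom point above can be sanity-checked with quick arithmetic. The sketch below estimates total VRAM as model file size plus KV cache plus runtime overhead; the layer count, KV head count, head dimension, and overhead figure are illustrative assumptions for a GQA-style 24B model, not confirmed specs for Cydonia-24B-v4.3.

```python
# Rough VRAM budget check for a ~14.3 GiB Q4_K_M GGUF on a 16 GiB card.
# Architecture numbers are assumptions (GQA layout typical of 24B-class
# models): 40 layers, 8 KV heads, head_dim 128, fp16 KV cache.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context, bytes_per_elem=2):
    """K and V tensors per layer, per token, at fp16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_elem

GIB = 1024 ** 3
model_file_gib = 14.3   # Q4_K_M size cited in the thread
vram_gib = 16.0         # RTX 5060 Ti 16GB
overhead_gib = 0.8      # assumed CUDA context + compute buffers

for ctx in (4096, 8192, 16384):
    kv_gib = kv_cache_bytes(40, 8, 128, ctx) / GIB
    total = model_file_gib + kv_gib + overhead_gib
    verdict = "fits" if total <= vram_gib else "needs offload"
    print(f"ctx={ctx:>5}: kv~{kv_gib:.2f} GiB, total~{total:.2f} GiB -> {verdict}")
```

Under these assumptions the model fits fully on-GPU only at short contexts (~4K); pushing context higher is exactly where KoboldCpp's GPU-layer knob starts trading layers to system RAM.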
// TAGS
cydonia-24b-v4.3 · koboldcpp · llm · inference · gpu · self-hosted · open-weights
DISCOVERED
19d ago
2026-03-23
PUBLISHED
19d ago
2026-03-23
RELEVANCE
7/10
AUTHOR
Foxy-The-Pirata