OPEN_SOURCE ↗
REDDIT // 5h ago · INFRASTRUCTURE
Qwen, Gemma vie for M4 coding
A LocalLLaMA thread asks which local models make sense for heavy coding on a 24GB MacBook M4, with commenters pointing to Qwen and Gemma-class models while warning that cloud models still win for serious coding work. The practical advice centers on smaller quantized models, MLX-aware runtimes, and tools like Ollama or LM Studio.
// ANALYSIS
The useful takeaway is not "run the biggest model you can squeeze in"; it is that 24GB Apple Silicon is a capable local inference box, but still a compromise for coding agents.
- Qwen gets the strongest community nod, with 14B-class models fitting comfortably and 30B-ish quantized models possible but tighter on memory and context
- Gemma is framed as the safer fit on 24GB, while larger Qwen variants may trade too much speed or headroom for quality
- For actual heavy coding, commenters still favor Claude, Codex, or other paid remote models, using local LLMs for drafts, private tasks, or backend automation
- MLX support matters on Mac, because memory bandwidth and Apple Silicon-native loaders can make the difference between usable and frustrating
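The "14B fits comfortably, 30B is tight" framing follows from simple quantization arithmetic. A minimal sketch (the helper name is ours, and it deliberately ignores KV cache, runtime overhead, and macOS's cap on GPU working-set memory, all of which consume several more GB of the 24GB unified pool):

```python
# Back-of-envelope weight-memory math behind the 24GB advice.
# Assumption: weights dominate; KV cache and runtime overhead are extra.

def quantized_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of quantized weights, in GB."""
    return params_billions * bits_per_weight / 8

if __name__ == "__main__":
    for params, bits in [(14, 4), (30, 4), (14, 8)]:
        print(f"{params}B @ {bits}-bit ~ {quantized_weight_gb(params, bits):.1f} GB")
```

At 4-bit, a 14B model needs roughly 7 GB for weights, leaving generous room for context; a 30B model needs roughly 15 GB, which is why commenters call it possible but cramped once the KV cache grows.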
// TAGS
qwen · gemma · llm · ai-coding · inference · edge-ai · open-weights
DISCOVERED
5h ago
2026-04-21
PUBLISHED
6h ago
2026-04-21
RELEVANCE
6/10
AUTHOR
Extra-Perception2408