OPEN_SOURCE
REDDIT // 25d ago · TUTORIAL
Ollama makes 64K Mac coding viable
A LocalLLaMA user asks whether an M1 Pro with 32GB can handle 64K or 128K context for coding, and commenters point to Ollama-friendly mid-size models like Qwen2.5-Coder 14B Q4 and Gemma 3 12B. The consensus is that 64K is realistic, while 128K starts to strain latency and memory on this class of Mac.
// ANALYSIS
The practical answer is yes, but only if you stop chasing giant models and treat context length as a memory budget, not a badge of honor.
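The memory-budget framing can be made concrete with a back-of-the-envelope KV-cache estimate. The sketch below assumes a Qwen2.5-14B-style geometry (48 layers, 8 KV heads via GQA, head dim 128, fp16 cache); these numbers are illustrative assumptions, not figures from the thread:

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """Approximate KV-cache size: one K and one V tensor per layer,
    fp16 (2 bytes) by default. Returns GiB."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 2**30

# Assumed Qwen2.5-14B-style config: 48 layers, 8 KV heads, head_dim 128.
print(kv_cache_gib(48, 8, 128, 64 * 1024))   # 64K context -> ~12 GiB
print(kv_cache_gib(48, 8, 128, 128 * 1024))  # 128K context -> ~24 GiB
```

Under these assumptions, a 64K cache plus a ~9 GB Q4 14B model fits on 32GB with room for macOS; doubling to 128K pushes the cache alone toward 24 GiB, which is exactly where the commenters' latency and memory complaints start.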
- Ollama’s own docs say coding tools should generally be set to at least a 64K context, but larger windows increase memory use fast.
- On 32GB of unified memory, quantized 7B-14B models are the sweet spot; 32B-class models get uncomfortable quickly.
- Qwen2.5-Coder is a strong fit here because it officially supports up to 128K context and is tuned for coding tasks.
- MLX may squeeze out better Apple-silicon performance, but Ollama and llama.cpp remain the most practical defaults for broad local model support.
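Raising the window in Ollama is a Modelfile one-liner; `num_ctx` is Ollama's context-length parameter, while the base model tag and custom name below are illustrative:

```
FROM qwen2.5-coder:14b
PARAMETER num_ctx 65536
```

Build and run it with `ollama create qwen-coder-64k -f Modelfile` then `ollama run qwen-coder-64k`; newer Ollama releases also expose an `OLLAMA_CONTEXT_LENGTH` environment variable for setting a server-wide default.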
// TAGS
ollama · llm · ai-coding · self-hosted · open-source · inference
DISCOVERED
2026-03-18
PUBLISHED
2026-03-17
RELEVANCE
7/10
AUTHOR
rkh4n