Ollama makes 64K Mac coding viable
OPEN_SOURCE
REDDIT · 25d ago · TUTORIAL


A LocalLLaMA user asks whether an M1 Pro with 32GB can handle 64K or 128K context for coding, and commenters point to Ollama-friendly mid-size models like Qwen2.5-Coder 14B Q4 and Gemma 3 12B. The consensus is that 64K is realistic, while 128K starts to strain latency and memory on this class of Mac.

// ANALYSIS

The practical answer is yes, but only if you stop chasing giant models and treat context length as a memory budget, not a badge of honor.
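The budget framing can be made concrete with back-of-the-envelope arithmetic. A sketch below estimates KV-cache size versus context length; the layer count, KV-head count, and head dimension are assumptions in the ballpark of a 14B GQA model like Qwen2.5-Coder, not figures from the thread, so check the model card for exact values.

```python
def kv_cache_gib(ctx_tokens, layers=48, kv_heads=8, head_dim=128, bytes_per_val=2):
    """Approximate K+V cache size in GiB at fp16 (bytes_per_val=2)."""
    # Two tensors (K and V) per layer, one vector per token per KV head.
    total_bytes = 2 * layers * kv_heads * head_dim * bytes_per_val * ctx_tokens
    return total_bytes / 2**30

weights_gib = 9.0  # rough footprint of 14B weights at Q4 (assumption)

for ctx in (65536, 131072):
    print(f"{ctx:>6} tokens: ~{kv_cache_gib(ctx):.1f} GiB KV cache, "
          f"~{weights_gib + kv_cache_gib(ctx):.1f} GiB with Q4 weights")
```

Under these assumptions, 64K lands around 21 GiB total, which fits on 32 GB with room for the OS, while 128K pushes past the machine's unified memory entirely; quantizing the KV cache shrinks both figures but the scaling with context length is the same.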

  • Ollama’s own docs say coding tools should generally be set to at least 64k context, but larger windows increase memory use fast.
  • On 32GB unified memory, quantized 7B-14B models are the sweet spot; 32B-class models get uncomfortable quickly.
  • Qwen2.5-Coder is a strong fit here because it officially supports up to 128K context and is tuned for coding tasks.
  • MLX may squeeze out better performance on Apple silicon, but Ollama and llama.cpp remain the most practical defaults for broad local model support.
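In practice the window is set per model. A minimal sketch using Ollama's Modelfile `num_ctx` parameter; the base tag `qwen2.5-coder:14b` is an assumption, so check `ollama list` for what you actually have pulled:

```shell
# Write a Modelfile that pins a 64K context window.
cat > Modelfile.64k <<'EOF'
FROM qwen2.5-coder:14b
PARAMETER num_ctx 65536
EOF

# Build a named variant, then point your coding tool at it:
#   ollama create qwen-coder-64k -f Modelfile.64k
#   ollama run qwen-coder-64k
cat Modelfile.64k
```

Recent Ollama versions also honor an `OLLAMA_CONTEXT_LENGTH` environment variable on `ollama serve` as a server-wide default, which avoids building a variant per model.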
// TAGS
ollama · llm · ai-coding · self-hosted · open-source · inference

DISCOVERED

25d ago

2026-03-18

PUBLISHED

25d ago

2026-03-17

RELEVANCE

7/10

AUTHOR

rkh4n