Continue, Ollama hit 7B memory wall
OPEN_SOURCE ↗
REDDIT // 18d ago · INFRASTRUCTURE

An r/LocalLLaMA user reports that Continue works fine with a 3B model, but Qwen2.5-Coder 7B crashes with "llama runner process has terminated: exit status 2" once file context is added. They ask whether any config.yaml or Ollama setting can keep the workload split between a 4GB GeForce 940MX and 24GB of system RAM without blowing up.
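For reference, the kind of tuning the post is asking about can be sketched as an Ollama Modelfile. num_ctx and num_gpu are real Ollama parameters; the specific values below are illustrative guesses for a 4GB card, not settings verified on this hardware:

```
# Modelfile: derive a lower-memory variant of the model
FROM qwen2.5-coder:7b

# Cap the context window to limit KV-cache growth
PARAMETER num_ctx 2048

# Offload only a few layers to the 4GB GPU; the rest
# run from system RAM on the CPU
PARAMETER num_gpu 8
```

Built with `ollama create`, the resulting model name would then be referenced from Continue's config. Even so, a smaller context window directly limits how much file context Continue can pass in, which is the feature triggering the crash.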

// ANALYSIS

This crash appears to be a hard memory budget problem rather than a configuration bug. On a 4GB mobile GPU, a 7B model's quantized weights alone (roughly 4-5GB at 4-bit) already match or exceed available VRAM before any KV cache for file context is allocated. While Continue's documentation confirms local models like Qwen2.5-Coder 7B are supported, Ollama's documentation explicitly warns that larger context windows increase memory use. Although Ollama can split workloads across GPU and system memory, it favors single-GPU setups for stability; configuration knobs like num_ctx and num_gpu cannot erase the underlying memory requirements for weights and KV cache.
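The budget can be checked with back-of-envelope arithmetic. The architecture figures below are assumptions (Qwen2.5-7B is reported as 28 layers with 4 KV heads of head dim 128 under grouped-query attention), and the ~4.7GB weight figure matches the Q4_K_M download Ollama ships for qwen2.5-coder:7b:

```python
# Rough memory budget for Qwen2.5-Coder 7B on a 4 GB GPU.
# All figures are assumptions stated in the lead-in, not measurements.

weights_gb = 4.7                  # quantized weights (Q4_K_M download size)

layers, kv_heads, head_dim = 28, 4, 128   # assumed Qwen2.5-7B architecture
ctx_tokens = 4096                 # a modest window once file context is added
bytes_per_elem = 2                # fp16 K and V entries

# KV cache: 2 tensors (K and V) * layers * kv_heads * head_dim per token
kv_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem * ctx_tokens
kv_gb = kv_bytes / 1e9

total_gb = weights_gb + kv_gb
print(f"KV cache: {kv_gb:.2f} GB, total: {total_gb:.2f} GB")
```

Under these assumptions the weights alone overflow the 940MX's 4GB of VRAM, so some layers must spill to system RAM no matter how the knobs are set; shrinking num_ctx only trims the (comparatively small) KV-cache term.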

// TAGS
continue · ollama · llm · inference · ai-coding · ide · self-hosted · gpu

DISCOVERED

2026-03-25 (18d ago)

PUBLISHED

2026-03-24 (18d ago)

RELEVANCE

7/10

AUTHOR

Big-Handle1432