Continue, Ollama hit 7B memory wall
An r/LocalLLaMA user reports that Continue works fine with a 3B model, but Qwen2.5-Coder 7B crashes with "llama runner process has terminated: exit status 2" once file context is added. They ask whether any Continue config.yaml or Ollama setting can keep the workload split between a 4GB GeForce 940MX and 24GB of system RAM without crashing.
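The knobs the user is asking about do exist on the Ollama side. A sketch of a custom Modelfile that caps the context window and offloads only part of the model to the GPU (the specific numbers here are illustrative starting points, not values from the post):

```
# Modelfile: shrink the KV cache and keep most layers in system RAM
FROM qwen2.5-coder:7b
PARAMETER num_ctx 2048   # smaller context window -> smaller KV cache
PARAMETER num_gpu 10     # layers offloaded to the GPU; the rest run on CPU/RAM
```

Built with `ollama create qwen25-coder-lowmem -f Modelfile`, after which Continue's config.yaml model entry would point at `qwen25-coder-lowmem` (a hypothetical name) instead of the stock tag. Lowering `num_gpu` trades speed for stability on a 4GB card.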
This crash appears to be a hard memory budget problem rather than a configuration bug. On a 4GB mobile GPU, a 7B model's quantized weights alone approach the card's capacity, and adding file context grows the KV cache on top of that. While Continue's documentation confirms local models like Qwen2.5-Coder 7B are supported, Ollama's documentation explicitly warns that larger context windows increase memory use. Ollama can split a model between GPU and system memory, but configuration knobs like num_ctx and num_gpu only move the work around; they cannot erase the underlying memory requirements for weights and KV cache.
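A back-of-envelope estimate shows why no setting can make this fit comfortably. The layer and head counts below are illustrative values resembling Qwen2.5 7B's published grouped-query-attention layout, not figures from the post:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bytes_per_elem: int = 2) -> int:
    """fp16 KV cache size: one K and one V tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Assumed Qwen2.5-7B-like layout: 28 layers, 4 KV heads, head_dim 128
cache_gib = kv_cache_bytes(28, 4, 128, 8192) / 2**30
weights_gib = 7e9 * 0.5 / 2**30  # rough 4-bit quantized weights, before runtime overhead

print(f"KV cache @ 8k ctx: {cache_gib:.2f} GiB; q4 weights: {weights_gib:.2f} GiB")
```

Even before runtime overhead, the quantized weights plus an 8k-token cache exceed 4 GiB, so some portion must spill to system RAM regardless of configuration.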
DISCOVERED
18d ago
2026-03-25
PUBLISHED
18d ago
2026-03-24
RELEVANCE
AUTHOR
Big-Handle1432