OPEN_SOURCE · REDDIT · BENCHMARK RESULT · 14d ago

Ollama Inference on Linux Outruns Windows

A Reddit user compared Ollama on the same RTX 8000 homelab box under Windows 10 and Ubuntu 22.04 and saw 72% to 118% higher throughput on Linux across two Qwen models. It is an anecdotal but useful reminder that the OS and runtime stack can matter as much as the GPU when local LLMs are pushed hard.
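For anyone wanting to reproduce that kind of comparison on their own box, here is a minimal sketch of a throughput probe against Ollama's local HTTP API. The model tag, prompt, and run count are placeholders (check `ollama list` for the exact tag you have pulled); the tokens/sec figure comes from the `eval_count` and `eval_duration` fields Ollama returns on non-streaming generate calls.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint
MODEL = "qwen3:30b-a3b"   # placeholder tag; substitute the model you are comparing
PROMPT = "Explain the difference between processes and threads."
RUNS = 5                  # average several runs to smooth out variance

def tokens_per_second() -> float:
    """Run one non-streaming generation and compute decode throughput."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": PROMPT, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    data = resp.json()
    # eval_count = tokens generated, eval_duration = decode time in nanoseconds
    return data["eval_count"] / (data["eval_duration"] / 1e9)

if __name__ == "__main__":
    rates = [tokens_per_second() for _ in range(RUNS)]
    print(f"mean decode throughput: {sum(rates) / len(rates):.1f} tok/s")
```

Running the same script unchanged under Windows and Linux is what makes the percentage gap meaningful; only the OS should vary between the two sets of runs.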

// ANALYSIS

This is the kind of delta that makes OS choice a first-order performance decision, not a preference detail.

  • The +118% swing on Qwen3 30B A3B is big enough to treat Windows and Linux as materially different deployment targets for Ollama.
  • Similar Windows slowdown reports have shown up in Ollama's own issue tracker, so this does not read like a one-off bad run.
  • The post does not isolate a single culprit, so driver, runtime, scheduler, or power-management differences are all plausible; fair comparisons need the same launch path, drivers, and background load (a sketch for recording that context follows this list).
  • For homelab users chasing tokens/sec, Linux still looks like the safer default when throughput matters more than convenience.
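To make Windows and Linux runs comparable, it helps to record the environment next to each measurement. A rough sketch, assuming `nvidia-smi` and the `ollama` CLI are on the PATH, that captures OS, GPU, driver, and Ollama version as a JSON snapshot:

```python
import json
import platform
import subprocess

def run(cmd: list[str]) -> str:
    """Return trimmed stdout of a command, or a marker when it is unavailable."""
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unavailable"

def environment_snapshot() -> dict:
    """Collect the variables most likely to explain a Windows vs. Linux gap."""
    return {
        "os": platform.platform(),
        "gpu_and_driver": run(["nvidia-smi", "--query-gpu=name,driver_version",
                               "--format=csv,noheader"]),
        "ollama_version": run(["ollama", "--version"]),
    }

if __name__ == "__main__":
    # Save this alongside each tokens/sec figure so runs can be compared fairly.
    print(json.dumps(environment_snapshot(), indent=2))
```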
// TAGS
ollama · inference · gpu · benchmark · llm · open-source · self-hosted

DISCOVERED: 2026-03-29 (14d ago)

PUBLISHED: 2026-03-29 (14d ago)

RELEVANCE: 8/10

AUTHOR: triynizzles1