OPEN_SOURCE
REDDIT · 14d ago · BENCHMARK RESULT
Ollama Inference Outruns Windows on Linux
A Reddit user ran Ollama on the same RTX 8000 homelab box under Windows 10 and Ubuntu 22.04 and measured 72% to 118% higher throughput on Linux across two Qwen models. It is anecdotal, but a useful reminder that the OS and runtime stack can matter as much as the GPU when local LLMs are pushed hard.
// ANALYSIS
This is the kind of delta that makes OS choice a first-order performance decision, not a preference detail.
- The +118% swing on Qwen 3 30B A3B is big enough to treat Windows and Linux as materially different deployment targets for Ollama.
- Similar Windows slowdown reports have shown up in Ollama's own issue tracker, so this does not read like a one-off bad run.
- The post does not isolate a single culprit, so driver, runtime, scheduler, or power-management differences are all plausible; fair comparisons need the same launch path, drivers, and background load.
- For homelab users chasing tokens/sec, Linux still looks like the safer default when throughput matters more than convenience.
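For anyone reproducing the comparison, Ollama's `/api/generate` response already reports `eval_count` (tokens generated) and `eval_duration` (nanoseconds), which is enough to compute tokens/sec the same way on both OSes. A minimal sketch, assuming the default local endpoint; the model tag and prompt are placeholders:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def tokens_per_sec(resp: dict) -> float:
    # Ollama reports eval_count (tokens generated) and eval_duration (ns).
    return resp["eval_count"] / resp["eval_duration"] * 1e9

def bench(model: str, prompt: str) -> float:
    # One non-streaming generation; run the same call on each OS and compare.
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as r:
        return tokens_per_sec(json.load(r))

# Worked example on a canned response: 150 tokens in 1.2 s of eval time.
sample = {"eval_count": 150, "eval_duration": 1_200_000_000}
print(f"{tokens_per_sec(sample):.1f} tok/s")  # → 125.0 tok/s
```

Deriving tokens/sec from the server's own counters, rather than wall-clock timing around the HTTP call, keeps request overhead and prompt-processing time out of the number being compared.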
// TAGS
ollama · inference · gpu · benchmark · llm · open-source · self-hosted
DISCOVERED
2026-03-29
PUBLISHED
2026-03-29
RELEVANCE
8/10
AUTHOR
triynizzles1