OPEN_SOURCE
REDDIT // 4h ago // BENCHMARK RESULT
Lubuntu Beats Windows 11 in llama.cpp
A Reddit user benchmarked llama.cpp build b8929 on an RTX 5080 and an i9-14900KF under Windows 11 25H2 and Lubuntu 26.04, comparing prompt processing and token generation across several models. The main story is that Linux delivers a small but consistent token-generation advantage, while hybrid CPU/GPU prompt evaluation can be more than 2x faster on Lubuntu, suggesting the OS and build/runtime stack matter a lot once the workload spills beyond the GPU.
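For context, a minimal sketch of what such a comparison looks like with llama-bench; the model path, token counts, and repetition count here are illustrative assumptions, not the poster's exact settings:

```sh
# Fully GPU-offloaded comparison: build the same llama.cpp revision on each
# OS, run the identical command, and compare the pp (prompt processing) and
# tg (token generation) columns, reported in tokens/second.
# Model path and -p/-n token counts are assumptions for illustration.
./llama-bench -m models/some-model.gguf -ngl 99 -p 2048 -n 256 -r 5
```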
// ANALYSIS
Hot take: this is less a “Linux is always faster” story and more a reminder that local LLM performance depends heavily on which part of the stack is stressed; once llama.cpp leans on the CPU, Windows looks much worse here.
- Fully GPU-offloaded runs show a steady Linux edge, usually around 4% to 8% in token generation, with single-digit to low-double-digit percentage gains in prompt processing.
- The biggest gap appears in hybrid runs (`-t 8 -tb 8 -fit on`), where Lubuntu roughly doubles prompt throughput versus Windows on Qwen3.5-35B-A3B and GPT-OSS-120B (a generic reproduction sketch follows this list).
- Gemma-4-E4B-it is the mildest case, which suggests the delta is workload-sensitive rather than a universal platform law.
- The benchmark is useful but not definitive: driver versions, Windows build, compiler/toolchain differences, and prebuilt vs. self-compiled binaries could all be contributing.
- For people choosing an OS purely for local inference, the practical conclusion is that Linux seems worth considering if you run mixed CPU/GPU workloads or care about every last bit of throughput.
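As referenced in the hybrid-run bullet above, a rough way to approximate that regime is to offload only part of the model so evaluation spills onto the CPU. This sketch uses generic llama-bench flags rather than the poster's exact `-fit on` invocation; the layer count, thread count, and model path are assumptions:

```sh
# Hybrid CPU/GPU sketch: a partial -ngl offload keeps some layers on the
# CPU, which is the regime where the ~2x Windows-vs-Lubuntu gap showed up.
# -ngl value, -t thread count, and model path are illustrative assumptions.
./llama-bench -m models/large-moe.gguf -ngl 24 -t 8 -p 2048 -n 256 -r 5
```

Pinning `-t` to the performance-core count, as the post's `-t 8` does on a 14900KF, keeps the CPU-side work off the efficiency cores and makes the cross-OS comparison cleaner.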
// TAGS
llama-cpp · windows-11 · lubuntu · linux · nvidia · rtx-5080 · local-llm · benchmark · cuda · ai-inference
DISCOVERED
2026-04-26 (4h ago)
PUBLISHED
2026-04-26 (7h ago)
RELEVANCE
9/10
AUTHOR
Ok_Mine189