OPEN_SOURCE
REDDIT // 4h ago // BENCHMARK RESULT
Lubuntu Beats Windows 11 in llama.cpp
A Reddit user benchmarked llama.cpp build b8929 on an RTX 5080 and an i9-14900KF under Windows 11 25H2 and Lubuntu 26.04, comparing prompt processing and token generation across several models. The main story is that Linux delivers a small but consistent token-generation advantage, while hybrid CPU/GPU prompt evaluation can be more than 2x faster on Lubuntu, suggesting the OS and build/runtime stack matter a lot once the workload spills beyond the GPU.
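For context, a minimal sketch of what such a comparison looks like with llama-bench; the model path, token counts, and repetition count here are illustrative assumptions, not the poster's exact settings:

```sh
# Fully GPU-offloaded comparison: build the same llama.cpp revision on each
# OS, run the identical command, and compare the pp (prompt processing) and
# tg (token generation) columns, reported in tokens/second.
# Model path and -p/-n token counts are assumptions for illustration.
./llama-bench -m models/some-model.gguf -ngl 99 -p 2048 -n 256 -r 5
```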
// ANALYSIS
Hot take: this is less a “Linux is always faster” story and more a reminder that local LLM performance depends heavily on which part of the stack is stressed; once llama.cpp leans on the CPU, Windows looks much worse here.
- Fully GPU-offloaded runs show a steady Linux edge, usually around 4% to 8% in token generation, with single-digit to low-double-digit percentage gains in prompt processing.
- The biggest gap appears in hybrid runs (`-t 8 -tb 8 -fit on`), where Lubuntu roughly doubles prompt throughput versus Windows on Qwen3.5-35B-A3B and GPT-OSS-120B (a generic reproduction sketch follows this list).
- Gemma-4-E4B-it is the mildest case, which suggests the delta is workload-sensitive rather than a universal platform law.
- The benchmark is useful but not definitive: driver versions, Windows build, compiler/toolchain differences, and prebuilt vs. self-compiled binaries could all be contributing.
- For people choosing an OS purely for local inference, the practical conclusion is that Linux seems worth considering if you run mixed CPU/GPU workloads or care about every last bit of throughput.
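As referenced in the hybrid-run bullet above, a rough way to approximate that regime is to offload only part of the model so evaluation spills onto the CPU. This sketch uses generic llama-bench flags rather than the poster's exact `-fit on` invocation; the layer count, thread count, and model path are assumptions:

```sh
# Hybrid CPU/GPU sketch: a partial -ngl offload keeps some layers on the
# CPU, which is the regime where the ~2x Windows-vs-Lubuntu gap showed up.
# -ngl value, -t thread count, and model path are illustrative assumptions.
./llama-bench -m models/large-moe.gguf -ngl 24 -t 8 -p 2048 -n 256 -r 5
```

Pinning `-t` to the performance-core count, as the post's `-t 8` does on a 14900KF, keeps the CPU-side work off the efficiency cores and makes the cross-OS comparison cleaner.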
// TAGS
llama-cpp · windows-11 · lubuntu · linux · nvidia · rtx-5080 · local-llm · benchmark · cuda · ai-inference
DISCOVERED
2026-04-26 (4h ago)
PUBLISHED
2026-04-26 (7h ago)
RELEVANCE
9/10
AUTHOR
Ok_Mine189