OPEN_SOURCE
REDDIT · 3h ago · INFRASTRUCTURE
llama.cpp cold-starts A770 servers on Windows 11
A Reddit user reports that a llama.cpp server on Windows 11, running on an Intel Arc A770 in a dual-Xeon system on the C612 chipset, repeatedly unloads the model after idle periods, even with BIOS and Windows power-saving options disabled. When the next API request arrives, the model can take a long time to reload in chunks, suggesting a cold-start path rather than normal steady-state inference. The only reliable workaround they found was a polling script that sends a tiny completion every 30 seconds to keep the server active.
// ANALYSIS
This looks more like an idle-path or driver/runtime issue than a simple display-output problem.
- The slow "reload in gigabyte increments" points to model state being paged out or reinitialized, not just the GPU entering a light sleep state.
- Intel Arc on Windows plus an older workstation chipset is a plausible edge-case combination for PCIe power management or driver behavior.
- The polling workaround is practical, but it is a keepalive hack, not a fix.
- For operators, the real question is whether llama.cpp is unloading on its own, or whether Windows/driver behavior is forcing the backend to rehydrate VRAM and host memory after idleness.
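The keepalive workaround described in the post can be sketched as a small polling loop against the llama.cpp server's `/completion` endpoint. This is an illustrative sketch, not the poster's actual script: the server address, the one-token `n_predict` payload, and the error handling are assumptions; only the 30-second cadence comes from the post.

```python
# Keepalive sketch for a llama.cpp HTTP server (hypothetical; the
# address and payload are assumptions, only the 30 s interval is
# taken from the Reddit post).
import json
import time
import urllib.request

SERVER = "http://127.0.0.1:8080"  # assumed default llama.cpp server address
INTERVAL_S = 30                    # cadence the poster reported using

def build_ping() -> bytes:
    # A tiny completion request: asking for a single token should be
    # enough to keep the model resident without meaningful load.
    return json.dumps({"prompt": "ping", "n_predict": 1}).encode()

def keepalive_once(url: str = SERVER) -> int:
    """Send one minimal completion request; return the HTTP status."""
    req = urllib.request.Request(
        url + "/completion",
        data=build_ping(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status

if __name__ == "__main__":
    while True:
        try:
            keepalive_once()
        except OSError:
            pass  # server busy or restarting; retry on the next tick
        time.sleep(INTERVAL_S)
```

A curl loop in Task Scheduler would do the same job; the point is simply that any periodic tiny request defeats whatever idle path is evicting the model.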
// TAGS
llama-cpp · intel arc · a770 · windows · gpu · llm serving · api · idle · keepalive
DISCOVERED
3h ago
2026-04-18
PUBLISHED
6h ago
2026-04-18
RELEVANCE
5/10
AUTHOR
Turbulent-Attorney65