llama.cpp cold-starts A770 servers on Windows 11
OPEN_SOURCE
REDDIT · 3h ago · INFRASTRUCTURE


A Reddit user reports that a llama.cpp server on Windows 11, running on an Intel Arc A770 in a dual-Xeon system built on the C612 chipset, repeatedly unloads the model after idle periods, even with BIOS and Windows power-saving options disabled. When the next API request arrives, the model can take a long time to reload in chunks, suggesting a cold-start path rather than normal steady-state inference. The only reliable workaround they found was a polling script that sends a tiny completion request every 30 seconds to keep the server active.
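The reported workaround can be sketched as a small keepalive poller. This is a hypothetical reconstruction, not the user's actual script: the endpoint (`/completion`), port 8080, and payload fields follow llama.cpp's HTTP server defaults, but verify them against your own server configuration.

```python
import json
import time
import urllib.request

# Assumed llama.cpp server endpoint; adjust host/port as needed.
SERVER = "http://localhost:8080/completion"


def make_payload(prompt: str = "ping", n_predict: int = 1) -> bytes:
    """Build the minimal JSON body for a one-token completion."""
    return json.dumps({"prompt": prompt, "n_predict": n_predict}).encode()


def keepalive(interval: float = 30.0) -> None:
    """Send a tiny completion every `interval` seconds to keep the model warm."""
    while True:
        req = urllib.request.Request(
            SERVER,
            data=make_payload(),
            headers={"Content-Type": "application/json"},
        )
        try:
            urllib.request.urlopen(req, timeout=10).read()
        except OSError:
            pass  # server may be busy or restarting; keep polling anyway
        time.sleep(interval)


if __name__ == "__main__":
    keepalive()
```

The tiny one-token request keeps inference state warm at negligible cost; swallowing `OSError` lets the poller survive transient server restarts.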

// ANALYSIS

This looks more like an idle-path or driver/runtime issue than a simple display-output problem.

  • The slow “reload in gigabyte increments” points to model state being paged out or reinitialized, not just the GPU entering a light sleep state.
  • Intel Arc on Windows plus an older workstation chipset is a plausible edge-case combination for PCIe power management or driver behavior.
  • The polling workaround is practical, but it is a keepalive hack, not a fix.
  • For operators, the real question is whether llama.cpp is unloading the model on its own, or whether Windows/driver behavior is forcing the backend to rehydrate VRAM and host memory after an idle period.
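If host-memory paging is part of the problem, one mitigation worth trying before a keepalive hack is launching the server with llama.cpp's `--mlock` and `--no-mmap` options, which keep model weights resident in RAM rather than relying on memory-mapped pages the OS can evict. The sketch below only builds the command line; the binary name, model path, and port are assumptions for illustration.

```python
import subprocess

def build_server_cmd(model_path: str, port: int = 8080) -> list[str]:
    """Assemble a llama-server launch command that pins weights in memory."""
    return [
        "llama-server",   # assumed to be on PATH
        "-m", model_path, # path to a GGUF model (assumption)
        "--port", str(port),
        "--mlock",        # lock weights into physical RAM
        "--no-mmap",      # load weights eagerly instead of memory-mapping
    ]

if __name__ == "__main__":
    cmd = build_server_cmd("models/model.gguf")
    print(" ".join(cmd))
    # subprocess.run(cmd)  # uncomment to actually launch the server
```

Note that `--mlock` requires enough free physical memory for the whole model and, on some systems, elevated memory-lock limits; it addresses host-side paging only, not GPU driver power management.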
// TAGS
llama-cpp · intel arc · a770 · windows · gpu · llm serving · api · idle · keepalive

DISCOVERED

3h ago

2026-04-18

PUBLISHED

6h ago

2026-04-18

RELEVANCE

5/10

AUTHOR

Turbulent-Attorney65