OPEN_SOURCE
REDDIT // 4d ago · INFRASTRUCTURE
OpenClaw users report llama.cpp requests getting cancelled mid-prompt
A Reddit user reports that OpenClaw cannot reliably chat against a local llama.cpp server when using Gemma 4 and Qwen3.5. The model endpoint responds with HTTP 200, but OpenClaw appears to treat the run as a network failure while llama.cpp logs show the task being cancelled around 31% through prompt processing. The user notes that direct calls to the llama.cpp endpoint and llama.cpp’s own web UI both work, which points to an integration issue rather than a broken model server.
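The "direct calls to the llama.cpp endpoint work" observation can be checked with a minimal script that bypasses OpenClaw entirely and talks to llama.cpp's OpenAI-compatible API. This is a sketch: the host, port, model name, and timeout below are assumptions for illustration, not values from the post.

```python
# Minimal direct check against a local llama.cpp server (llama-server's
# OpenAI-compatible endpoint). Host, port, model, and timeout are assumptions.
import json
import urllib.request


def build_chat_payload(model: str, prompt: str) -> dict:
    """Build a minimal OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def direct_check(base_url: str = "http://127.0.0.1:8080") -> int:
    """POST directly to llama.cpp, bypassing OpenClaw, and return the HTTP status."""
    body = json.dumps(build_chat_payload("gemma-4", "ping")).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    # A generous timeout matters here: prompt ingestion on CPU can take minutes.
    with urllib.request.urlopen(req, timeout=600) as resp:
        return resp.status
```

If this returns `200` with a valid completion while the same model fails through OpenClaw, the fault is isolated to the client integration, which is what the post describes.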
// ANALYSIS
This reads like an OpenClaw-to-llama.cpp compatibility bug, not a model crash.
- The server is healthy enough to return `200`, so the failure likely sits in request lifecycle handling, timeout behavior, or streaming expectations on the client side.
- The cancellation happens while prompt ingestion is still in progress, which suggests OpenClaw may be aborting runs with long first-token latency before generation starts.
- The config uses a large `contextWindow` and `reasoning: true`; either could push request size or behavior into a path OpenClaw does not handle well.
- The post is useful as a reproducible setup report, but it is still anecdotal until someone isolates whether the trigger is the OpenAI-completions adapter, chat template kwargs, or OpenClaw's timeout logic.
// TAGS
openclaw · llamacpp · gemma4 · qwen35 · local-llm · troubleshooting · api-integration · self-hosting
DISCOVERED
2026-04-08
PUBLISHED
2026-04-08
RELEVANCE
8/10
AUTHOR
UnderstandingFew2968