OpenClaw misses oMLX prompt cache
OpenClaw users are reporting zero cached tokens against a local oMLX backend even when the same model caches correctly through direct `/v1/chat/completions` calls and Hermes. The likely culprit is OpenClaw’s request shaping for local proxy routes, not oMLX or the Qwen model itself.
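A quick way to confirm that split is to hit the backend directly and watch the usage counters. The sketch below is a minimal check, not oMLX tooling: the port, model id, and the `usage.prompt_tokens_details.cached_tokens` field location are all assumptions, since OpenAI-compatible servers vary in what they report. It sends the same byte-identical request twice and prints the cached-token count:

```python
# Minimal sketch: confirm oMLX itself caches on repeated, byte-identical
# prompts. URL, model id, and usage field names are assumptions; adjust
# them to what your oMLX build actually serves and reports.
import requests

URL = "http://localhost:8080/v1/chat/completions"  # assumed local oMLX address
BODY = {
    "model": "qwen",  # assumed model id; use whatever `/v1/models` lists
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hi."},
    ],
    "max_tokens": 8,
    "stream": False,
}

for attempt in (1, 2):
    usage = requests.post(URL, json=BODY, timeout=60).json().get("usage", {})
    details = usage.get("prompt_tokens_details") or {}
    # OpenAI-style servers report reused prefix tokens here; names vary by server.
    print(f"call {attempt}: prompt_tokens={usage.get('prompt_tokens')} "
          f"cached_tokens={details.get('cached_tokens', 0)}")
```

If the second call reports cached tokens here but OpenClaw's traffic never does, the miss is being introduced on OpenClaw's side of the wire.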
This reads less like a model/server bug and more like an agent-runtime mismatch: OpenClaw appears to be changing the prompt or omitting cache-relevant hints in ways Hermes does not.
- OpenClaw docs say local `/v1` backends are treated as proxy-style OpenAI-compatible routes and do not get native OpenAI-only shaping, including prompt-cache hints.
- The user’s config sets `compat.supportsPromptCacheKey: true`, but that only matters if OpenClaw actually forwards the key on the chosen transport path; the capture sketch after this list makes that visible on the wire.
- The earlier 2026.2.15 local-cache regression suggests OpenClaw is still sensitive to small prompt-layout changes that can blow prefix caching on local models (illustrated after this list).
- The fastest debug path is to diff the exact request bodies from Hermes vs OpenClaw, especially system prompt ordering, tool schemas, and any `prompt_cache_key` or Responses-specific fields.
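The capture step can be as small as a pass-through proxy. This is an illustrative sketch only, not part of OpenClaw or oMLX: the ports, upstream address, and file naming are assumptions, and it handles non-streaming requests only, so disable streaming in both clients while testing. It dumps each request body to disk and flags whether `prompt_cache_key` survived:

```python
# Pass-through capture proxy (sketch): point Hermes and OpenClaw at this
# port instead of oMLX, then diff the dumped JSON files afterwards.
import json, time
from http.server import BaseHTTPRequestHandler, HTTPServer
import requests

UPSTREAM = "http://localhost:8080"  # assumed real oMLX address

class Capture(BaseHTTPRequestHandler):
    def do_POST(self):
        raw = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        body = json.loads(raw)
        # Dump each request body to its own file, key-sorted for clean diffs.
        fname = f"req-{time.time_ns()}.json"
        with open(fname, "w") as f:
            json.dump(body, f, indent=2, sort_keys=True)
        # Flag the cache-relevant field the config claims to enable.
        print(f"{fname}: prompt_cache_key present: {'prompt_cache_key' in body}")
        # Forward unchanged so the client still gets a real completion.
        # (Auth headers are not forwarded; fine for a local, unauthenticated oMLX.)
        resp = requests.post(UPSTREAM + self.path, data=raw,
                             headers={"Content-Type": "application/json"},
                             timeout=300)
        self.send_response(resp.status_code)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(resp.content)

HTTPServer(("127.0.0.1", 9999), Capture).serve_forever()
```

Pointing Hermes and OpenClaw at port 9999 in turn and diffing the dumped files should show directly whether OpenClaw reorders the system prompt, serializes tool schemas differently, or drops the cache key.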
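As for why a small layout change is enough to zero out the counter: prefix caches reuse only the longest common leading span of the serialized prompt, so a reordering near the top invalidates everything after it. A toy illustration with hypothetical serializations (character-level here for brevity; real caches compare token sequences):

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the shared leading span; everything past it is a cache miss."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n

# Hypothetical serializations: same content, different section order.
hermes_prompt = "SYSTEM: You are helpful.\nTOOLS: [...]\nUSER: hi"
openclaw_prompt = "TOOLS: [...]\nSYSTEM: You are helpful.\nUSER: hi"
print(common_prefix_len(hermes_prompt, openclaw_prompt))  # 0 -> fully cold cache
```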