Claude Code goes local with Qwen3.5
A Reddit write-up shows how to point Claude Code at a local llama.cpp server running Qwen3.5 27B, disable telemetry, and keep the workflow fully offline. The author reports usable coding quality, working vision support via mmproj, and clear context and compaction limits at 65K tokens.
This is less about swapping models and more about proving the Claude Code workflow can run without Anthropic infrastructure.
- llama.cpp plus Qwen3.5 27B handled coding, code review, and image understanding well enough to be practical, not just a novelty
- The main pain point is Claude Code’s own prompt and compaction behavior, not raw model quality
- Offline use still needs local replacements for web search and other cloud-tied features, or the experience breaks in subtle ways
- The Strix Halo-specific ROCBLAS/HIPBLASLT setup makes this especially relevant for AMD unified-memory systems, but it is still a tuned setup rather than a turnkey one
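
To make the wiring concrete, here is a minimal sketch of how such a setup might be assembled. It is not the author's script. The environment variable names are ones Claude Code documents; whether the write-up used exactly these is an assumption. The port, the placeholder token, the model alias `qwen3.5-27b`, and the GGUF file names are hypothetical. It also assumes the llama.cpp build exposes an Anthropic-compatible Messages endpoint; if it does not, a translation proxy would have to sit in between.

```python
import os

# Assumed values, not taken from the write-up: adjust host, port, and model alias.
LOCAL_BASE_URL = "http://127.0.0.1:8080"
CONTEXT_TOKENS = 65536  # roughly the 65K context the write-up mentions


def local_claude_env(base_url: str = LOCAL_BASE_URL,
                     model: str = "qwen3.5-27b") -> dict[str, str]:
    """Return a copy of the current environment with overrides for a local backend."""
    env = dict(os.environ)
    env.update({
        "ANTHROPIC_BASE_URL": base_url,              # send API calls to the local server
        "ANTHROPIC_AUTH_TOKEN": "local-placeholder",  # dummy value; nothing leaves the machine
        "ANTHROPIC_MODEL": model,                    # hypothetical alias served locally
        "DISABLE_TELEMETRY": "1",                    # opt out of telemetry
        "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",  # suppress other background calls
    })
    return env


def llama_server_command(model_path: str, mmproj_path: str,
                         ctx: int = CONTEXT_TOKENS, port: int = 8080) -> list[str]:
    """Build a llama-server argument list with a vision projector (mmproj) attached."""
    return [
        "llama-server",
        "-m", model_path,         # main GGUF weights
        "--mmproj", mmproj_path,  # multimodal projector used for image input
        "-c", str(ctx),           # context window in tokens
        "--host", "127.0.0.1",
        "--port", str(port),
    ]


if __name__ == "__main__":
    # Print the pieces instead of launching anything; file names are placeholders.
    print(" ".join(llama_server_command("qwen3.5-27b.gguf", "mmproj.gguf")))
    overrides = local_claude_env()
    for key in ("ANTHROPIC_BASE_URL", "ANTHROPIC_MODEL", "DISABLE_TELEMETRY"):
        print(f"{key}={overrides[key]}")
```

With the server running, Claude Code could then be started with something like `subprocess.run(["claude"], env=local_claude_env())`. Keeping the overrides in a returned dictionary rather than mutating `os.environ` leaves the normal cloud configuration untouched for other sessions.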
DISCOVERED: 2026-04-05
PUBLISHED: 2026-04-05
AUTHOR: FeiX7
