OPEN_SOURCE
REDDIT · TUTORIAL · 7d ago
Claude Code goes local with Qwen3.5
A Reddit write-up shows how to point Claude Code at a local llama.cpp server running Qwen3.5 27B, disable telemetry, and keep the workflow fully offline. The author reports usable coding quality, working vision support via mmproj, and clear context and compaction limits at 65K tokens.
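The core move described in the write-up is two steps: serve the model through llama.cpp's OpenAI-compatible `llama-server`, then point Claude Code's base URL at the local endpoint. A minimal sketch, assuming `llama-server`'s standard flags and Claude Code's `ANTHROPIC_BASE_URL` / `DISABLE_TELEMETRY` environment variables; the model and mmproj filenames are illustrative, not from the post:

```shell
# Serve Qwen3.5 27B locally with vision support (filenames illustrative).
# -c 65536 matches the 65K context ceiling the author reports.
llama-server \
  -m qwen3.5-27b-q4_k_m.gguf \
  --mmproj qwen3.5-27b-mmproj.gguf \
  -c 65536 --port 8080 &

# Point Claude Code at the local endpoint and suppress phone-home traffic.
export ANTHROPIC_BASE_URL="http://127.0.0.1:8080"
export ANTHROPIC_AUTH_TOKEN="local"   # placeholder; llama-server ignores it by default
export DISABLE_TELEMETRY=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
claude
```

The exact environment variables the author uses may differ; check the original post before copying this verbatim.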
// ANALYSIS
This is less about swapping models and more about proving the Claude Code workflow can run without Anthropic infrastructure.
- llama.cpp plus Qwen3.5 27B handled coding, code review, and image understanding well enough to be practical, not just a novelty
- The main pain point is Claude Code’s own prompt and compaction behavior, not raw model quality
- Offline use still needs local replacements for web search and other cloud-tied features, or the experience breaks in subtle ways
- The Strix Halo-specific ROCBLAS/HIPBLASLT setup makes this especially relevant for AMD unified-memory systems, but it is still a tuned setup rather than a turnkey one
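The ROCBLAS/HIPBLASLT tuning in the last bullet typically amounts to routing rocBLAS matrix multiplies through hipBLASLt before launching the server. A sketch under that assumption; the post's exact variables and values are not reproduced here, and `HSA_OVERRIDE_GFX_VERSION` is a common ROCm workaround rather than something the summary confirms:

```shell
# Assumption: this is the ROCBLAS/HIPBLASLT tweak the post refers to;
# verify against the original write-up for your ROCm version.
export ROCBLAS_USE_HIPBLASLT=1

# Some ROCm builds also need the GPU target pinned for Strix Halo (gfx1151):
# export HSA_OVERRIDE_GFX_VERSION=11.5.1

llama-server -m qwen3.5-27b-q4_k_m.gguf -c 65536 --port 8080
```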
// TAGS
claude-code · qwen · ai-coding · cli · self-hosted · inference · multimodal · llm
DISCOVERED
2026-04-05
PUBLISHED
2026-04-05
RELEVANCE
8/10
AUTHOR
FeiX7