OPEN_SOURCE
REDDIT · 7d ago · TUTORIAL

Claude Code goes local with Qwen3.5

A Reddit write-up shows how to point Claude Code at a local llama.cpp server running Qwen3.5 27B, disable telemetry, and keep the workflow fully offline. The author reports usable coding quality, working vision support via mmproj, and notes that context plus auto-compaction effectively cap sessions at around 65K tokens.
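A minimal sketch of the setup described above. The model and mmproj file names, port, and auth placeholder are assumptions, not taken from the post; `llama-server` flags and the Claude Code telemetry variables are standard, but whether the server speaks Claude Code's wire protocol directly or needs a translation proxy depends on your llama.cpp version.

```shell
# --- Serve Qwen3.5 27B locally with llama.cpp (vision via --mmproj) ---
# File names below are placeholders; use your actual GGUF paths.
llama-server \
  -m qwen3.5-27b.gguf \
  --mmproj qwen3.5-27b-mmproj.gguf \
  -c 65536 \
  --port 8080 &

# --- Point Claude Code at the local server and keep traffic offline ---
export ANTHROPIC_BASE_URL="http://127.0.0.1:8080"
export ANTHROPIC_AUTH_TOKEN="local-placeholder"  # llama.cpp ignores auth by default
export DISABLE_TELEMETRY=1
export DISABLE_ERROR_REPORTING=1
claude
```

The `-c 65536` flag matches the 65K context ceiling the author reports; compaction behavior on top of that is Claude Code's own, not the server's.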

// ANALYSIS

This is less about swapping models and more about proving the Claude Code workflow can run without Anthropic infrastructure.

  • llama.cpp plus Qwen3.5 27B handled coding, code review, and image understanding well enough to be practical, not just a novelty
  • The main pain point is Claude Code’s own prompt and compaction behavior, not raw model quality
  • Offline use still needs local replacements for web search and other cloud-tied features, or the experience breaks in subtle ways
  • The Strix Halo-specific ROCBLAS/HIPBLASLT setup makes this especially relevant for AMD unified-memory systems, but it is still a tuned setup rather than a turnkey one
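For the last point, a hypothetical version of the ROCm tuning involved. The variable names are common ROCm knobs and the gfx target is an assumption for Strix Halo; the post's exact values may differ.

```shell
# Assumed Strix Halo (unified-memory APU) tuning; verify against your ROCm version.
export ROCBLAS_USE_HIPBLASLT=1           # route rocBLAS GEMMs through hipBLASLt
export HSA_OVERRIDE_GFX_VERSION=11.5.1   # assumption: force a supported gfx target
llama-server -m qwen3.5-27b.gguf -ngl 999 --port 8080  # offload all layers to the iGPU
```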
// TAGS
claude-code · qwen · ai-coding · cli · self-hosted · inference · multimodal · llm

DISCOVERED

7d ago

2026-04-05

PUBLISHED

7d ago

2026-04-05

RELEVANCE

8/10

AUTHOR

FeiX7