Qwen3.6-35B-A3B coding hits 32GB RAM wall
A developer report on running the Qwen3.6-35B-A3B MoE model for local agentic coding on a 32GB Mac describes serious context-management problems. The model shows frontier-level reasoning, but the hardware limits it to a 32k-token context, and its reasoning breaks down during complex repository-wide tasks.
Local LLMs are reaching frontier performance, but 32GB of RAM is becoming the new bottleneck for real-world agentic workflows.
- Qwen3.6-35B-A3B excels in benchmarks but struggles with context compaction in local agent loops such as OpenCode and Claude Code.
- A 32k context is insufficient for "rooting around" non-trivial codebases, leading to hallucinated file paths and loss of task state.
- Disabling subagents provides a temporary memory reprieve, but the run still fails once the reasoning chain extends beyond the second compaction pass.
- The failure highlights a growing gap between model "thinking" capabilities and the memory overhead required for persistent local agency.
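
The compaction behavior described in the points above can be illustrated with a minimal Python sketch. It is not how OpenCode or Claude Code actually implement compaction: the `History` class, the `keep_recent` parameter, the roughly-4-characters-per-token estimate, and the placeholder summary string are all hypothetical, chosen only to show why each pass discards detail such as exact file paths.

```python
from dataclasses import dataclass, field

CONTEXT_LIMIT = 32_000  # tokens; the 32k figure cited in the report


def estimate_tokens(text: str) -> int:
    """Crude heuristic (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


@dataclass
class History:
    """Hypothetical agent history that compacts itself when over budget."""
    limit: int = CONTEXT_LIMIT
    keep_recent: int = 4  # assumed to be >= 1
    messages: list[str] = field(default_factory=list)
    compactions: int = 0

    def total_tokens(self) -> int:
        return sum(estimate_tokens(m) for m in self.messages)

    def add(self, message: str) -> None:
        self.messages.append(message)
        # Compact at most once per added message, and only if there is
        # something older than the protected recent window to fold away.
        if self.total_tokens() > self.limit and len(self.messages) > self.keep_recent:
            self._compact()

    def _compact(self) -> None:
        old = self.messages[:-self.keep_recent]
        recent = self.messages[-self.keep_recent:]
        # Placeholder: a real loop would ask the model to summarize `old`.
        # Either way the summary is lossy, so details can disappear here.
        summary = f"[summary of {len(old)} earlier messages]"
        self.messages = [summary] + recent
        self.compactions += 1


if __name__ == "__main__":
    history = History(limit=100, keep_recent=2)
    for step in range(6):
        history.add(f"step {step}: " + "x" * 112)
    print(history.compactions, history.messages[0])
```

In this sketch the second pass folds the first summary into a new one, so anything that survived only in the earlier summary is summarized again. That compounding loss is one plausible reading of why the reported runs degrade after the second compaction pass.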
DISCOVERED
2026-04-20
PUBLISHED
2026-04-19
AUTHOR
boutell