Qwen3.6 mods boost local coding
OPEN_SOURCE ↗
REDDIT · 6h ago · BENCHMARK RESULT

A Reddit user reports running a modified Qwen3.6-35B-A3B setup on an NVIDIA A40 with llama-server, 1M-token context, persistent session memory, and an expanded Qwen Code toolset. The post is anecdotal, but it lines up with Qwen3.6’s official positioning as an open sparse MoE model for agentic coding with 35B total parameters, 3B active, and long-context support.

// ANALYSIS

This is less a formal benchmark than a useful signal: Qwen3.6-35B-A3B is becoming a serious local coding-agent substrate for people willing to tune the stack.

  • The reported 82–106 tok/s on an A40 is notable for a local 35B-class MoE coding workflow, though the exact quantization and workload are unspecified.
  • The interesting part is the system design: llama-server for long context, OpenViking-style memory across sessions, and qwen.md-style project guidance.
  • Expanding Qwen Code from a small default tool surface to 71 tools points toward a local alternative to Claude Code-style agent loops.
  • Treat this as community experimentation, not a reproducible leaderboard result, since the Reddit post has no comments, no shared config, and no validation harness.
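Since the post shares no config, a rough sketch of the long-context llama-server launch it describes might look like the following. The model filename, quantization level, and GPU-offload value are assumptions; only the flags themselves (`-m`, `-c`, `-ngl`, `--host`, `--port`) are standard llama.cpp options.

```shell
# Hypothetical launch of llama.cpp's llama-server with a long context window.
# The GGUF path and q4_k_m quantization are assumptions, not from the post.
#   -m    path to local GGUF weights
#   -c    context length in tokens (1M, as reported)
#   -ngl  number of layers to offload to the GPU (99 = effectively all, onto the A40)
./llama-server -m ./qwen3.6-35b-a3b-q4_k_m.gguf -c 1048576 -ngl 99 \
  --host 127.0.0.1 --port 8080
```

A coding agent such as Qwen Code would then point at the resulting OpenAI-compatible endpoint (`http://127.0.0.1:8080`) instead of a hosted API. Whether a 1M-token KV cache actually fits in the A40's 48 GB at this quantization is untested here.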
// TAGS
qwen3-6-35b-a3b · qwen-code · llm · ai-coding · inference · gpu · open-weights · benchmark

DISCOVERED

6h ago

2026-04-23

PUBLISHED

9h ago

2026-04-22

RELEVANCE

8 / 10

AUTHOR

Purpose-Effective