OPEN_SOURCE ↗
REDDIT // 6h ago · BENCHMARK RESULT
Qwen3.6-27B codes locally at 50 tok/s
A LocalLLaMA user reports running an Unsloth GGUF quant of Qwen3.6-27B with a 200K context window on an RTX 5090 via llama.cpp, getting roughly 50 tokens/sec and usable coding-agent behavior. The anecdote lines up with Qwen’s positioning of the 27B open-weight model as a smaller, practical agentic coding model with long-context support.
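A minimal sketch of what such a setup could look like with llama.cpp's `llama-server`; the GGUF filename and quant level are assumptions for illustration, not details from the post.

```shell
# Hypothetical reproduction of the reported setup (filename and quant assumed).
# -c sets the context window; -ngl 99 offloads all layers to the GPU,
# which a 32 GB RTX 5090 can accommodate for a ~4-bit 27B quant.
llama-server \
  -m Qwen3.6-27B-Q4_K_M.gguf \
  -c 200000 \
  -ngl 99
```

A coding agent would then point its OpenAI-compatible client at the local server endpoint (`http://localhost:8080/v1` by default).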
// ANALYSIS
This is not a real benchmark, but it is the kind of field report that matters: local coding models are crossing from novelty into “could actually sit in a workflow.”
- Qwen’s own model card claims 27B parameters, native 262K context, and strong coding-agent scores, including 77.2 on SWE-bench Verified and 59.3 on Terminal-Bench 2.0.
- The interesting signal is not raw speed alone; it is that a single high-end consumer GPU can run a long-context quantized coding model at interactive latency.
- Claude Code and Opus-class systems still have the product polish, tool reliability, and reasoning edge, but open weights are closing the gap on basic planning and repo navigation.
- The caveat is large: one Reddit test does not prove day-to-day reliability, especially for tool calls, long sessions, and messy production codebases.
// TAGS
qwen3-6-27b · qwen · llm · ai-coding · open-weights · self-hosted · gpu · inference
DISCOVERED
6h ago
2026-04-23
PUBLISHED
8h ago
2026-04-22
RELEVANCE
8/10
AUTHOR
Clasyc