REDDIT // 4h ago // BENCHMARK RESULT

GBNF tweak slashes Qwen3.6 token churn

A LocalLLaMA user reports that a custom GBNF grammar dramatically reduces reasoning-token churn and wall time for Qwen3.6-35B-A3B and Qwen3.6-27B in llama.cpp. The biggest gains show up on the 35B-A3B model, where puzzle latency and benchmark throughput improve sharply on an RTX 5090 setup.

// ANALYSIS

This is a useful reminder that output constraints can matter as much as model choice for verbose reasoning workloads. The benchmark is also very setup-specific, so the gains are interesting but should be reproduced before anyone treats them as general.

– The claimed win is largest on Qwen3.6-35B-A3B: puzzle time drops from 2m32s to 12s, and bench time from 33m52s to 11m04s.
– Qwen3.6-27B keeps the same bench score while improving throughput and finishing time, which suggests the grammar is trimming wasted reasoning rather than breaking task quality.
– The post’s core insight is practical: shorter reasoning traces can remove prefill churn on long-horizon coding work, which is exactly where local inference feels most expensive.
– The benchmark is still anecdotal: custom quantizations, a bespoke Rust/Next.js task suite, and one RTX 5090 machine make this a strong lead, not a universal conclusion.
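
To put the reported 35B-A3B timings on a common scale, the sketch below converts them to seconds and derives the implied speedup and share of wall time saved. The four durations are the ones quoted above; the helper names (`to_seconds`, `speedup`, `time_saved`) are illustrative and do not come from the original post.

```python
def to_seconds(stamp: str) -> int:
    """Convert a duration such as '2m32s' or '12s' to whole seconds."""
    minutes, _, rest = stamp.rpartition("m")  # minutes is '' when there is no 'm'
    seconds = rest.rstrip("s")
    return int(minutes or 0) * 60 + int(seconds or 0)


def speedup(before: str, after: str) -> float:
    """How many times faster the 'after' run is than the 'before' run."""
    return to_seconds(before) / to_seconds(after)


def time_saved(before: str, after: str) -> float:
    """Fraction of the original wall time that was eliminated."""
    return 1 - to_seconds(after) / to_seconds(before)


# Durations as reported for Qwen3.6-35B-A3B (single machine, not reproduced).
reported = {
    "puzzle": ("2m32s", "12s"),
    "bench": ("33m52s", "11m04s"),
}

for name, (before, after) in reported.items():
    print(
        f"{name}: {to_seconds(before)}s -> {to_seconds(after)}s, "
        f"{speedup(before, after):.2f}x faster, "
        f"{time_saved(before, after):.1%} less wall time"
    )
```

On these figures the puzzle run is roughly 12.7x faster and the bench run roughly 3.1x faster, so the headline puzzle number overstates what the longer benchmark shows.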

// TAGS

qwen3-6-35b-a3b · qwen3-6-27b · llama-cpp · benchmark · reasoning · ai-coding · llm

DISCOVERED

4h ago

2026-04-27

PUBLISHED

4h ago

2026-04-27

RELEVANCE

8/10

AUTHOR

Holiday_Purpose_3166