Structured CoT slashes reasoning tokens 22x via grammar
Structured Chain-of-Thought (SCoT) is an inference-time technique that uses GBNF grammars to constrain LLM reasoning blocks. By forcing models to follow a predefined structure during the "thinking" phase, it eliminates verbose overthinking, reduces latency, and significantly improves success rates on complex coding benchmarks.
Constraining the scratchpad suggests that structured thought can be far more efficient than the verbose, drifting reasoning currently seen in flagship models.
- SCoT uses guided generation to force thinking tokens into specific fields like GOAL, APPROACH, and VERIFY.
- Performance on LiveCodeBench v6 jumped from 50% to 64% Pass@1 by preventing models from exhausting their context windows with repetitive reasoning.
- The technique achieved a 22x token reduction on HumanEval+ and 43x on LiveCodeBench with no fine-tuning required.
- It is easily implemented at the inference level using tools like Outlines or llama-cpp-python.
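To make the GOAL / APPROACH / VERIFY idea concrete, here is a minimal sketch of what such a grammar might look like. It is an illustration, not the author's actual grammar: the `<think>` tags, the one-line-per-field limit, the catch-all `answer` rule, and the helper names `build_scot_grammar` and `follows_structure` are all assumptions. The block only builds the GBNF text and checks a sample with an equivalent stdlib regex, so it runs without a model.

```python
import re

# Field names taken from the summary above; everything else is an assumption.
FIELDS = ["GOAL", "APPROACH", "VERIFY"]


def build_scot_grammar(fields=FIELDS):
    """Build a GBNF grammar string that allows exactly one line per field
    inside a <think> block, followed by an unconstrained answer."""
    # One rule per field, e.g.:  goal ::= "GOAL: " line
    rules = [f'{name.lower()} ::= "{name}: " line' for name in fields]
    sequence = " ".join(name.lower() for name in fields)
    # "\\n" here emits a literal backslash-n, which GBNF reads as a newline.
    header = f'root ::= "<think>\\n" {sequence} "</think>\\n" answer'
    footer = [
        r'line ::= [^\n]+ "\n"',   # a single non-empty line
        r'answer ::= [^\x00]*',    # loose catch-all for the final answer
    ]
    return "\n".join([header, *rules, *footer]) + "\n"


def follows_structure(text, fields=FIELDS):
    """Regex mirror of the grammar, for checking outputs offline."""
    body = "".join(rf"{re.escape(name)}: [^\n]+\n" for name in fields)
    return re.match(rf"<think>\n{body}</think>\n", text) is not None


if __name__ == "__main__":
    print(build_scot_grammar())
```

With llama-cpp-python, the resulting string could then be passed through `LlamaGrammar.from_string(...)` and supplied as the `grammar` argument of a completion call; whether a given model's chat template tolerates a constrained thinking block this way would need to be checked per model. Restricting each field to a single line is one simple way to cap reasoning length, and a length bound on `line` could be added if the GBNF version in use supports bounded repetition.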
DISCOVERED: 2026-04-26
PUBLISHED: 2026-04-26
AUTHOR: Thrumpwart