OPEN_SOURCE
REDDIT · 4h ago · TUTORIAL
Structured CoT slashes reasoning tokens 22x via grammar
Structured Chain-of-Thought (SCoT) is an inference-time technique that uses GBNF grammars to constrain LLM reasoning blocks. By forcing models to follow a predefined structure during the "thinking" phase, it eliminates verbose overthinking, reduces latency, and significantly improves success rates on complex coding benchmarks.
// ANALYSIS
Constraining the scratchpad is strong evidence that structured thought is far more efficient than the verbose, drifting reasoning currently seen in flagship models.
- SCoT uses guided generation to force thinking tokens into specific fields such as GOAL, APPROACH, and VERIFY.
- Performance on LiveCodeBench v6 jumped from 50% to 64% Pass@1 by preventing models from exhausting their context windows with repetitive reasoning.
- The technique achieved a 22x token reduction on HumanEval+ and 43x on LiveCodeBench with no fine-tuning required.
- It is easily implemented at the inference level using tools like Outlines or llama-cpp-python; see the sketch after this list.
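// SKETCH
A minimal sketch of the idea using llama-cpp-python, which supports GBNF grammars directly. The three field names come from the post; the grammar text, model path (model.gguf), prompt, and 128-token cap are illustrative assumptions, not the original author's exact setup.

# Minimal SCoT sketch: a GBNF grammar forces the reasoning block into
# exactly three labeled, single-line fields.
from llama_cpp import Llama, LlamaGrammar

# One short line per field; nothing outside GOAL/APPROACH/VERIFY is
# reachable under this grammar, so the model cannot ramble.
SCOT_GRAMMAR = r"""
root ::= "GOAL: " line "APPROACH: " line "VERIFY: " line
line ::= [^\n]+ "\n"
"""

grammar = LlamaGrammar.from_string(SCOT_GRAMMAR)
llm = Llama(model_path="model.gguf")  # assumption: any local GGUF model

result = llm(
    "Task: merge two sorted lists into one sorted list.\nThink it through:\n",
    grammar=grammar,   # masks logits so only grammar-legal tokens can be sampled
    max_tokens=128,    # hard ceiling on the whole structured block
    temperature=0.0,
)
print(result["choices"][0]["text"])
# Hypothetical output:
# GOAL: Return one sorted list containing all elements of both inputs.
# APPROACH: Two-pointer merge, appending the smaller head each step.
# VERIFY: Output length equals len(a) + len(b) and is non-decreasing.

The same constraint can be expressed in Outlines as a regex or schema; the mechanism is identical either way: at each decoding step, tokens that would violate the structure are masked out before sampling.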
// TAGS
llm · reasoning · grammar · structured-cot · inference · open-source
DISCOVERED
4h ago
2026-04-26
PUBLISHED
7h ago
2026-04-26
RELEVANCE
8/10
AUTHOR
Thrumpwart