OPEN_SOURCE
REDDIT · 25d ago · BENCHMARK RESULT
Clip to Grok hits 66x speedup
Clip to Grok is a repo and PDF describing per-row L2 clipping on decoder weights after every optimizer step. On modular-arithmetic grokking benchmarks, the authors report 18x to 66x faster convergence than an AdamW baseline and zero failures across 300 seeds in the 8-layer setup.
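The core operation is easy to state: after each optimizer update, rescale any weight row whose L2 norm exceeds a cap. A minimal NumPy sketch of that idea follows; the function name `clip_rows` and the `max_norm` hyperparameter are illustrative assumptions, not the repo's actual API.

```python
import numpy as np

def clip_rows(W, max_norm):
    """Per-row L2 clipping: rescale any row of W whose L2 norm
    exceeds max_norm; rows already under the cap are untouched.
    (Illustrative sketch, not the repo's actual implementation.)"""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))
    return W * scale

# Toy usage: apply right after a (hypothetical) optimizer step.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8)) * 3.0   # stand-in for a decoder weight matrix
W = clip_rows(W, max_norm=1.0)
print(np.linalg.norm(W, axis=1))    # no row norm exceeds 1.0
```

Because the transform touches only the weights in place, it adds no optimizer state, which matches the paper's no-extra-memory claim.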
// ANALYSIS
This looks like a genuinely small change with an outsized effect, but it is still a benchmark-bound result rather than proof of a general LLM training breakthrough.
- The method is strikingly simple: clip weight rows after each optimizer step, with no extra memory cost and no weight-decay requirement.
- The strongest signal is stability, not just speed: zero failures across 300 edge-init runs and a much tighter interquartile range suggest the training dynamics became more predictable.
- The comparison against Grokfast is useful context, but the work is still confined to modular arithmetic, so real-world transfer remains an open question.
- Lion appears to benefit most from the clipping setup, which hints that optimizer choice may matter as much as the clipping rule itself.
- The planned 277M-parameter LLM test is the key next proof point; until then, treat this as a promising grokking optimization, not a solved general recipe.
// TAGS
clip-to-grok · benchmark · research · open-source · llm
DISCOVERED
2026-03-18
PUBLISHED
2026-03-17
RELEVANCE
9/10
AUTHOR
niftylius