Clip to Grok hits 66x speedup
REDDIT · 25d ago · BENCHMARK RESULT

Clip to Grok is a repo and PDF describing per-row L2 clipping on decoder weights after every optimizer step. On modular-arithmetic grokking benchmarks, the authors report 18x to 66x faster convergence than an AdamW baseline and zero failures across 300 seeds in the 8-layer setup.
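As a rough illustration of the operation described above, the following NumPy sketch rescales any weight row whose L2 norm exceeds a threshold, then applies that projection after a placeholder update step. The function name `clip_rows_l2`, the `max_norm` and learning-rate values, and the choice of axis that counts as a "row" are assumptions made for illustration; the repo's actual implementation and hyperparameters may differ.

```python
import numpy as np


def clip_rows_l2(weight, max_norm, eps=1e-12):
    """Rescale every row whose L2 norm exceeds max_norm down to max_norm.

    Rows already inside the norm ball are returned unchanged.
    """
    norms = np.linalg.norm(weight, axis=1, keepdims=True)  # shape (rows, 1)
    # Scale factor is 1 for short rows and max_norm / norm for long rows.
    scale = np.minimum(1.0, max_norm / np.maximum(norms, eps))
    return weight * scale


# Toy usage: one plain gradient step followed by the clip.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))        # hypothetical weight matrix
grad = rng.normal(size=W.shape)    # stand-in for a real gradient
lr, max_norm = 0.1, 1.0            # assumed values, not taken from the source

W = W - lr * grad                  # placeholder for the optimizer step (AdamW, Lion, ...)
W = clip_rows_l2(W, max_norm)      # projection applied after every step
```

The clip keeps no state of its own: it only reads and rescales the current weights, so it can follow any optimizer's update.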

// ANALYSIS

This looks like a genuinely small change with an outsized effect, but it is still a benchmark-bound result rather than proof of a general LLM training breakthrough.

  • The method is strikingly simple: clip the L2 norm of each weight row after every optimizer step, with no extra memory and no need for weight decay.
  • The strongest signal here is stability, not just speed: zero failures across 300 edge-init runs and a much tighter IQR suggest the training dynamics got more predictable.
  • The comparison against Grokfast is useful context, but the work is still confined to modular arithmetic, so real-world transfer remains an open question.
  • Lion appears to benefit the most from the clipping setup, which hints that the optimizer choice may matter as much as the clipping rule itself.
  • The planned 277M LLM test is the key next proof point; until then, treat this as a promising grokking optimization trick, not a solved general recipe.
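
The stability and speed claims above rest on per-seed statistics. A minimal sketch of how such a summary could be computed with Python's standard library follows. The function names, the definition of speedup as a ratio of median steps, and the sample numbers are all assumptions made up for illustration; none of them are results or definitions from the repo or PDF.

```python
import statistics


def summarize_runs(steps_to_converge):
    """Summarize one configuration across seeds.

    steps_to_converge holds one entry per seed; None marks a run
    that never converged within its step budget.
    """
    converged = [s for s in steps_to_converge if s is not None]
    failures = len(steps_to_converge) - len(converged)
    # quantiles(n=4) returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(converged, n=4)
    return {"failures": failures, "median": median, "iqr": q3 - q1}


def speedup(baseline_steps, method_steps):
    """Ratio of median steps to convergence (one assumed definition of speedup)."""
    return statistics.median(baseline_steps) / statistics.median(method_steps)


# Made-up numbers purely to exercise the helpers.
baseline = [5200, 6100, None, 4800, 7000, 5500]
clipped = [300, 310, 290, 305, 295, 300]

print(summarize_runs(baseline))
print(summarize_runs(clipped))
print(speedup([s for s in baseline if s is not None], clipped))
```

Reporting failures and IQR next to the median keeps a speedup figure from hiding runs that never converged.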
// TAGS

clip-to-grok · benchmark · research · open-source · llm

DISCOVERED: 2026-03-18

PUBLISHED: 2026-03-17

RELEVANCE: 9/10

AUTHOR: niftylius