PyTorch 2.9 ships Muon optimizer
REDDIT // 24d ago // OPEN-SOURCE RELEASE


PyTorch 2.9 adds `torch.optim.Muon`, a specialized optimizer for 2D hidden-layer weight matrices, while embeddings, biases, and output heads stay on AdamW (https://docs.pytorch.org/docs/stable/generated/torch.optim.Muon.html). In the current docs it is still a for-loop optimizer with no foreach or fused path, so the immediate win looks more experimental than plug-and-play.
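The "2D hidden-layer weights only" rule means you can't just hand Muon your whole model; you have to partition parameters yourself. A minimal sketch of that split, with hypothetical parameter names, assuming the docs' advice that embeddings and the output head stay on AdamW even though they are 2D:

```python
# Sketch of the parameter split Muon requires: 2D hidden-layer weight
# matrices go to Muon, everything else (embeddings, biases, norms, the
# output head) stays on AdamW. Names and shapes below are hypothetical.

def split_params(named_ndims):
    """Partition (name, ndim) pairs into Muon vs. AdamW groups."""
    muon, adamw = [], []
    for name, ndim in named_ndims:
        # Embeddings and the lm_head are 2D too, so a pure ndim check is
        # not enough -- exclude them by name, per the docs' guidance.
        if ndim == 2 and not any(k in name for k in ("embed", "lm_head")):
            muon.append(name)
        else:
            adamw.append(name)
    return muon, adamw

params = [
    ("embed.weight", 2),
    ("blocks.0.attn.qkv.weight", 2),
    ("blocks.0.attn.qkv.bias", 1),
    ("blocks.0.mlp.fc.weight", 2),
    ("norm.weight", 1),
    ("lm_head.weight", 2),
]
muon, adamw = split_params(params)
print(muon)   # hidden-layer matrices only
print(adamw)  # embeddings, biases, norms, output head
```

The resulting name lists would then feed two separate param groups (one per optimizer); the name-based exclusion is exactly the grouping hurdle the analysis below flags.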

// ANALYSIS

Muon is interesting, but it reads like an optimizer you benchmark carefully, not one you casually flip on for every fine-tune.

  • PyTorch’s docs say Muon is meant for 2D hidden-layer parameters; non-2D params still belong on AdamW, so parameter grouping is the first real hurdle.
  • The 2.9 optimizer table lists Muon as `for-loop` only, with no foreach or fused implementation, which suggests its gains are algorithmic rather than kernel-level: https://docs.pytorch.org/docs/2.9/optim.html
  • For VRAM-constrained training, the appeal is optimizer-state efficiency, not a drop-in replacement for AdamW.
  • The Reddit thread is still empty, which fits the current vibe: curious, promising, but not yet battle-tested in the local fine-tuning crowd: https://www.reddit.com/r/LocalLLaMA/comments/1rxe7jl/torchoptimmuon_is_now_in_pytorch_29_anyone/
// TAGS
muon · pytorch · open-source · fine-tuning · gpu · research

DISCOVERED

24d ago

2026-03-18

PUBLISHED

24d ago

2026-03-18

RELEVANCE

8 / 10

AUTHOR

Sensitive-Two9732