OPEN_SOURCE ↗
REDDIT · 24d ago · OPEN_SOURCE RELEASE
PyTorch 2.9 ships Muon optimizer
PyTorch 2.9 adds `torch.optim.Muon`, a specialized optimizer for a network's 2D hidden-layer weights, with embeddings, biases, and output heads intended to stay on AdamW (https://docs.pytorch.org/docs/stable/generated/torch.optim.Muon.html). In the current docs it is still a for-loop optimizer with no foreach or fused path, so the immediate win looks more experimental than plug-and-play.
// ANALYSIS
Muon is interesting, but it reads like an optimizer you benchmark carefully, not one you casually flip on for every fine-tune.
- PyTorch's docs say Muon is meant for 2D hidden-layer parameters; non-2D params still belong on AdamW, so parameter grouping is the first real hurdle.
- The 2.9 optimizer table lists Muon as `for-loop` only, with no foreach or fused implementation, which suggests its gains are algorithmic rather than kernel-level: https://docs.pytorch.org/docs/2.9/optim.html
- For VRAM-constrained training, the appeal is optimizer-state efficiency, not a drop-in replacement for AdamW.
- The Reddit thread is still empty, which fits the current vibe: curious, promising, but not yet battle-tested in the local fine-tuning crowd: https://www.reddit.com/r/LocalLLaMA/comments/1rxe7jl/torchoptimmuon_is_now_in_pytorch_29_anyone/
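The parameter-grouping hurdle from the first bullet can be sketched in plain Python. This is a minimal, hypothetical routing rule, not the official recipe: it treats any 2D weight as a Muon candidate unless its name suggests an embedding or output head (the name filters here are made up for illustration, and real models will need their own).

```python
def split_for_muon(named_shapes):
    """Route 2D hidden-layer weights to Muon and everything else
    (embeddings, biases, output heads) to AdamW.

    `named_shapes` is an iterable of (parameter_name, shape) pairs,
    e.g. from (name, p.shape) over model.named_parameters().
    The substring filters below are illustrative assumptions.
    """
    muon, adamw = [], []
    for name, shape in named_shapes:
        is_hidden_2d = len(shape) == 2 and not any(
            key in name for key in ("embed", "head")
        )
        (muon if is_hidden_2d else adamw).append(name)
    return muon, adamw

# Toy transformer-ish parameter list (names and shapes are invented):
params = [
    ("embed.weight",             (50257, 768)),  # 2D but an embedding -> AdamW
    ("blocks.0.attn.qkv.weight", (768, 2304)),   # hidden 2D weight    -> Muon
    ("blocks.0.attn.qkv.bias",   (2304,)),       # 1D bias             -> AdamW
    ("blocks.0.mlp.fc.weight",   (768, 3072)),   # hidden 2D weight    -> Muon
    ("lm_head.weight",           (768, 50257)),  # output head         -> AdamW
]
muon_names, adamw_names = split_for_muon(params)
```

With groups like these you would then build two optimizers, roughly `torch.optim.Muon(muon_params, ...)` and `torch.optim.AdamW(adamw_params, ...)`, and step both each iteration; check the linked Muon docs for the actual constructor arguments, since they are not spelled out in this post.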
// TAGS
muon · pytorch · open-source · fine-tuning · gpu · research
DISCOVERED
24d ago
2026-03-18
PUBLISHED
24d ago
2026-03-18
RELEVANCE
8 / 10
AUTHOR
Sensitive-Two9732