AutoMuon launches as a drop-in AdamW replacement

// OPEN-SOURCE RELEASE, 90d ago

AutoMuon automates the integration of the Muon optimizer into PyTorch training pipelines, acting as a one-line replacement for AdamW. It automatically routes 2D projection weights to Muon for orthogonalized updates while keeping embeddings, norms, and biases on AdamW, eliminating the manual parameter grouping previously required to leverage Muon's training speedups.
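The routing rule described above can be sketched in a few lines. This is a minimal illustration, not AutoMuon's actual code: the function `route_parameters`, the `ADAMW_NAME_HINTS` list, and the example parameter names are all hypothetical, and sending non-2D weights such as convolution kernels to AdamW is this sketch's own conservative choice rather than documented AutoMuon behavior.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical name fragments for parameters that stay on AdamW even when
# they are 2D (for example embedding tables and output heads).
ADAMW_NAME_HINTS = ("embed", "norm", "bias", "lm_head")


def route_parameters(
    named_shapes: Iterable[Tuple[str, Tuple[int, ...]]],
) -> Dict[str, List[str]]:
    """Split parameter names into a Muon group and an AdamW group.

    With PyTorch, the input could be built from model.named_parameters()
    as (name, tuple(param.shape)) pairs.
    """
    groups: Dict[str, List[str]] = {"muon": [], "adamw": []}
    for name, shape in named_shapes:
        is_matrix = len(shape) == 2
        looks_excluded = any(hint in name.lower() for hint in ADAMW_NAME_HINTS)
        if is_matrix and not looks_excluded:
            # Plain 2D projection weight: candidate for orthogonalized updates.
            groups["muon"].append(name)
        else:
            # Anything else, including ambiguous cases, falls back to AdamW.
            groups["adamw"].append(name)
    return groups


example = [
    ("tok_embed.weight", (32000, 512)),
    ("blocks.0.attn.q_proj.weight", (512, 512)),
    ("blocks.0.attn.q_proj.bias", (512,)),
    ("blocks.0.norm1.weight", (512,)),
    ("blocks.0.mlp.fc1.weight", (2048, 512)),
    ("conv_stem.weight", (64, 3, 3, 3)),
]
routed = route_parameters(example)
print(routed["muon"])
```

In this sketch only the two projection matrices land in the Muon group. The embedding table is 2D but is held back by its name, which is the kind of case that makes manual grouping error-prone.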

// ANALYSIS

AutoMuon lowers the barrier to using the Muon optimizer by removing the manual setup that previously confined it largely to specialized "speedrun" repositories.

– Automates the parameter routing needed to apply Muon's orthogonalized updates only where they are appropriate
– Reported to reach AdamW's baseline accuracy in fewer epochs and to finish with higher final accuracy on the reported benchmarks
– Supports DistributedDataParallel (DDP) and standard PyTorch schedulers natively, so it fits existing training setups
– Conservative scanning logic falls back to AdamW for ambiguous or custom architectural components
– Could reduce compute costs for large-scale transformer and CNN training workloads
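To make "orthogonalized updates" concrete, here is a small pure-Python sketch of one standard way to orthogonalize a matrix: the cubic Newton-Schulz iteration. It is shown only as an illustration. The source does not say which iteration, step count, or precision AutoMuon uses, and the helper names `matmul`, `transpose`, and `orthogonalize` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def orthogonalize(g: Matrix, steps: int = 30) -> Matrix:
    """Approximate the orthogonal polar factor of g with cubic Newton-Schulz."""
    # Scale by the Frobenius norm so every singular value is at most 1,
    # which keeps the iteration inside its convergence region.
    fro = math.sqrt(sum(v * v for row in g for v in row)) + 1e-12
    x = [[v / fro for v in row] for row in g]
    for _ in range(steps):
        # X <- 1.5 * X - 0.5 * (X X^T) X pushes each singular value toward 1.
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [
            [1.5 * xv - 0.5 * yv for xv, yv in zip(x_row, y_row)]
            for x_row, y_row in zip(x, xxt_x)
        ]
    return x


grad = [[3.0, 1.0], [1.0, 2.0]]  # stand-in for a 2D gradient or momentum matrix
update = orthogonalize(grad)
gram = matmul(update, transpose(update))  # close to the identity if orthogonal
```

The result keeps the directions of the input but equalizes its singular values. That operation only makes sense for matrix-shaped weights, which is why the routing step matters.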

// TAGS

automuon, open-source, mlops, llm, benchmark, devtool

DISCOVERED

90d ago

2026-04-26

PUBLISHED

90d ago

2026-04-26

RELEVANCE

8/10

AUTHOR

Skye7821
