Gram Newton-Schulz accelerates Muon optimizer

// RESEARCH PAPER · 102d ago

Gram Newton-Schulz is a hardware-aware optimization of the standard Newton-Schulz algorithm, designed to remove the orthogonalization bottleneck in the Muon optimizer. By shifting computations to small symmetric Gram matrices and leveraging custom GPU kernels, it delivers 40–50% faster orthogonalization and significant FLOP savings without sacrificing model quality.

// ANALYSIS

Gram Newton-Schulz targets the orthogonalization step of Muon in large-scale LLM training. It shifts most of the work from large rectangular matrices to small square Gram matrices, reducing FLOPs by up to 58%. Custom CUDA kernels optimized for NVIDIA Hopper and Blackwell architectures exploit the symmetry of those matrices to deliver 2x speedups over standard cuBLAS routines. Because iterating on Gram matrices can be numerically unstable, the approach uses periodic restarts to keep the iteration stable, which lets it serve as a drop-in replacement for the standard Newton-Schulz step that bottlenecks Muon.
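To make the Gram-matrix idea concrete, here is a minimal NumPy sketch, not the paper's implementation. It assumes the quintic Newton-Schulz update X <- aX + (bA + cA^2)X with A = X X^T, as used in public Muon code. The coefficient values, the function names, and the `restart_every` parameter are illustrative assumptions. Each step multiplies X by a symmetric polynomial P = p(A), so the next Gram matrix is P A P. The loop can therefore run entirely on n x n matrices and touch the rectangular matrix only at the start, at restarts, and at the end.

```python
import numpy as np

# Assumed quintic coefficients, as commonly seen in public Muon code.
COEFFS = (3.4445, -4.7750, 2.0315)

def newton_schulz(G, steps=5, coeffs=COEFFS):
    """Reference iteration on the full n x m matrix (assumes n <= m)."""
    a, b, c = coeffs
    X = G / (np.linalg.norm(G) + 1e-7)  # Frobenius normalization
    for _ in range(steps):
        A = X @ X.T                      # n x n Gram matrix
        B = b * A + c * (A @ A)
        X = a * X + B @ X                # rectangular product every step
    return X

def gram_newton_schulz(G, steps=5, restart_every=0, coeffs=COEFFS):
    """Same iteration, carried out on n x n matrices (hypothetical sketch).

    restart_every > 0 rebuilds the Gram matrix from the rectangular
    iterate every that many steps, to limit accumulated rounding error.
    """
    a, b, c = coeffs
    n = G.shape[0]
    I = np.eye(n)
    X = G / (np.linalg.norm(G) + 1e-7)
    A = X @ X.T          # Gram matrix of the current iterate
    Q = I.copy()         # accumulated left factor: iterate = Q @ X
    for k in range(1, steps + 1):
        P = a * I + b * A + c * (A @ A)  # symmetric polynomial in A
        Q = P @ Q
        if restart_every and k % restart_every == 0 and k < steps:
            X = Q @ X                    # materialize the iterate
            A = X @ X.T                  # recompute Gram matrix exactly
            Q = I.copy()
        else:
            A = P @ A @ P                # Gram matrix of the next iterate
    return Q @ X

rng = np.random.default_rng(0)
G = rng.standard_normal((16, 64))
ref = newton_schulz(G)
out = gram_newton_schulz(G)
out_restart = gram_newton_schulz(G, restart_every=2)
```

In float64 the two routines agree to rounding error on a well-conditioned input like this one. The restart path is there for low-precision training, where errors in the Gram recurrence could otherwise accumulate. This sketch does not exploit symmetry or use custom kernels, so it illustrates the algebra only and not the reported speedups.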

// TAGS

llm, optimizer, gpu, research, muon, gram-newton-schulz, mlops

DISCOVERED

102d ago

2026-03-31

PUBLISHED

102d ago

2026-03-31

RELEVANCE

9/10

AUTHOR

Benlus
