Hasty Briefs

Polar Factor Beyond Newton-Schulz – Fast Matrix Inverse Square Root

3 days ago
  • #matrix computation
  • #machine learning
  • #numerical methods
  • The article discusses computing the orthonormal polar factor for tall matrices, focusing on methods beyond Newton-Schulz iterations.
  • Key techniques include minimax polynomials, Jacobi preconditioning, and online certificates for numerical stability.
  • The Muon optimizer, which behaves like a matrix analogue of signSGD or Lion with momentum, is highlighted for its empirical success in machine learning.
  • The goal is to approximate the sign function on singular values of the momentum matrix to compute the polar factor efficiently.
  • The article emphasizes fast GPU execution, numerical stability in bf16, and certification that the singular values of the computed factor lie close to 1.
  • A two-phase approach is proposed: Phase 1 involves safe scaling and global minimax steps, while Phase 2 focuses on aggressive local polishing with symmetric-around-1 steps.
  • The method includes forming a Gram matrix, applying unbiased preconditioning, and using restart blocks for stability in bf16.
  • Online coefficient selection from precomputed tables is recommended to optimize performance and stability.
  • The article concludes with a detailed algorithm for computing the polar factor, including certification and optional polishing steps.
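As background for the points above, the classical Newton–Schulz polar iteration that the article aims to go beyond can be sketched in a few lines. This is a minimal NumPy sketch, not the article's code; the function name and iteration count are illustrative:

```python
import numpy as np

def polar_newton_schulz(A, iters=10):
    """Illustrative Newton-Schulz sketch: iterates toward the
    orthonormal polar factor of a tall matrix A."""
    # Safe scaling: dividing by the Frobenius norm puts every singular
    # value in (0, 1], inside the (0, sqrt(3)) convergence region.
    X = A / np.linalg.norm(A)
    I = np.eye(A.shape[1])
    for _ in range(iters):
        # Cubic step X <- X (3I - X^T X) / 2 pushes each singular value s
        # through f(s) = 1.5 s - 0.5 s^3, whose stable fixed point is s = 1.
        X = X @ (1.5 * I - 0.5 * (X.T @ X))
    return X
```

Each step maps every singular value through the same odd polynomial while leaving the singular vectors untouched, which is why the whole design problem reduces to approximating the sign function on the singular values.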
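The two-phase structure summarized above could look roughly like the sketch below. The Phase-1 quintic coefficients are the publicly known Muon coefficients, used here only as a stand-in for the article's precomputed minimax tables, and the Phase-2 polish reuses plain Newton–Schulz steps as one example of a step whose cubic is symmetric around s = 1; `polar_two_phase` and the iteration counts are assumptions:

```python
import numpy as np

def polar_two_phase(G, phase1_iters=5, phase2_iters=3):
    """Hypothetical two-phase sketch, not the article's algorithm."""
    # Phase-1 quintic coefficients popularized by the Muon optimizer
    # (an assumption here, standing in for the article's minimax tables).
    a, b, c = 3.4445, -4.7750, 2.0315
    # Safe scaling so all singular values start in (0, 1].
    X = G / (np.linalg.norm(G) + 1e-7)
    I = np.eye(X.shape[1])
    for _ in range(phase1_iters):
        # Phase 1: aggressive global steps via the odd quintic
        # f(s) = a s + b s^3 + c s^5, applied through the Gram matrix.
        S = X.T @ X
        X = X @ (a * I + b * S + c * (S @ S))
    for _ in range(phase2_iters):
        # Phase 2: local polishing once singular values sit near 1.
        X = X @ (1.5 * I - 0.5 * (X.T @ X))
    return X
```

The point of the split is that the quintic drags tiny singular values toward 1 quickly but leaves them oscillating in a band around it, after which the contractive polishing steps collapse that band.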
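An online certificate of the kind the summary mentions can be as simple as bounding the residual of the Gram matrix: if ||XᵀX − I||₂ ≤ ε, then by Weyl's inequality every singular value s of X satisfies |s² − 1| ≤ ε. A hypothetical sketch, with `certify_polar` and its tolerance chosen for illustration:

```python
import numpy as np

def certify_polar(X, tol=1e-2):
    """Hypothetical certificate sketch: bound how far the singular
    values of X can be from 1 using the Gram residual."""
    # If eps = ||X^T X - I||_2, Weyl's inequality gives, for every
    # singular value s of X:  sqrt(1 - eps) <= s <= sqrt(1 + eps).
    E = X.T @ X - np.eye(X.shape[1])
    eps = np.linalg.norm(E, 2)  # spectral norm of the residual
    return eps <= tol, eps
```

Because the check needs only one extra Gram product, it can run alongside the iteration to decide when optional polishing steps are still required.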