Polar Factor Beyond Newton-Schulz – Fast Matrix Inverse Square Root
- #matrix computation
- #machine learning
- #numerical methods
- The article discusses computing the orthonormal polar factor of tall matrices, focusing on methods beyond Newton-Schulz iterations.
- Key techniques include minimax polynomials, Jacobi preconditioning, and online certificates for numerical stability.
- The Muon optimizer, which is similar to signSGD or Lion with momentum, is highlighted for its empirical success in machine learning.
- The goal is to approximate the sign function on singular values of the momentum matrix to compute the polar factor efficiently.
- The article emphasizes the need for fast GPU computation, numerical stability in bf16, and certification that all singular values of the computed factor lie close to 1.
- A two-phase approach is proposed: Phase 1 involves safe scaling and global minimax steps, while Phase 2 focuses on aggressive local polishing with symmetric-around-1 steps.
- The method includes forming a Gram matrix, applying unbiased preconditioning, and using restart blocks for stability in bf16.
- Online coefficient selection from precomputed tables is recommended to optimize performance and stability.
- The article concludes with a detailed algorithm for computing the polar factor, including certification and optional polishing steps.
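As a baseline for the methods the article goes beyond, the classic cubic Newton-Schulz iteration can be sketched as follows. This is a minimal illustration, not the article's tuned algorithm: after a safe scaling by the Frobenius norm, each step applies the odd polynomial p(x) = 1.5x - 0.5x³ to the singular values, driving them toward 1 while leaving the singular vectors untouched.

```python
import numpy as np

def polar_factor_newton_schulz(M, iters=30):
    # Minimal sketch: cubic Newton-Schulz iteration for the orthonormal
    # polar factor of a tall matrix M (rows >= cols). Scaling by the
    # Frobenius norm puts every singular value in (0, 1], inside the
    # iteration's convergence region (0, sqrt(3)).
    X = M / np.linalg.norm(M)          # safe global scaling
    for _ in range(iters):
        A = X.T @ X                    # small Gram matrix (cols x cols)
        X = 1.5 * X - 0.5 * X @ A      # applies p(x) = 1.5x - 0.5x^3 to singular values
    return X

rng = np.random.default_rng(0)
M = rng.standard_normal((8, 3))
Q = polar_factor_newton_schulz(M)
# Q has (near-)orthonormal columns: Q.T @ Q is close to the identity
```

The article's point is that this baseline is slow to converge for small singular values and fragile in bf16, which motivates the minimax and two-phase schemes it proposes.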
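The two-phase structure described above can be sketched schematically. The quintic coefficients in the table below are placeholders chosen only so the iteration converges in this demo; the article's actual coefficients come from precomputed minimax tables and are not reproduced here.

```python
import numpy as np

# Illustrative two-phase scheme, assuming placeholder coefficients:
# Phase 1 applies a few global odd-quintic steps p(x) = a x + b x^3 + c x^5
# to expand small singular values; Phase 2 polishes locally with the cubic
# step p(x) = 1.5x - 0.5x^3, which is contractive around the fixed point x = 1.
PHASE1_COEFFS = [(3.0, -3.0, 1.0)] * 3   # placeholder, not the article's tuned values

def two_phase_polar(M, polish_iters=12):
    X = M / np.linalg.norm(M)                 # Phase 0: safe scaling
    for a, b, c in PHASE1_COEFFS:             # Phase 1: global quintic steps
        A = X.T @ X                           # Gram matrix
        X = a * X + b * (X @ A) + c * (X @ (A @ A))
    for _ in range(polish_iters):             # Phase 2: local polishing
        A = X.T @ X
        X = 1.5 * X - 0.5 * X @ A
    return X

rng = np.random.default_rng(1)
Q2 = two_phase_polar(rng.standard_normal((10, 4)))
```

The preconditioning, bf16 restart blocks, and online certificates from the article are omitted here; the sketch only shows how quintic and cubic steps compose through repeated Gram-matrix products.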