Hasty Briefs

Polar Factor Beyond Newton-Schulz – Fast Matrix Inverse Square Root

3 days ago
  • #matrix computation
  • #machine learning
  • #numerical methods
  • The article discusses computing the orthonormal polar factor for tall matrices, focusing on methods beyond Newton-Schulz iterations.
  • Key techniques include minimax polynomials, Jacobi preconditioning, and online certificates for numerical stability.
  • The Muon optimizer, which behaves like a matrix analogue of signSGD or Lion with momentum, is highlighted for its empirical success in machine learning.
  • The goal is to approximate the sign function on singular values of the momentum matrix to compute the polar factor efficiently.
  • The article emphasizes fast GPU execution, numerical stability in bf16, and certification that the singular values of the computed factor lie close to 1.
  • A two-phase approach is proposed: Phase 1 involves safe scaling and global minimax steps, while Phase 2 focuses on aggressive local polishing with symmetric-around-1 steps.
  • The method includes forming a Gram matrix, applying unbiased preconditioning, and using restart blocks for stability in bf16.
  • Online coefficient selection from precomputed tables is recommended to optimize performance and stability.
  • The article concludes with a detailed algorithm for computing the polar factor, including certification and optional polishing steps.
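As background for the points above, the classical Newton–Schulz polar iteration that the article aims to go beyond can be sketched in a few lines. This is a minimal NumPy sketch, not the article's code; the function name and iteration count are illustrative:

```python
import numpy as np

def polar_newton_schulz(A, iters=10):
    """Illustrative Newton-Schulz sketch: iterates toward the
    orthonormal polar factor of a tall matrix A."""
    # Safe scaling: dividing by the Frobenius norm puts every singular
    # value in (0, 1], inside the (0, sqrt(3)) convergence region.
    X = A / np.linalg.norm(A)
    I = np.eye(A.shape[1])
    for _ in range(iters):
        # Cubic step X <- X (3I - X^T X) / 2 pushes each singular value s
        # through f(s) = 1.5 s - 0.5 s^3, whose stable fixed point is s = 1.
        X = X @ (1.5 * I - 0.5 * (X.T @ X))
    return X
```

Each step maps every singular value through the same odd polynomial while leaving the singular vectors untouched, which is why the whole design problem reduces to approximating the sign function on the singular values.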
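The two-phase structure summarized above could look roughly like the sketch below. The Phase-1 quintic coefficients are the publicly known Muon coefficients, used here only as a stand-in for the article's precomputed minimax tables, and the Phase-2 polish reuses plain Newton–Schulz steps as one example of a step whose cubic is symmetric around s = 1; `polar_two_phase` and the iteration counts are assumptions:

```python
import numpy as np

def polar_two_phase(G, phase1_iters=5, phase2_iters=3):
    """Hypothetical two-phase sketch, not the article's algorithm."""
    # Phase-1 quintic coefficients popularized by the Muon optimizer
    # (an assumption here, standing in for the article's minimax tables).
    a, b, c = 3.4445, -4.7750, 2.0315
    # Safe scaling so all singular values start in (0, 1].
    X = G / (np.linalg.norm(G) + 1e-7)
    I = np.eye(X.shape[1])
    for _ in range(phase1_iters):
        # Phase 1: aggressive global steps via the odd quintic
        # f(s) = a s + b s^3 + c s^5, applied through the Gram matrix.
        S = X.T @ X
        X = X @ (a * I + b * S + c * (S @ S))
    for _ in range(phase2_iters):
        # Phase 2: local polishing once singular values sit near 1.
        X = X @ (1.5 * I - 0.5 * (X.T @ X))
    return X
```

The point of the split is that the quintic drags tiny singular values toward 1 quickly but leaves them oscillating in a band around it, after which the contractive polishing steps collapse that band.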
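An online certificate of the kind the summary mentions can be as simple as bounding the residual of the Gram matrix: if ||XᵀX − I||₂ ≤ ε, then by Weyl's inequality every singular value s of X satisfies |s² − 1| ≤ ε. A hypothetical sketch, with `certify_polar` and its tolerance chosen for illustration:

```python
import numpy as np

def certify_polar(X, tol=1e-2):
    """Hypothetical certificate sketch: bound how far the singular
    values of X can be from 1 using the Gram residual."""
    # If eps = ||X^T X - I||_2, Weyl's inequality gives, for every
    # singular value s of X:  sqrt(1 - eps) <= s <= sqrt(1 + eps).
    E = X.T @ X - np.eye(X.shape[1])
    eps = np.linalg.norm(E, 2)  # spectral norm of the residual
    return eps <= tol, eps
```

Because the check needs only one extra Gram product, it can run alongside the iteration to decide when optional polishing steps are still required.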