Hasty Briefsbeta

Bilingual

The Quadratic Sandwich

3 days ago
  • #gradient-descent
  • #optimization
  • #convex-analysis
  • Strong convexity guarantees a quadratic lower bound with minimum curvature μ, preventing flat regions and ensuring a unique minimizer.
  • L-smoothness ensures a quadratic upper bound with maximum curvature L, preventing abrupt changes in gradient and allowing controlled step sizes.
  • The quadratic sandwich combines both properties, trapping the function between two parabolas, with the condition number κ = L/μ indicating optimization difficulty.
  • Spectral perspective: strong convexity bounds Hessian eigenvalues from below by μ, L-smoothness bounds them from above by L, and κ reflects eigenvalue spread.
  • A verification trick reduces checking L-smoothness and strong convexity to testing convexity of modified functions, avoiding eigenvalue computation.
  • Without strong convexity, gradient descent lacks progress calibration and may have non-unique minima; without L-smoothness, it risks overshooting due to erratic curvature.
  • Perfect sandwich (κ=1) implies the function is quadratic, allowing gradient descent to converge in one step with optimal step size.
  • The descent lemma derives from Lipschitz gradients via line integration and Cauchy-Schwarz, linking gradient smoothness to quadratic upper bounds.