The Quadratic Sandwich
3 days ago
- #gradient-descent
- #optimization
- #convex-analysis
- Strong convexity guarantees a quadratic lower bound with minimum curvature μ, preventing flat regions and ensuring a unique minimizer.
- L-smoothness ensures a quadratic upper bound with maximum curvature L, preventing abrupt changes in gradient and allowing controlled step sizes.
- The quadratic sandwich combines both properties, trapping the function between two parabolas, with the condition number κ = L/μ indicating optimization difficulty.
- Spectral perspective: strong convexity bounds Hessian eigenvalues from below by μ, L-smoothness bounds them from above by L, and κ reflects eigenvalue spread.
- A verification trick reduces checking L-smoothness and strong convexity to testing convexity of modified functions, avoiding eigenvalue computation.
- Without strong convexity, gradient descent lacks progress calibration and may have non-unique minima; without L-smoothness, it risks overshooting due to erratic curvature.
- Perfect sandwich (κ=1) implies the function is quadratic, allowing gradient descent to converge in one step with optimal step size.
- The descent lemma derives from Lipschitz gradients via line integration and Cauchy-Schwarz, linking gradient smoothness to quadratic upper bounds.