The Quadratic Sandwich

3 days ago

Strong convexity guarantees a quadratic lower bound with minimum curvature μ, preventing flat regions and ensuring a unique minimizer.
L-smoothness ensures a quadratic upper bound with maximum curvature L, preventing abrupt changes in gradient and allowing controlled step sizes.
The quadratic sandwich combines both properties, trapping the function between two parabolas, with the condition number κ = L/μ indicating optimization difficulty.
Spectral perspective: strong convexity bounds Hessian eigenvalues from below by μ, L-smoothness bounds them from above by L, and κ reflects eigenvalue spread.
A verification trick reduces checking L-smoothness and strong convexity to testing convexity of modified functions, avoiding eigenvalue computation.
Without strong convexity, gradient descent lacks progress calibration and may have non-unique minima; without L-smoothness, it risks overshooting due to erratic curvature.
Perfect sandwich (κ=1) implies the function is quadratic, allowing gradient descent to converge in one step with optimal step size.
The descent lemma derives from Lipschitz gradients via line integration and Cauchy-Schwarz, linking gradient smoothness to quadratic upper bounds.

Hasty Briefsbeta