Double Descent Demystified: size of smallest non-zero singular value of X
- #double descent
- #overparameterization
- #machine learning
- Double descent is a phenomenon in machine learning where test error first falls, then rises as the number of parameters approaches the number of data points, and then falls again as parameters grow beyond it, contrary to the classical picture of overfitting.
- The shape of the test loss curve depends on the number of training samples, the data dimensionality, and the number of model parameters.
- The paper provides an intuitive explanation of double descent using polynomial regression and linear algebra.
- Three interpretable factors are identified (one being the size of the smallest non-zero singular value of the training features X) that must all be present for double descent to occur.
- Double descent is demonstrated on real data with ordinary linear regression and shown to disappear when any of the three factors is removed.
- The findings help explain recent observations in nonlinear models regarding superposition and double descent.
- Code related to the research is publicly available.
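The linear-regression demonstration can be sketched in a few lines. The toy below uses synthetic Gaussian data (a stand-in for the paper's real datasets, not its actual code) and the minimum-norm least-squares fit via the pseudoinverse, sweeping the number of features used as the parameter count; test error typically spikes near the interpolation threshold (parameters equal to training samples) and falls again beyond it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: n_train samples, up to d_total features,
# labels from a ground-truth linear relation with mild noise.
n_train, n_test, d_total = 20, 200, 60
w_true = rng.normal(size=d_total)
X_train = rng.normal(size=(n_train, d_total))
X_test = rng.normal(size=(n_test, d_total))
y_train = X_train @ w_true + 0.5 * rng.normal(size=n_train)
y_test = X_test @ w_true


def min_norm_fit(X, y):
    # Minimum-norm least-squares solution; in the overparameterized
    # regime this is the interpolating solution of smallest norm.
    return np.linalg.pinv(X) @ y


test_mse = {}
for p in range(1, d_total + 1):
    # Use the first p features, i.e. a model with p parameters.
    w = min_norm_fit(X_train[:, :p], y_train)
    test_mse[p] = float(np.mean((X_test[:, :p] @ w - y_test) ** 2))

# Plotting test_mse against p traces out the double-descent curve,
# with the peak near p == n_train where the smallest non-zero
# singular value of the training matrix is tiny.
```

Once p reaches n_train, the fit interpolates the training labels exactly, which is why training error alone cannot reveal the test-error peak.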