Reproducing the 'Deep Double Descent' paper
a year ago
- #resnet18
- #machine-learning
- #double-descent
- The author spent time at the Recurse Center learning machine learning (ML) with no prior background in the field.
- They focused on reproducing results from the 'Deep Double Descent' paper (Nakkiran et al., 2019) as a test of their understanding.
- Double descent is the phenomenon where test performance improves, then worsens, and then improves again as model size or training time grows.
- Small (underparameterized) models improve as parameters are added, but cannot fully fit the training data.
- At the interpolation threshold, models can just barely memorize the training data, and test performance is at its worst.
- Larger (overparameterized) models learn the underlying features well without overfitting, even though they have the capacity to memorize (see the width-scaling sketch after this list).
- Label noise (randomly corrupting a fraction of training labels) was introduced to study its effect on double descent (see the label-noise sketch below the list).
- The author reproduced the experiments with ResNet18 on CIFAR-10, adjusting the network for CIFAR-10's 32x32 images and 10 output classes (see the model-adaptation sketch below).
- Training hit snags along the way, including an initially incorrect application of label noise and the model changes needed for CIFAR-10.
- Results showed double descent with label noise, matching the paper's findings.
- Larger models initially performed worse but recovered with more training, especially with higher label noise.
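
The model-size axis of the experiment relies on a family of ResNet18s of varying width. In the paper, the four residual stages have widths [k, 2k, 4k, 8k] and k is swept (k = 64 is the standard network). Below is a minimal sketch of such a family in PyTorch; the class names and the particular k values are illustrative, not the author's actual code:

```python
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Standard two-conv residual block with an optional projection shortcut."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))

class WidthScaledResNet18(nn.Module):
    """ResNet18 with stage widths [k, 2k, 4k, 8k]; k=64 is the usual model."""
    def __init__(self, k, num_classes=10):
        super().__init__()
        # CIFAR-sized stem: 3x3 stride-1 conv, no max-pool.
        self.stem = nn.Sequential(
            nn.Conv2d(3, k, 3, 1, 1, bias=False), nn.BatchNorm2d(k), nn.ReLU()
        )
        blocks, in_ch = [], k
        for i, w in enumerate([k, 2 * k, 4 * k, 8 * k]):
            stride = 1 if i == 0 else 2  # downsample at each stage after the first
            blocks += [BasicBlock(in_ch, w, stride), BasicBlock(w, w)]
            in_ch = w
        self.blocks = nn.Sequential(*blocks)
        self.head = nn.Linear(8 * k, num_classes)

    def forward(self, x):
        out = self.blocks(self.stem(x))
        out = F.adaptive_avg_pool2d(out, 1).flatten(1)
        return self.head(out)

# Sweeping k traces out model-wise double descent.
models = {k: WidthScaledResNet18(k) for k in (1, 2, 4, 8, 16, 32, 64)}
```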
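
For the label-noise step, here is a minimal sketch assuming torchvision's CIFAR10 dataset; `apply_label_noise` and `noise_fraction` are hypothetical names, not the author's code. One easy-to-miss detail is that the noise should be sampled once (hence the fixed seed) so the corrupted labels stay the same across epochs:

```python
import numpy as np
from torchvision import datasets, transforms

def apply_label_noise(targets, noise_fraction, num_classes=10, seed=0):
    """Replace a fixed fraction of labels with uniformly random classes.

    A uniformly random draw can occasionally re-pick the true label;
    excluding it is a minor variant of the same idea.
    """
    rng = np.random.default_rng(seed)
    targets = np.array(targets)
    flip = rng.choice(len(targets), size=int(noise_fraction * len(targets)),
                      replace=False)
    targets[flip] = rng.integers(0, num_classes, size=len(flip))
    return targets.tolist()

train_set = datasets.CIFAR10(root="data", train=True, download=True,
                             transform=transforms.ToTensor())
# Corrupt 15% of the training labels, then train as usual.
train_set.targets = apply_label_noise(train_set.targets, noise_fraction=0.15)
```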
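
As for "adjusting for image size and output categories" on the stock network: the usual adaptation of torchvision's `resnet18` to CIFAR-10, sketched below, swaps the ImageNet stem for a CIFAR-sized one and resizes the classifier head (a common fix, not necessarily the author's exact changes):

```python
import torch.nn as nn
from torchvision.models import resnet18

def resnet18_cifar10(num_classes: int = 10) -> nn.Module:
    # num_classes=10 replaces the 1000-way ImageNet classifier head.
    model = resnet18(num_classes=num_classes)
    # The ImageNet stem (7x7/stride-2 conv followed by a max-pool) would
    # shrink a 32x32 CIFAR image to 8x8 before the residual stages see it;
    # use a 3x3/stride-1 conv instead and drop the max-pool.
    model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    model.maxpool = nn.Identity()
    return model

model = resnet18_cifar10()
```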