Reproducing the 'Deep Double Descent' paper
a year ago
- #resnet18
- #machine-learning
- #double-descent
- The author spent time at the Recurse Center learning machine learning (ML) with no prior background in the field.
- They focused on reproducing results from the 'Deep Double Descent' paper (Nakkiran et al., 2019) as a test of their understanding.
- Double descent is the phenomenon where test performance improves, then worsens, and then improves again as model size or training time grows.
- Small (underparameterized) models improve as parameters are added, but cannot fully fit the training data.
- At the interpolation threshold, models can just barely memorize the training data, and test performance is at its worst.
- Larger (overparameterized) models learn the underlying features well without overfitting, even though they have the capacity to memorize (see the width-scaling sketch after this list).
- Label noise (randomly corrupting a fraction of training labels) was introduced to study its effect on double descent (see the label-noise sketch below the list).
- The author reproduced the experiments with ResNet18 on CIFAR-10, adjusting the network for CIFAR-10's 32x32 images and 10 output classes (see the model-adaptation sketch below).
- Training hit snags along the way, including an initially incorrect application of label noise and the model changes needed for CIFAR-10.
- Results showed double descent with label noise, matching the paper's findings.
- Larger models initially performed worse but recovered with more training, especially with higher label noise.
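
The model-size axis of the experiment relies on a family of ResNet18s of varying width. In the paper, the four residual stages have widths [k, 2k, 4k, 8k] and k is swept (k = 64 is the standard network). Below is a minimal sketch of such a family in PyTorch; the class names and the particular k values are illustrative, not the author's actual code:

```python
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Standard two-conv residual block with an optional projection shortcut."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))

class WidthScaledResNet18(nn.Module):
    """ResNet18 with stage widths [k, 2k, 4k, 8k]; k=64 is the usual model."""
    def __init__(self, k, num_classes=10):
        super().__init__()
        # CIFAR-sized stem: 3x3 stride-1 conv, no max-pool.
        self.stem = nn.Sequential(
            nn.Conv2d(3, k, 3, 1, 1, bias=False), nn.BatchNorm2d(k), nn.ReLU()
        )
        blocks, in_ch = [], k
        for i, w in enumerate([k, 2 * k, 4 * k, 8 * k]):
            stride = 1 if i == 0 else 2  # downsample at each stage after the first
            blocks += [BasicBlock(in_ch, w, stride), BasicBlock(w, w)]
            in_ch = w
        self.blocks = nn.Sequential(*blocks)
        self.head = nn.Linear(8 * k, num_classes)

    def forward(self, x):
        out = self.blocks(self.stem(x))
        out = F.adaptive_avg_pool2d(out, 1).flatten(1)
        return self.head(out)

# Sweeping k traces out model-wise double descent.
models = {k: WidthScaledResNet18(k) for k in (1, 2, 4, 8, 16, 32, 64)}
```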
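
For the label-noise step, here is a minimal sketch assuming torchvision's CIFAR10 dataset; `apply_label_noise` and `noise_fraction` are hypothetical names, not the author's code. One easy-to-miss detail is that the noise should be sampled once (hence the fixed seed) so the corrupted labels stay the same across epochs:

```python
import numpy as np
from torchvision import datasets, transforms

def apply_label_noise(targets, noise_fraction, num_classes=10, seed=0):
    """Replace a fixed fraction of labels with uniformly random classes.

    A uniformly random draw can occasionally re-pick the true label;
    excluding it is a minor variant of the same idea.
    """
    rng = np.random.default_rng(seed)
    targets = np.array(targets)
    flip = rng.choice(len(targets), size=int(noise_fraction * len(targets)),
                      replace=False)
    targets[flip] = rng.integers(0, num_classes, size=len(flip))
    return targets.tolist()

train_set = datasets.CIFAR10(root="data", train=True, download=True,
                             transform=transforms.ToTensor())
# Corrupt 15% of the training labels, then train as usual.
train_set.targets = apply_label_noise(train_set.targets, noise_fraction=0.15)
```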
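
As for "adjusting for image size and output categories" on the stock network: the usual adaptation of torchvision's `resnet18` to CIFAR-10, sketched below, swaps the ImageNet stem for a CIFAR-sized one and resizes the classifier head (a common fix, not necessarily the author's exact changes):

```python
import torch.nn as nn
from torchvision.models import resnet18

def resnet18_cifar10(num_classes: int = 10) -> nn.Module:
    # num_classes=10 replaces the 1000-way ImageNet classifier head.
    model = resnet18(num_classes=num_classes)
    # The ImageNet stem (7x7/stride-2 conv followed by a max-pool) would
    # shrink a 32x32 CIFAR image to 8x8 before the residual stages see it;
    # use a 3x3/stride-1 conv instead and drop the max-pool.
    model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    model.maxpool = nn.Identity()
    return model

model = resnet18_cifar10()
```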