A Theory of Generalization in Deep Learning
15 hours ago
- #generalization
- #deep learning theory
- #neural tangent kernel
- Introduces a non-asymptotic theory of generalization in deep learning based on a neural-tangent-kernel partition of the output space into signal and noise directions.
- Shows that minibatch SGD accumulates coherent signal through linear drift while relegating memorization noise to slow diffusion, enabling generalization even in the full feature-learning regime.
- Explains phenomena like benign overfitting, double descent, implicit bias, and grokking through this theoretical framework.
- Derives an exact population-risk objective measurable from a single training run via the noise in the signal channel, and implements it as an SNR preconditioner for Adam (a minimal sketch follows this list).
- Demonstrates practical improvements: accelerates grokking by 5x, suppresses memorization in PINNs and neural representations, and improves DPO fine-tuning with noisy preferences.
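
The bullets describe the SNR preconditioner only at a high level, so below is a minimal, hedged sketch of one way such a preconditioner could be wired into Adam in PyTorch. Everything here is an assumption for illustration, not the paper's implementation: the class name `SNRPreconditionedAdam`, the `snr_eps` parameter, the per-parameter SNR estimate taken from Adam's own running first and second moments, and the `snr / (snr + 1)` rescaling. The paper's actual objective reportedly works in the signal and noise directions defined by the neural tangent kernel rather than with per-parameter statistics.

```python
import torch


class SNRPreconditionedAdam(torch.optim.Adam):
    """Hypothetical sketch: Adam whose gradients are rescaled per parameter by an
    estimated signal-to-noise ratio (coherent drift vs. minibatch diffusion).
    Names and the rescaling rule are illustrative assumptions."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                 snr_eps=1e-12, **kwargs):
        super().__init__(params, lr=lr, betas=betas, eps=eps, **kwargs)
        self.snr_eps = snr_eps  # numerical floor for the variance estimate

    @torch.no_grad()
    def step(self, closure=None):
        # Estimate SNR from Adam's running moments (bias correction omitted
        # for brevity): signal ~ squared first moment, noise ~ central variance.
        # Then damp each gradient by snr / (snr + 1) before the usual Adam step,
        # so low-SNR (memorization-dominated) coordinates are suppressed.
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state.get(p)
                if not state or "exp_avg" not in state:
                    continue  # first step: no moment estimates yet
                mean = state["exp_avg"]
                var = (state["exp_avg_sq"] - mean.pow(2)).clamp_(min=0.0)
                snr = mean.pow(2) / (var + self.snr_eps)
                p.grad.mul_(snr / (snr + 1.0))
        return super().step(closure)
```

Usage would be a drop-in replacement for Adam, e.g. `opt = SNRPreconditionedAdam(model.parameters(), lr=1e-3)`; note that because the gradient is rescaled in place before `step`, the damped gradient also feeds Adam's moment updates, which is a design choice of this sketch rather than something claimed by the paper.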