Hasty Briefs (beta)


A Theory of Generalization in Deep Learning

15 hours ago
  • #generalization
  • #deep learning theory
  • #neural tangent kernel
  • Introduces a non-asymptotic theory of generalization in deep learning based on a neural tangent kernel (NTK) partitioning of the output space into signal and noise directions.
  • Shows that minibatch SGD accumulates coherent signal via linear drift while suppressing memorization into slow diffusion, enabling generalization even in the full feature-learning regime.
  • Explains phenomena like benign overfitting, double descent, implicit bias, and grokking through this theoretical framework.
  • Derives an exact population-risk objective computable from a single training run by measuring noise in the signal channel, and implements it as a signal-to-noise-ratio (SNR) preconditioner for Adam.
  • Demonstrates practical improvements: accelerates grokking by 5x, suppresses memorization in PINNs and neural representations, and improves DPO fine-tuning under noisy preferences.
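To make the SNR-preconditioner idea concrete, here is a minimal, hypothetical sketch: per-coordinate gradient signal (squared mean) is compared against gradient variance across minibatches, and coordinates dominated by noise have their Adam step damped. This is an illustration of the general technique, not the paper's actual implementation; the gating formula, names (`snr_adam`, `gate`), and the toy noisy quadratic are assumptions.

```python
import numpy as np

def snr_adam(grad_fn, w, steps=500, lr=0.05, betas=(0.9, 0.999), eps=1e-8):
    """Adam with a hypothetical per-coordinate SNR gate on the update."""
    m = np.zeros_like(w)  # first moment: EMA of gradients
    v = np.zeros_like(w)  # second moment: EMA of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = betas[0] * m + (1 - betas[0]) * g
        v = betas[1] * v + (1 - betas[1]) * g * g
        m_hat = m / (1 - betas[0] ** t)  # bias-corrected moments
        v_hat = v / (1 - betas[1] ** t)
        # Per-coordinate SNR: squared mean gradient over its variance.
        var = np.maximum(v_hat - m_hat ** 2, 0.0)
        snr = m_hat ** 2 / (var + eps)
        # Gate in (0, 1): ~1 where the gradient is coherent signal,
        # ~0 where it is minibatch noise -- loosely mirroring the
        # "accumulate signal, suppress memorization" picture above.
        gate = snr / (1.0 + snr)
        w = w - lr * gate * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Toy problem: noisy gradients of f(w) = 0.5 * ||w||^2, noise std 0.5.
rng = np.random.default_rng(0)
grad = lambda w: w + 0.5 * rng.standard_normal(w.shape)
w_final = snr_adam(grad, np.full(4, 5.0))
```

Near the optimum the gradient is pure noise, so the gate shrinks the step automatically; far from it the coherent drift keeps the gate near one. The paper's exact preconditioner is derived from its population-risk objective and will differ in detail.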