Universal pre-training by iterated random computation
- #Machine Learning
- #Pre-training
- #Algorithmic Complexity
- Explores pre-training models on randomly generated data, produced by iterating random computations rather than drawn from any real-world corpus (a minimal generator sketch follows this list).
- Theoretical justification rests on algorithmic complexity and Solomonoff induction (the prior is written out below the sketch).
- Empirical evidence shows that pre-training on such synthetic data enables zero-shot generalization to unseen data.
- Performance improves with model scale, and the effect extends to real-world data.
- Finetuning after this synthetic pre-training improves convergence speed and generalization.
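As a rough illustration of the first bullet: one simple way to realize "iterated random computation" is to sample sequences from a randomly initialized, untrained recurrent network whose sampled output token is fed back as the next input. Everything below (function name, the tanh RNN parameterization, the sizes) is an illustrative assumption, not the paper's exact generator.

```python
import numpy as np

def sample_random_sequence(vocab_size=256, hidden=64, seq_len=128, seed=0):
    """Sample one synthetic sequence by iterating a randomly
    initialized (untrained) RNN, feeding each sampled token back
    in as the next input. Sizes and parameterization are
    illustrative assumptions only."""
    rng = np.random.default_rng(seed)
    # Random, fixed parameters: the "random computation".
    W_emb = rng.normal(0.0, 1.0, (vocab_size, hidden))                    # token embedding
    W_rec = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, hidden))      # recurrence
    W_out = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, vocab_size))  # readout

    h = np.zeros(hidden)
    tok = int(rng.integers(vocab_size))  # random start token
    seq = [tok]
    for _ in range(seq_len - 1):
        # One iteration of the random map: state <- f(state, last token).
        h = np.tanh(W_emb[tok] + W_rec @ h)
        logits = h @ W_out
        logits -= logits.max()  # numerically stable softmax
        probs = np.exp(logits)
        probs /= probs.sum()
        tok = int(rng.choice(vocab_size, p=probs))
        seq.append(tok)
    return np.array(seq)

# A pre-training corpus would be many such sequences from many random seeds:
corpus = [sample_random_sequence(seed=s) for s in range(8)]
```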
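For the second bullet, the standard object behind the theory is the Solomonoff prior: the total probability that a universal prefix machine $U$, run on uniformly random program bits, prints a string beginning with $x$. This is the textbook formulation, not quoted from the note:

```latex
% Solomonoff prior: sum over all programs p whose output starts with x,
% each weighted by the probability 2^{-|p|} of drawing p at random.
M(x) = \sum_{p \,:\, U(p) = x\ast} 2^{-|p|}
```

Data produced by sampling random programs is, in effect, drawn from an approximation of $M$, so a model trained to predict such data is pushed toward Solomonoff-style induction; that is the sense in which this pre-training is "universal".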