Out-of-Distribution Generalization in Transformers via Latent Space Reasoning
5 days ago
- #OOD Generalization
- #Transformers
- #Machine Learning
- Investigates out-of-distribution (OOD) generalization in Transformer networks, using a modular arithmetic task on computational graphs as a testbed.
- Introduces four architectural mechanisms to enhance OOD generalization: input-adaptive recurrence, algorithmic supervision, anchored latent representations, and an explicit error-correction mechanism.
- Presents empirical results and mechanistic interpretability analysis to show how these mechanisms enable robust OOD generalization.
- Focuses on systematic and compositional generalization beyond the training distribution, a critical challenge for modern language models.
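
To make the task in the first bullet concrete, here is a minimal sketch of what "modular arithmetic on computational graphs" could look like. It is an illustration under stated assumptions, not the paper's actual setup: the modulus `23`, the `add`/`mul` operation set, the graph sizes, and the helper names `make_graph` and `evaluate` are all hypothetical. Each node is either a leaf holding a value or an operation on two earlier nodes, and OOD evaluation is approximated by testing on graphs larger than those used in training.

```python
import random

MODULUS = 23  # assumed prime modulus; the summary does not state one


def make_graph(num_nodes, num_leaves, rng):
    """Build a random DAG as a list of nodes in topological order.

    Leaves are ("leaf", value). Internal nodes are (op, i, j), where i and j
    index earlier nodes, so the list order is already a valid evaluation order.
    """
    if not 1 <= num_leaves <= num_nodes:
        raise ValueError("need 1 <= num_leaves <= num_nodes")
    nodes = []
    for idx in range(num_nodes):
        if idx < num_leaves:
            nodes.append(("leaf", rng.randrange(MODULUS)))
        else:
            op = rng.choice(["add", "mul"])
            nodes.append((op, rng.randrange(idx), rng.randrange(idx)))
    return nodes


def evaluate(nodes):
    """Compute every node's value modulo MODULUS, one node at a time."""
    values = []
    for node in nodes:
        if node[0] == "leaf":
            values.append(node[1] % MODULUS)
        elif node[0] == "add":
            values.append((values[node[1]] + values[node[2]]) % MODULUS)
        else:  # "mul"
            values.append((values[node[1]] * values[node[2]]) % MODULUS)
    return values


# Hand-built example: 5, 20, (5 + 20) mod 23, ((5 + 20) mod 23 * 20) mod 23
example = [("leaf", 5), ("leaf", 20), ("add", 0, 1), ("mul", 2, 1)]
example_values = evaluate(example)

# Hypothetical split: small graphs for training, larger graphs for OOD testing.
rng = random.Random(0)
train_graph = make_graph(num_nodes=8, num_leaves=3, rng=rng)
ood_graph = make_graph(num_nodes=32, num_leaves=3, rng=rng)
```

In this framing, a model trained only on graphs the size of `train_graph` generalizes out of distribution if it still predicts the node values of `ood_graph` correctly, which requires more sequential computation steps than any training example did.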