The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
- #generalization failure
- #large language models
- #Reversal Curse
- The Reversal Curse describes a failure in auto-regressive large language models (LLMs) where training on "A is B" does not enable them to infer "B is A".
- For example, models trained on a fact like "Valentina Tereshkova was the first woman in space" may not answer "Who was the first woman in space?" correctly.
- The issue persists across model sizes and model families and is not resolved by data augmentation. Models can, however, deduce the reverse relationship when the fact "A is B" is supplied in-context.
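- Experiments that finetune GPT-3 and Llama-1 on fictitious statements, and that query ChatGPT (GPT-3.5/GPT-4) about real-world facts, show significant performance gaps between forward and reverse questions. For example, GPT-4 correctly answers questions like "Who is Tom Cruise's mother?" 79% of the time, compared with 33% for the reverse question "Who is Mary Lee Pfeiffer's son?".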
- The study highlights a fundamental limitation in LLMs' generalization ability, despite the prevalence of bidirectional patterns in training data.
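
The forward-versus-reverse comparison described in the points above can be sketched as a small evaluation harness. This is a minimal illustration, not the study's actual code: `make_forward_only_model`, `accuracy`, and `forward_reverse_gap` are hypothetical names, the second fact is an invented placeholder, and the "model" is a toy lookup table that memorises prompts only in the "A is B" order, standing in for a finetuned LLM.

```python
from typing import Callable, List, Tuple

# (name, description) pairs. The first comes from the summary above;
# the second is an invented placeholder for a fictitious training fact.
FACTS: List[Tuple[str, str]] = [
    ("Valentina Tereshkova", "the first woman in space"),
    ("Fictional Person A", "the author of Fictional Book A"),
]

Model = Callable[[str], str]


def make_forward_only_model(facts: List[Tuple[str, str]]) -> Model:
    """Toy stand-in for an LLM trained only on '<name> was <description>'."""
    memory = {f"{name} was": description for name, description in facts}

    def complete(prompt: str) -> str:
        # Returns the memorised continuation, or an empty string if the
        # prompt was never seen in this exact (forward) order.
        return memory.get(prompt, "")

    return complete


def accuracy(model: Model, examples: List[Tuple[str, str]]) -> float:
    """Fraction of prompts whose completion exactly matches the target."""
    correct = sum(1 for prompt, target in examples if model(prompt).strip() == target)
    return correct / len(examples)


def forward_reverse_gap(model: Model, facts: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Score the same facts queried in both directions."""
    # Forward: prompt with the name, expect the description.
    forward = [(f"{name} was", description) for name, description in facts]
    # Reverse: prompt with the (capitalised) description, expect the name.
    reverse = [
        (f"{description[0].upper()}{description[1:]} was", name)
        for name, description in facts
    ]
    return accuracy(model, forward), accuracy(model, reverse)


model = make_forward_only_model(FACTS)
forward_acc, reverse_acc = forward_reverse_gap(model, FACTS)
print(f"forward accuracy: {forward_acc:.2f}, reverse accuracy: {reverse_acc:.2f}")
```

The lookup table fails in the reverse direction by construction, so it only shows how the two accuracies are measured. To evaluate a real model, `complete` would be replaced by a call to that model, and exact string matching would likely need to be relaxed.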