The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"

The Reversal Curse describes a failure in auto-regressive large language models (LLMs) where training on "A is B" does not enable them to infer "B is A".
For example, models trained on a fact like "Valentina Tereshkova was the first woman in space" may not answer "Who was the first woman in space?" correctly.
The issue persists across model sizes and families and is not resolved by data augmentation. However, when "A is B" is given in context, models can deduce the reverse relationship.
In experiments, GPT-3 and Llama-1 models fine-tuned on fictitious "A is B" statements fail on the reversed questions, and ChatGPT (GPT-3.5 and GPT-4) shows a significant performance gap between forward and reverse questions about real-world facts.
The study highlights a fundamental limitation in LLMs' generalization ability, despite the prevalence of bidirectional patterns in training data.
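
The forward-versus-reverse comparison described above can be illustrated with a minimal Python sketch. It is not the study's evaluation code: the prompt templates, the substring-match scoring, the function names, and the second fact in FACTS are all assumptions made for illustration, and toy_forward_only_model is a stand-in lookup table that imitates a model which memorized each fact in one direction only.

```python
from typing import Callable, List, Tuple

# Illustrative (A, B) facts in "A is B" form. The second entry is made up.
FACTS: List[Tuple[str, str]] = [
    ("Valentina Tereshkova", "the first woman in space"),
    ("Alice Example", "the inventor of the example widget"),
]

Item = Tuple[str, str]  # (prompt, expected answer)


def make_prompts(a: str, b: str) -> Tuple[Item, Item]:
    """Build one forward item (A -> B) and one reverse item (B -> A)."""
    forward = (f"{a} is", b)
    reverse = (f"{b[0].upper()}{b[1:]} is", a)
    return forward, reverse


def accuracy(model: Callable[[str], str], items: List[Item]) -> float:
    """Fraction of items whose expected answer appears in the model output."""
    hits = sum(
        1 for prompt, expected in items
        if expected.lower() in model(prompt).lower()
    )
    return hits / len(items)


def direction_accuracies(
    model: Callable[[str], str], facts: List[Tuple[str, str]]
) -> Tuple[float, float]:
    """Return (forward accuracy, reverse accuracy) over the given facts."""
    forward_items: List[Item] = []
    reverse_items: List[Item] = []
    for a, b in facts:
        fwd, rev = make_prompts(a, b)
        forward_items.append(fwd)
        reverse_items.append(rev)
    return accuracy(model, forward_items), accuracy(model, reverse_items)


def toy_forward_only_model(prompt: str) -> str:
    """Stand-in model (assumption): knows each fact only as 'A is' -> B."""
    memorized = {f"{a} is": b for a, b in FACTS}
    return memorized.get(prompt, "")


if __name__ == "__main__":
    fwd_acc, rev_acc = direction_accuracies(toy_forward_only_model, FACTS)
    print(f"forward accuracy: {fwd_acc:.2f}, reverse accuracy: {rev_acc:.2f}")
```

To probe a real model, toy_forward_only_model would be replaced by any function that maps a prompt string to a completion string. A large difference between the two returned accuracies is the kind of gap the study reports.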
