Language Models Are Injective and Hence Invertible
- #Language Models
- #Machine Learning
- #Invertibility
- Transformer language models are proven to be injective: distinct input prompts cannot map to the same hidden activations.
- The paper introduces SipIt, an algorithm that can exactly reconstruct input text from hidden activations in linear time.
- Empirical tests on six state-of-the-art language models confirm no collisions, supporting the injectivity claim.
- The findings have implications for transparency, interpretability, and safe deployment of language models.
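The core idea behind SipIt can be illustrated with a toy sketch: if the map from prefixes to hidden states is injective, the input can be recovered one token at a time by testing, at each position, which vocabulary token reproduces the observed activation. The sketch below is an illustration only, not the paper's implementation; `hidden_state` is a hypothetical stand-in (a deterministic digest) for a real transformer's per-position activations, and the vocabulary is a small toy set.

```python
import hashlib

VOCAB = list(range(100))  # toy vocabulary of token ids (assumption, not the paper's)

def hidden_state(prefix):
    """Toy stand-in for a transformer's hidden activation at the last
    position: a deterministic, practically collision-free digest of the
    prefix. Injectivity of this map is what makes recovery possible."""
    return hashlib.sha256(bytes(prefix)).hexdigest()

def sipit_sketch(target_states):
    """Sequentially invert the model: at each position, try every
    vocabulary token and keep the one whose hidden state matches the
    observed activation. Runs in time linear in sequence length
    (times vocabulary size per step)."""
    recovered = []
    for pos, target in enumerate(target_states):
        for tok in VOCAB:
            if hidden_state(recovered + [tok]) == target:
                recovered.append(tok)
                break
        else:
            raise ValueError(f"no vocabulary token matches at position {pos}")
    return recovered

# Usage: given the per-position activations of a secret prompt,
# reconstruct the prompt exactly.
secret = [17, 3, 99, 42]
states = [hidden_state(secret[:i + 1]) for i in range(len(secret))]
assert sipit_sketch(states) == secret
```

Because each step commits to exactly one token before moving on, the total work grows linearly with the length of the input, which mirrors the linear-time reconstruction claim in the summary above.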