Language Models Are Injective and Hence Invertible
- #Language Models
- #Machine Learning
- #Invertibility
- Transformer language models are proven to be injective: distinct input prompts cannot map to the same hidden activations.
- The paper introduces SipIt, an algorithm that can exactly reconstruct input text from hidden activations in linear time.
- Empirical tests on six state-of-the-art language models confirm no collisions, supporting the injectivity claim.
- The findings have implications for transparency, interpretability, and safe deployment of language models.
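The core idea behind SipIt can be illustrated with a toy sketch: if the map from prefixes to hidden states is injective, the input can be recovered one token at a time by testing, at each position, which vocabulary token reproduces the observed activation. The sketch below is an illustration only, not the paper's implementation; `hidden_state` is a hypothetical stand-in (a deterministic digest) for a real transformer's per-position activations, and the vocabulary is a small toy set.

```python
import hashlib

VOCAB = list(range(100))  # toy vocabulary of token ids (assumption, not the paper's)

def hidden_state(prefix):
    """Toy stand-in for a transformer's hidden activation at the last
    position: a deterministic, practically collision-free digest of the
    prefix. Injectivity of this map is what makes recovery possible."""
    return hashlib.sha256(bytes(prefix)).hexdigest()

def sipit_sketch(target_states):
    """Sequentially invert the model: at each position, try every
    vocabulary token and keep the one whose hidden state matches the
    observed activation. Runs in time linear in sequence length
    (times vocabulary size per step)."""
    recovered = []
    for pos, target in enumerate(target_states):
        for tok in VOCAB:
            if hidden_state(recovered + [tok]) == target:
                recovered.append(tok)
                break
        else:
            raise ValueError(f"no vocabulary token matches at position {pos}")
    return recovered

# Usage: given the per-position activations of a secret prompt,
# reconstruct the prompt exactly.
secret = [17, 3, 99, 42]
states = [hidden_state(secret[:i + 1]) for i in range(len(secret))]
assert sipit_sketch(states) == secret
```

Because each step commits to exactly one token before moving on, the total work grows linearly with the length of the input, which mirrors the linear-time reconstruction claim in the summary above.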