The 90-year-old idea behind JEPA models: Canonical Correlation Analysis


JEPA (Joint-Embedding Predictive Architecture) models are conceptually rooted in Canonical Correlation Analysis (CCA), a 90-year-old statistical method for finding the signals shared between two datasets.
CCA maximizes correlation between two sets of variables through linear transformations, while JEPA extends this by using non-linear neural networks to maximize correlation between different views of the same data.
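To make the "linear transformations" concrete, here is a minimal NumPy sketch of classical linear CCA. It assumes the textbook formulation: whiten each view with its inverse square-root covariance, then take the SVD of the whitened cross-covariance. The singular values are the canonical correlations. The function name `linear_cca`, the ridge term `eps` and the synthetic data are illustrative choices, not taken from the source.

```python
import numpy as np

def linear_cca(X, Y, k, eps=1e-8):
    """Textbook linear CCA: returns projections Wx, Wy and the top-k canonical correlations."""
    # Center both views so covariances are computed around the mean.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Sample covariances; eps is a small ridge term (an assumption) for numerical stability.
    Cxx = X.T @ X / (n - 1) + eps * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + eps * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        # Symmetric inverse square root via eigendecomposition (the whitening transform).
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # SVD of the whitened cross-covariance: singular values = canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :k]      # projection for view X
    Wy = Ky @ Vt.T[:, :k]   # projection for view Y
    return Wx, Wy, s[:k]

# Synthetic example: one latent signal z is hidden in different columns of X and Y.
rng = np.random.default_rng(0)
z = rng.normal(size=(2000, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(2000, 1)), rng.normal(size=(2000, 2))])
Y = np.hstack([rng.normal(size=(2000, 3)), -2 * z + 0.1 * rng.normal(size=(2000, 1))])
Wx, Wy, rho = linear_cca(X - X.mean(axis=0), Y - Y.mean(axis=0), k=2)
print(rho.round(2))
```

The first canonical correlation should be close to 1 because both views contain `z`, while the second reflects only noise. The projected data `X @ Wx` has (approximately) identity covariance, which is the whitening constraint discussed next.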
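CCA is usually stated as maximizing correlation, but under its whitening constraints (each embedding dimension has zero mean and unit variance, and dimensions are mutually uncorrelated) this is equivalent to minimizing the mean squared error between the two embeddings. That makes it very similar to JEPA's prediction objective. JEPA, however, imposes no such constraints, so the error can be driven to zero trivially, for example by mapping every input to the same embedding. This failure mode is known as representational collapse.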
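The equivalence and the collapse risk can both be checked numerically. The sketch below uses the 1-D case for simplicity. For embeddings `a` and `b` standardized to zero mean and unit variance, the identity mean((a - b)^2) = 2(1 - rho) holds exactly, so minimizing MSE is the same as maximizing correlation. If the standardization is dropped, a constant "encoder" reaches zero error. The data and helper names are illustrative assumptions.

```python
import numpy as np

def standardize(v):
    # Zero mean, unit variance (population normalization, ddof=0): the 1-D whitening constraint.
    return (v - v.mean()) / v.std()

rng = np.random.default_rng(1)
x = rng.normal(size=5000)
y = 0.8 * x + 0.6 * rng.normal(size=5000)  # a second "view" correlated with x

a, b = standardize(x), standardize(y)
mse = np.mean((a - b) ** 2)
rho = np.corrcoef(a, b)[0, 1]
# Under the constraint, MSE and correlation are tied: mse == 2 * (1 - rho).
print(np.isclose(mse, 2 * (1 - rho)))  # True

# Without the constraint, a collapsed encoder mapping every input to the same
# value gets zero loss while carrying no information about the input.
collapsed_a = np.zeros_like(x)
collapsed_b = np.zeros_like(y)
collapsed_mse = np.mean((collapsed_a - collapsed_b) ** 2)
print(collapsed_mse)  # 0.0
```

In d dimensions the same algebra gives a total MSE of 2d minus twice the sum of per-dimension correlations, assuming both embeddings are whitened.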
Recent work such as SIGReg addresses JEPA's collapse problem by pushing the embeddings toward an isotropic Gaussian distribution. Because an isotropic Gaussian has uncorrelated dimensions of equal variance, this effectively reintroduces a constraint similar to CCA's whitening.
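As a rough illustration of "reintroducing whitening as a regularizer", the sketch below penalizes a nonzero embedding mean and any departure of the embedding covariance from the identity. This is a simple moment-matching stand-in, assumed for illustration only. It is **not** SIGReg's actual formulation, and the name `isotropy_penalty` is hypothetical. Unlike CCA's hard constraint, it is a soft penalty that would be added to the prediction loss.

```python
import numpy as np

def isotropy_penalty(Z):
    """Hypothetical moment-matching penalty (NOT SIGReg): mean -> 0, covariance -> identity."""
    mu = Z.mean(axis=0)
    C = np.cov(Z, rowvar=False)
    d = Z.shape[1]
    # Squared norm of the mean plus squared Frobenius distance of the covariance from I.
    return float(np.sum(mu ** 2) + np.sum((C - np.eye(d)) ** 2))

rng = np.random.default_rng(2)
healthy = rng.normal(size=(4000, 8))                              # roughly isotropic Gaussian
collapsed = np.tile(rng.normal(size=(1, 8)), (4000, 1))           # every input -> same vector
low_rank = rng.normal(size=(4000, 1)) @ rng.normal(size=(1, 8))   # dimensional collapse
penalties = {name: isotropy_penalty(Z) for name, Z in
             [("healthy", healthy), ("collapsed", collapsed), ("low_rank", low_rank)]}
for name, p in penalties.items():
    print(name, round(p, 3))
```

Both full collapse (a single repeated vector) and dimensional collapse (embeddings confined to one direction) receive a large penalty, while roughly isotropic embeddings are barely penalized. In a training loop this term would be weighted by some coefficient (an assumption here) and added to the JEPA prediction loss.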
The debate over JEPA's originality highlights the importance of proper citations: while LeCun emphasizes practical implementation, Schmidhuber points to earlier ideas like Predictability Maximization.
JEPA can be viewed as an architectural enhancement of CCA, incorporating non-linearity and scalability, but both share the core goal of maximizing correlation in embedding spaces.
