Different Language Models Learn Similar Number Representations
- #number-representation
- #feature-learning
- #language-models
- Language models learn periodic features for representing numbers, with dominant periods of 2, 5, and 10 (see the Fourier sketch after this list).
- A two-tiered hierarchy exists: all models learn features with period-T spikes in the Fourier domain, but only some achieve geometric separability for mod-T classification.
- Fourier-domain sparsity is necessary but not sufficient for mod-T geometric separability (see the probe sketch after this list).
- Data, architecture, optimizer, and tokenizer influence whether models develop geometrically separable features.
- Models acquire these features through two main pathways: complementary co-occurrence signals in general language data, and multi-token addition problems.
- The study demonstrates convergent evolution in feature learning across diverse models and training signals.
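
To make the periodicity claim concrete, here is a minimal sketch of one way such structure can be detected, assuming access to an embedding matrix whose row n is the embedding of the number token n. The matrix `E` below is a synthetic stand-in with planted periods of 2, 5, and 10, not the paper's actual pipeline: each dimension is treated as a function of n, and spikes in the averaged Fourier magnitude spectrum mark the dominant periods.

```python
import numpy as np

rng = np.random.default_rng(0)
n = np.arange(100)  # integers whose token embeddings we inspect

# Synthetic stand-in for number-token embeddings, shape (100, 4).
# Real rows would come from a model's embedding table; here we plant
# periodic components at periods 2, 5, and 10 plus one noise dimension.
E = np.stack([
    np.cos(2 * np.pi * n / 2),
    np.cos(2 * np.pi * n / 5),
    np.sin(2 * np.pi * n / 10),
    rng.normal(size=n.size),
], axis=1)

# FFT each dimension as a function of n (after removing the mean),
# then average the magnitude spectra across dimensions.
spectrum = np.abs(np.fft.rfft(E - E.mean(axis=0), axis=0)).mean(axis=1)
freqs = np.fft.rfftfreq(n.size)  # cycles per unit step of n

# Report the three strongest frequency bins as periods (1 / frequency).
for k in np.argsort(spectrum)[::-1][:3]:
    print(f"period = {1 / freqs[k]:.1f}, magnitude = {spectrum[k]:.2f}")
```

On the synthetic stand-in this prints periods 2.0, 5.0, and 10.0; on real embeddings, the peaks would sit wherever the learned features happen to be periodic.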
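The second tier, and the sparsity caveat, can be illustrated with a linear probe. This is a sketch under stated assumptions, not the paper's methodology: `mod_probe_accuracy` is a hypothetical helper that fits logistic regression to predict n mod T from the embedding of n. A lone cosine at period 10 is maximally Fourier-sparse, yet because cos is even it folds residues r and 10 - r onto the same value, so probe accuracy stalls; adding the matching sine places each residue class at a distinct point on a circle, which a linear probe can separate.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def mod_probe_accuracy(E: np.ndarray, T: int) -> float:
    """Linear-probe accuracy for predicting n mod T from embedding row n."""
    n = np.arange(len(E))
    train, test = n < len(E) // 2, n >= len(E) // 2  # first/second-half split
    clf = LogisticRegression(max_iter=1000).fit(E[train], n[train] % T)
    return clf.score(E[test], n[test] % T)

n = np.arange(200)

# Fourier-sparse but NOT separable: cos is even, so residues r and
# 10 - r collapse onto the same value and no linear probe can split them.
E_sparse = np.cos(2 * np.pi * n / 10).reshape(-1, 1)

# Fourier-sparse AND separable: the cos/sin pair places each residue
# class at a distinct point on a circle.
E_circle = np.stack([np.cos(2 * np.pi * n / 10),
                     np.sin(2 * np.pi * n / 10)], axis=1)

print(mod_probe_accuracy(E_sparse, T=10))  # well below 1.0 (~0.6 at best)
print(mod_probe_accuracy(E_circle, T=10))  # ~1.0: separable by construction
```

Both toy embeddings have a single period-10 spike in their spectra (tier one), but only the cos/sin pair yields geometric separability (tier two), matching the hierarchy described above.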