Hasty Briefs

Different Language Models Learn Similar Number Representations

3 hours ago
  • #number-representation
  • #feature-learning
  • #language-models
  • Language models learn periodic features for representing numbers, with dominant periods of 2, 5, and 10 (see the Fourier sketch after this list).
  • A two-tiered hierarchy exists: all models learn features with period-T spikes, but only some achieve geometric separability for mod-T classification (see the probe sketch after this list).
  • Fourier-domain sparsity is necessary but not sufficient for mod-T geometric separability.
  • Data, architecture, optimizer, and tokenizer influence whether models develop geometrically separable features.
  • Two main pathways lead to these features: complementary co-occurrence signals in general language data and multi-token addition problems.
  • The study demonstrates convergent evolution in feature learning across diverse models and training signals.
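
A minimal sketch of the kind of periodicity analysis described above: Fourier-transform each embedding dimension of the integer tokens along the number axis and look for spectral peaks at periods 2, 5, and 10. The embeddings below are synthetic placeholders with planted periodic components, not outputs of the models studied; the array shapes and dimension indices are illustrative assumptions.

```python
import numpy as np

N, d = 100, 64                      # numbers 0..99, embedding width 64 (illustrative)
rng = np.random.default_rng(0)
n = np.arange(N)

# Placeholder "embeddings": noise plus planted period-2/5/10 components,
# mimicking the periodic number features the summary describes.
emb = 0.1 * rng.standard_normal((N, d))
for period, dim in [(2, 0), (5, 1), (10, 2)]:
    emb[:, dim] += np.cos(2 * np.pi * n / period)

# Fourier-transform each embedding dimension along the number axis.
spectrum = np.abs(np.fft.rfft(emb - emb.mean(axis=0), axis=0))  # shape (N//2 + 1, d)
freqs = np.fft.rfftfreq(N)                                       # cycles per integer step

# Dominant period of each dimension (skipping the DC bin).
dominant_freq = freqs[spectrum[1:].argmax(axis=0) + 1]
dominant_period = 1.0 / dominant_freq
for dim in range(3):
    print(f"dim {dim}: dominant period ~ {dominant_period[dim]:.1f}")
# Expected: periods close to 2, 5, and 10 for the planted dimensions.
```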
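
And a sketch of the mod-T geometric-separability probe: fit a linear classifier to predict n mod 10 from the embedding of n, where high held-out accuracy indicates linearly separable residue classes. The embeddings are again a hypothetical stand-in with a planted circular mod-10 feature, and scikit-learn's logistic regression is used as one reasonable choice of linear probe, not necessarily the paper's exact method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

N, d = 200, 64                      # numbers 0..199, embedding width 64 (illustrative)
rng = np.random.default_rng(1)
n = np.arange(N)

# Placeholder embeddings with a planted circular mod-10 feature: numbers that
# share a residue mod 10 map to the same point on a circle in dims 0 and 1,
# which makes the ten residue classes linearly separable.
emb = 0.1 * rng.standard_normal((N, d))
emb[:, 0] += np.cos(2 * np.pi * n / 10)
emb[:, 1] += np.sin(2 * np.pi * n / 10)

labels = n % 10
X_tr, X_te, y_tr, y_te = train_test_split(
    emb, labels, test_size=0.3, random_state=0, stratify=labels
)

# Linear probe: held-out accuracy well above chance (0.1) indicates
# geometrically (linearly) separable mod-10 structure.
probe = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
print(f"mod-10 probe accuracy: {probe.score(X_te, y_te):.2f}")
```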