Learning Pseudorandom Numbers with Transformers
- #Pseudo-Random Number Generators
- #Curriculum Learning
- #Transformer Models
- Transformers can learn and predict sequences from complex pseudorandom number generators, specifically Permuted Congruential Generators (PCGs), even in regimes beyond published classical attacks (a minimal PCG sketch follows this list).
- Models can jointly learn multiple distinct PRNGs during a single training run and identify, in context, which output permutation generated a given sequence.
- A scaling law emerges: the number of in-context elements needed for near-perfect prediction grows as the square root of the modulus, √m (see the worked example below).
- Curriculum learning is critical for larger moduli (m ≥ 2^20): optimization stagnates unless training data from smaller moduli is included to bootstrap learning (a hypothetical schedule sketch appears below).
- Embedding analysis reveals a novel clustering phenomenon: the top principal components group integer inputs into clusters invariant under bitwise rotation, aiding representation transfer (see the rotation-class helper below).
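
To make the setup concrete, here is a minimal sketch of a PCG-style generator in Python: an LCG state update followed by an output permutation. The constants and the choice of the XSH-RR permutation are illustrative assumptions, not necessarily the exact configurations studied in the paper.

```python
MASK64 = (1 << 64) - 1

def pcg_xsh_rr(state: int, a: int = 6364136223846793005, c: int = 1442695040888963407):
    """One PCG step: advance a 64-bit LCG, emit a permuted 32-bit output."""
    new_state = (state * a + c) & MASK64                          # LCG transition mod 2^64
    xorshifted = (((state >> 18) ^ state) >> 27) & 0xFFFFFFFF    # XSH: xorshift of high bits
    rot = state >> 59                                             # top 5 bits pick the rotation
    output = ((xorshifted >> rot) | (xorshifted << (32 - rot))) & 0xFFFFFFFF  # RR: random rotate
    return new_state, output

# Generate a short sequence of the kind the model sees in context.
state, seq = 42, []
for _ in range(8):
    state, out = pcg_xsh_rr(state)
    seq.append(out)
print(seq)
```

The permutation (xorshift plus data-dependent rotation) is what distinguishes a PCG from a plain LCG: the raw state is never emitted, so the model must see through the scrambling to recover the underlying congruential structure.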
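The √m scaling law can be read as a back-of-the-envelope context budget. The snippet below, with an assumed proportionality constant of 1, illustrates how the required context grows with the modulus; the paper's empirical fits determine the actual prefactor.

```python
import math

def context_needed(m: int, c: float = 1.0) -> int:
    """Illustrative: in-context elements for near-perfect prediction ~ c * sqrt(m)."""
    return math.ceil(c * math.sqrt(m))

for k in (16, 20, 24):
    m = 2 ** k
    print(f"m = 2^{k}: ~{context_needed(m)} in-context elements")
# sqrt(2^k) = 2^(k/2), so each +4 in the exponent of m quadruples m
# but only doubles the context requirement.
```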
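A modulus curriculum could look like the following sketch: early training batches draw mostly from small moduli, with mass shifting toward the large target modulus over time. The stage boundaries and mixing weights here are hypothetical placeholders, not the paper's recipe.

```python
import random

MODULI = [2 ** 12, 2 ** 16, 2 ** 20]  # small moduli first, target last

def sample_modulus(step: int, total_steps: int) -> int:
    """Pick a modulus for this training batch according to a staged curriculum."""
    frac = step / total_steps
    if frac < 0.3:
        weights = [0.7, 0.2, 0.1]   # mostly small moduli early on
    elif frac < 0.6:
        weights = [0.2, 0.5, 0.3]   # shift toward intermediate sizes
    else:
        weights = [0.1, 0.2, 0.7]   # emphasize the target modulus late
    return random.choices(MODULI, weights=weights)[0]

print([sample_modulus(s, 1000) for s in (0, 500, 999)])
```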
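The reported clustering groups integers related by bitwise rotation of their binary representation. The helper below, an assumption about what "rotationally-invariant clusters" means operationally, computes a canonical representative for each rotation class; integers sharing a representative would land in the same embedding cluster.

```python
def rotations(x: int, bits: int) -> set:
    """All bitwise rotations of x within a fixed bit width."""
    mask = (1 << bits) - 1
    return {((x >> r) | (x << (bits - r))) & mask for r in range(bits)}

def rotation_class(x: int, bits: int) -> int:
    """Canonical representative: the minimum over all rotations of x."""
    return min(rotations(x, bits))

# Illustrative 8-bit example: all three integers are rotations of one another,
# so they share a class and would cluster together under the reported invariance.
for x in (0b00010001, 0b01000100, 0b00100010):
    print(f"{x:3d} -> class {rotation_class(x, 8)}")
```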