Transformer neural net learns to run Conway's Game of Life just from examples

a year ago

A simplified transformer neural network, SingleAttentionNet, learns to compute Conway’s Game of Life perfectly from examples.
The model uses its attention mechanism to perform 3x3 convolutions, which are essential for counting cell neighbors in the Game of Life.
Training involves minimizing cross-entropy loss between predicted and true next states of randomly generated Life grids.
The model can generalize to grids up to 16x16; depending on hyperparameters, training takes anywhere from a few minutes to never converging.
Replacing the attention layer with a manually computed Neighbour Attention matrix or a 3x3 average pool speeds up learning and improves generalization.
The model is considered converged once it predicts 1024 training batches perfectly and correctly simulates 100 Life games for 100 steps each.
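The Game of Life rules depend only on each cell’s count of live neighbors: a cell with exactly 3 live neighbors becomes (or stays) alive, a live cell with exactly 2 stays alive, and every other cell dies or stays dead.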
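
To make the neighbor-counting idea concrete, here is a minimal pure-Python sketch of one Life step computed through a fixed neighbour matrix, which is roughly the job the summary attributes to the attention layer. It is not the article's code: the function names `neighbour_matrix` and `life_step` are hypothetical, and the wraparound (toroidal) edges are an assumption, since the summary does not say how grid borders are handled.

```python
# Illustrative sketch only. Function names and the toroidal edge handling
# are assumptions, not taken from the article.

def neighbour_matrix(h, w):
    """Build an (h*w) x (h*w) 0/1 matrix whose row i marks the 8 neighbours
    of cell i on an h x w grid. Assumes h >= 3 and w >= 3 so that the
    wrapped neighbours of a cell are all distinct."""
    n = h * w
    m = [[0] * n for _ in range(n)]
    for r in range(h):
        for c in range(w):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a cell is not its own neighbour
                    rr, cc = (r + dr) % h, (c + dc) % w  # assumed wraparound
                    m[r * w + c][rr * w + cc] = 1
    return m


def life_step(grid):
    """Advance a grid (list of rows of 0/1 ints) by one Game of Life step."""
    h, w = len(grid), len(grid[0])
    flat = [cell for row in grid for cell in row]
    m = neighbour_matrix(h, w)
    # Matrix-vector product: counts[i] is the number of live neighbours of cell i.
    counts = [sum(a * x for a, x in zip(row, flat)) for row in m]
    # Exactly 3 live neighbours: alive. Alive with exactly 2: stays alive. Else dead.
    nxt = [1 if k == 3 or (x == 1 and k == 2) else 0
           for x, k in zip(flat, counts)]
    return [nxt[r * w:(r + 1) * w] for r in range(h)]


# Usage: a horizontal "blinker" on a 5x5 grid.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
after_one = life_step(blinker)
after_two = life_step(after_one)
```

Multiplying the flattened grid by this matrix yields every cell's live-neighbour count in one step, so a model that has such a matrix available only needs to learn the final thresholding rule. How the article scales or normalizes its Neighbour Attention matrix is not stated in this summary.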
