What's the strongest AI model you can train on a laptop in five minutes?
- #small-models
- #AI-training
- #transformer
- The strongest model trained on a MacBook Pro in five minutes was a ~1.8M-parameter GPT-style transformer, achieving ~9.6 perplexity on TinyStories.
- Key optimizations were running on Apple's MPS backend and avoiding gradient accumulation, rather than numerical tricks such as torch.compile or float16 (training-step sketch below).
- The TinyStories dataset was chosen for its simplicity and coherence, a good fit for small models on a tight training budget (loading sketch below).
- A transformer with SwiGLU feed-forward blocks and 2-3 layers performed best, with learning rates around 0.001 to 0.002 (module sketch below).
- The model-size sweet spot was ~2M parameters: smaller models plateaued early, while larger ones couldn't converge within the time limit.
- Chinchilla scaling laws roughly held, suggesting the optimal model size tracks the training-token count (back-of-envelope check below).
- A discrete diffusion model (D3PM) failed to produce coherent output, unlike the transformer and LSTM baselines.
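
A minimal sketch of the training setup implied above: select MPS when available and take one optimizer step per batch, with no gradient accumulation, torch.compile, or reduced precision. The toy model, sizes, and learning rate are illustrative stand-ins, not the post's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Prefer Apple's MPS backend when available; fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Stand-in model and batch so the step runs end to end (hypothetical sizes).
vocab_size, seq_len, batch_size = 1024, 64, 32
model = nn.Sequential(
    nn.Embedding(vocab_size, 128),
    nn.Linear(128, vocab_size),
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1.5e-3)

tokens = torch.randint(0, vocab_size, (batch_size, seq_len + 1))
inputs, targets = tokens[:, :-1].to(device), tokens[:, 1:].to(device)

# One optimizer step per batch: no gradient accumulation, no torch.compile,
# default float32 precision.
logits = model(inputs)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```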
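Loading the data is a one-liner with the Hugging Face datasets library, assuming the `roneneldan/TinyStories` dataset id on the Hub:

```python
from datasets import load_dataset

# Stream TinyStories rather than downloading the whole set up front.
ds = load_dataset("roneneldan/TinyStories", split="train", streaming=True)
for example in ds.take(2):
    print(example["text"][:80])
```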
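For reference, a SwiGLU feed-forward block of the kind the winning architecture used; the dimensions here are illustrative, not the post's exact config.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Gated feed-forward block: down(silu(gate(x)) * up(x))."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

# hidden ~= 8/3 * dim is a common choice that keeps the parameter count
# comparable to a standard 4x MLP.
block = SwiGLU(dim=128, hidden=342)
print(block(torch.randn(2, 16, 128)).shape)  # torch.Size([2, 16, 128])
```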
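Chinchilla's compute-optimal recipe is commonly summarized as roughly 20 training tokens per parameter; taking that ratio as an assumption, the ~2M-parameter sweet spot implies a token budget on the order of tens of millions, which is the scale a five-minute laptop run can plausibly get through:

```python
params = 2_000_000      # ~2M-parameter sweet spot from the sweep
tokens_per_param = 20   # Chinchilla rule of thumb (approximation)
print(f"{params * tokens_per_param:,} tokens")  # 40,000,000 tokens
```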