What's the strongest AI model you can train on a laptop in five minutes?
- #small-models
- #AI-training
- #transformer
- The strongest model trained on a MacBook Pro in five minutes was a ~1.8M-parameter GPT-style transformer, achieving ~9.6 perplexity on TinyStories.
- Key optimizations were running on Apple's MPS backend and avoiding gradient accumulation, rather than numerical tricks such as torch.compile or float16 (training-step sketch below).
- The TinyStories dataset was chosen for its simplicity and coherence, a good fit for small models on a tight training budget (loading sketch below).
- A transformer with SwiGLU feed-forward blocks and 2-3 layers performed best, with learning rates around 0.001 to 0.002 (module sketch below).
- The model-size sweet spot was ~2M parameters: smaller models plateaued early, while larger ones couldn't converge within the time limit.
- Chinchilla scaling laws roughly held, suggesting the optimal model size tracks the training-token count (back-of-envelope check below).
- A discrete diffusion model (D3PM) failed to produce coherent output, unlike the transformer and LSTM baselines.
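
A minimal sketch of the training setup implied above: select MPS when available and take one optimizer step per batch, with no gradient accumulation, torch.compile, or reduced precision. The toy model, sizes, and learning rate are illustrative stand-ins, not the post's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Prefer Apple's MPS backend when available; fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Stand-in model and batch so the step runs end to end (hypothetical sizes).
vocab_size, seq_len, batch_size = 1024, 64, 32
model = nn.Sequential(
    nn.Embedding(vocab_size, 128),
    nn.Linear(128, vocab_size),
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1.5e-3)

tokens = torch.randint(0, vocab_size, (batch_size, seq_len + 1))
inputs, targets = tokens[:, :-1].to(device), tokens[:, 1:].to(device)

# One optimizer step per batch: no gradient accumulation, no torch.compile,
# default float32 precision.
logits = model(inputs)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```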
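Loading the data is a one-liner with the Hugging Face datasets library, assuming the `roneneldan/TinyStories` dataset id on the Hub:

```python
from datasets import load_dataset

# Stream TinyStories rather than downloading the whole set up front.
ds = load_dataset("roneneldan/TinyStories", split="train", streaming=True)
for example in ds.take(2):
    print(example["text"][:80])
```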
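For reference, a SwiGLU feed-forward block of the kind the winning architecture used; the dimensions here are illustrative, not the post's exact config.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Gated feed-forward block: down(silu(gate(x)) * up(x))."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

# hidden ~= 8/3 * dim is a common choice that keeps the parameter count
# comparable to a standard 4x MLP.
block = SwiGLU(dim=128, hidden=342)
print(block(torch.randn(2, 16, 128)).shape)  # torch.Size([2, 16, 128])
```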
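Chinchilla's compute-optimal recipe is commonly summarized as roughly 20 training tokens per parameter; taking that ratio as an assumption, the ~2M-parameter sweet spot implies a token budget on the order of tens of millions, which is the scale a five-minute laptop run can plausibly get through:

```python
params = 2_000_000      # ~2M-parameter sweet spot from the sweep
tokens_per_param = 20   # Chinchilla rule of thumb (approximation)
print(f"{params * tokens_per_param:,} tokens")  # 40,000,000 tokens
```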