Hasty Briefs

What's the strongest AI model you can train on a laptop in five minutes?

12 days ago
  • #small-models
  • #AI-training
  • #transformer
  • The strongest model trained on a MacBook Pro in five minutes was a ~1.8M-parameter GPT-style transformer, achieving ~9.6 perplexity on TinyStories.
  • Key optimizations were running on Apple's MPS backend and avoiding gradient accumulation; math-level tricks like torch.compile and float16 did not pay off at this scale.
  • TinyStories dataset was chosen for its simplicity and coherence, ideal for small models with limited training time.
  • Transformer architecture with SwiGLU and 2-3 layers performed best, with learning rates around 0.001 to 0.002.
  • Model size sweet spot was ~2M parameters; smaller models plateaued, while larger ones couldn't converge in time.
  • Chinchilla scaling laws roughly applied: the compute-optimal model size tracks the number of tokens you can train on (around 20 tokens per parameter).
  • Diffusion models (D3PM) failed to produce coherent output, unlike transformers and LSTMs.
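The reported ~9.6 perplexity maps directly onto a per-token cross-entropy loss, since perplexity is just the exponential of the loss in nats. A quick sanity check (the 2.26 figure below is derived from the summary's number, not stated in the post):

```python
import math

def perplexity(loss_nats: float) -> float:
    """Perplexity is exp(cross-entropy loss measured in nats per token)."""
    return math.exp(loss_nats)

# ~9.6 perplexity corresponds to a validation loss of about 2.26 nats/token:
print(round(perplexity(2.26), 2))   # ≈ 9.58
print(round(math.log(9.6), 3))      # loss implied by ppl 9.6 → 2.262
```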
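To see how a 2-3 layer SwiGLU transformer lands near the ~2M-parameter sweet spot, here is a back-of-the-envelope parameter count. The vocabulary size, width, and SwiGLU expansion factor are illustrative assumptions, not the post's actual configuration:

```python
def transformer_params(vocab: int, d_model: int, n_layers: int) -> int:
    """Approximate parameter count, ignoring biases and layer norms."""
    embedding = vocab * d_model       # token embeddings (assume tied output head)
    attn = 4 * d_model * d_model      # Q, K, V, and output projections
    d_ff = int(8 * d_model / 3)       # common SwiGLU sizing: ~2/3 of a 4x MLP width
    mlp = 3 * d_ff * d_model          # gate, up, and down projections
    return embedding + n_layers * (attn + mlp)

# A hypothetical 3-layer config near the sweet spot the post describes:
n = transformer_params(vocab=4096, d_model=192, n_layers=3)
print(f"{n / 1e6:.2f}M parameters")  # → 2.11M parameters
```

Most of the budget at this scale sits in the embedding table, which is why small vocabularies suit tiny models.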
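The Chinchilla rule of thumb (roughly 20 training tokens per parameter) also predicts a sweet spot of this size, given plausible laptop throughput. The tokens-per-second figure below is an assumed number for illustration, not a measurement from the post:

```python
TOKENS_PER_PARAM = 20  # Chinchilla compute-optimal ratio (rule of thumb)

def optimal_params(train_tokens: int) -> int:
    """Compute-optimal model size for a fixed training-token budget."""
    return train_tokens // TOKENS_PER_PARAM

# If a laptop sustains ~150k tokens/sec for 5 minutes (assumed throughput):
tokens = 150_000 * 5 * 60            # 45M tokens
print(optimal_params(tokens))        # → 2250000, near the observed ~2M sweet spot
```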