Hasty Briefs beta

  • #AI
  • #LLM
  • #ChatGPT
  • Andrej Karpathy introduces nanochat, a full-stack ChatGPT-style LLM implementation in a single, clean, minimal codebase.
  • The project includes training, inference, and a web UI, with training costs as low as $100 for a conversational model.
  • The codebase is around 8,000 lines, mostly Python (PyTorch) with some Rust for the tokenizer.
  • Training on an 8×H100 NVIDIA node (~$24/hour) for about 4 hours (~$100) yields a coherent conversational model.
  • A 12-hour training run slightly outperforms GPT-2; the resulting ~561M-parameter model is small enough to run on devices like a Raspberry Pi.
  • Pretraining uses FineWeb-Edu, with midtraining on SmolTalk, MMLU auxiliary data, and GSM8K, followed by supervised finetuning on conversational datasets.
  • A web server and vanilla JavaScript frontend are included, with a Hugging Face model available for testing.
  • A script for running the model on CPU (macOS) is provided, with example usage and output.
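The cost and size figures quoted above can be sanity-checked with quick arithmetic. This sketch takes the node rate, run durations, and parameter count as reported in the summary; everything else is plain calculation:

```python
# Back-of-envelope checks for the figures in the summary.
node_rate_usd_per_hour = 24      # 8xH100 node rate, as quoted
speedrun_hours = 4               # the ~$100 tier
gpt2_run_hours = 12              # the GPT-2-beating tier

speedrun_cost = node_rate_usd_per_hour * speedrun_hours
gpt2_run_cost = node_rate_usd_per_hour * gpt2_run_hours

params = 561e6                   # ~561M parameters
bytes_fp16 = params * 2          # 2 bytes per parameter at fp16/bf16
weights_gib = bytes_fp16 / 2**30

print(f"4h run:  ~${speedrun_cost}")
print(f"12h run: ~${gpt2_run_cost}")
print(f"fp16 weights: {weights_gib:.2f} GiB")
```

The 4-hour run comes out at $96, consistent with the "~$100" framing, and the fp16 weights occupy roughly 1 GiB, which is why a board like a Raspberry Pi (with enough RAM) can plausibly hold the model for CPU inference.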