Nanochat
15 hours ago
- #AI
- #LLM
- #ChatGPT
- Andrej Karpathy introduces nanochat, a full-stack ChatGPT-style LLM implementation in a single, clean, minimal codebase.
- The project includes training, inference, and a web UI, with training costs as low as $100 for a conversational model.
- The codebase is around 8,000 lines, mostly Python (PyTorch) with some Rust for the tokenizer.
- Training on an 8×H100 NVIDIA node (~$24/hour) for 4 hours (~$100) yields a coherent conversational model.
- A ~12-hour training run slightly outperforms GPT-2; at ~561M parameters, the model is small enough to run on devices like a Raspberry Pi.
- Training data includes FineWeb-Edu for pretraining, plus SmolTalk, MMLU, and GSM8K, followed by supervised finetuning on various datasets.
- A web server and vanilla JavaScript frontend are included, with a Hugging Face model available for testing.
- A script for running the model on CPU (macOS) is provided, with example usage and output.
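The quoted training budget is simple arithmetic on the figures in the post (an 8×H100 node at ~$24/hour for ~4 hours):

```python
# Back-of-the-envelope check of the training cost quoted in the post.
node_rate_usd_per_hour = 24.0  # whole 8xH100 node, figure from the post
hours = 4
cost = node_rate_usd_per_hour * hours
print(f"${cost:.0f}")  # roughly the ~$100 figure cited
```

At the same rate, the ~12-hour run that edges past GPT-2 costs about three times as much.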
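The post does not reproduce the macOS CPU script itself. As a sketch of the general pattern such a script follows in PyTorch — load a model, pin it to the CPU device, and greedy-decode token by token under `torch.no_grad()` — here is a minimal example with a hypothetical stand-in model; the real nanochat script, checkpoint, and tokenizer differ:

```python
import torch
import torch.nn as nn

# Hypothetical tiny stand-in model for illustration only;
# nanochat's actual architecture and weights are different.
class TinyLM(nn.Module):
    def __init__(self, vocab_size=256, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, idx):
        # idx: (batch, seq) token ids -> (batch, seq, vocab) logits
        return self.head(self.embed(idx))

device = torch.device("cpu")  # CPU inference, as on a Mac
model = TinyLM().to(device).eval()

@torch.no_grad()
def generate(model, prompt_ids, max_new_tokens=8):
    ids = torch.tensor([prompt_ids], dtype=torch.long, device=device)
    for _ in range(max_new_tokens):
        logits = model(ids)[:, -1, :]                  # last-position logits
        next_id = logits.argmax(dim=-1, keepdim=True)  # greedy pick
        ids = torch.cat([ids, next_id], dim=1)
    return ids[0].tolist()

print(generate(model, [1, 2, 3]))  # prompt ids plus 8 generated ids
```

With random weights the output ids are meaningless; the point is the loop structure, which is the same whether the checkpoint is a toy or a ~561M-parameter model.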