Nanochat
15 hours ago
- #AI
- #LLM
- #ChatGPT
- Andrej Karpathy introduces nanochat, a full-stack ChatGPT-style LLM implementation in a single, clean, minimal codebase.
- The project includes training, inference, and a web UI, with training costs as low as $100 for a conversational model.
- The codebase is around 8,000 lines, mostly Python (PyTorch) with some Rust for the tokenizer.
- Training on an 8×H100 NVIDIA node (~$24/hour) for 4 hours (~$100) yields a coherent conversational model.
- A ~12-hour training run slightly outperforms GPT-2; at ~561M parameters, the model is small enough to run on devices like a Raspberry Pi.
- Training data includes FineWeb-Edu for pretraining, plus SmolTalk, MMLU, and GSM8K, followed by supervised finetuning on various datasets.
- A web server and vanilla JavaScript frontend are included, with a Hugging Face model available for testing.
- A script for running the model on CPU (macOS) is provided, with example usage and output.
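The quoted training budget is simple arithmetic on the figures in the post (an 8×H100 node at ~$24/hour for ~4 hours):

```python
# Back-of-the-envelope check of the training cost quoted in the post.
node_rate_usd_per_hour = 24.0  # whole 8xH100 node, figure from the post
hours = 4
cost = node_rate_usd_per_hour * hours
print(f"${cost:.0f}")  # roughly the ~$100 figure cited
```

At the same rate, the ~12-hour run that edges past GPT-2 costs about three times as much.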
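The post does not reproduce the macOS CPU script itself. As a sketch of the general pattern such a script follows in PyTorch — load a model, pin it to the CPU device, and greedy-decode token by token under `torch.no_grad()` — here is a minimal example with a hypothetical stand-in model; the real nanochat script, checkpoint, and tokenizer differ:

```python
import torch
import torch.nn as nn

# Hypothetical tiny stand-in model for illustration only;
# nanochat's actual architecture and weights are different.
class TinyLM(nn.Module):
    def __init__(self, vocab_size=256, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, idx):
        # idx: (batch, seq) token ids -> (batch, seq, vocab) logits
        return self.head(self.embed(idx))

device = torch.device("cpu")  # CPU inference, as on a Mac
model = TinyLM().to(device).eval()

@torch.no_grad()
def generate(model, prompt_ids, max_new_tokens=8):
    ids = torch.tensor([prompt_ids], dtype=torch.long, device=device)
    for _ in range(max_new_tokens):
        logits = model(ids)[:, -1, :]                  # last-position logits
        next_id = logits.argmax(dim=-1, keepdim=True)  # greedy pick
        ids = torch.cat([ids, next_id], dim=1)
    return ids[0].tolist()

print(generate(model, [1, 2, 3]))  # prompt ids plus 8 generated ids
```

With random weights the output ids are meaningless; the point is the loop structure, which is the same whether the checkpoint is a toy or a ~561M-parameter model.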