microgpt
12 days ago
- #python
- #machine-learning
- #gpt
- microgpt is a minimalistic GPT implementation in 200 lines of Python with no dependencies.
- The project includes dataset handling, tokenizer, autograd engine, GPT-2-like architecture, Adam optimizer, and training/inference loops.
- Dataset consists of 32,000 names, each treated as a document, with the model learning to generate similar names.
- The tokenizer maps characters to integer IDs and reserves a special BOS (Beginning of Sequence) token to mark the start of each document.
- Autograd is implemented from scratch with a `Value` class for gradient computation via backpropagation.
- Model architecture features embeddings, multi-head attention, MLP blocks, and residual connections.
- Training loop processes documents token by token, computes loss, backpropagates gradients, and updates parameters using Adam.
- Inference generates new names by sampling from the model's output distribution with temperature control.
- The script runs in about a minute on a MacBook, demonstrating the core algorithm of LLMs without scalability optimizations.
- Key differences from production models include dataset size, tokenizer complexity, tensor operations, and post-training fine-tuning.
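The character-level tokenizer described above can be sketched in a few lines. This is an illustrative version, not microgpt's exact code: the tiny document list, the choice of ID 0 for BOS, and the helper names are assumptions.

```python
# Minimal character-level tokenizer sketch: each unique character gets an
# integer ID, and a special BOS token marks the start of every document (name).
# The names, BOS placement, and ID assignment are illustrative only.

docs = ["emma", "olivia", "ava"]  # stand-in for the 32,000-name dataset
chars = sorted(set("".join(docs)))
BOS = 0  # reserve ID 0 for the Beginning-of-Sequence token
stoi = {c: i + 1 for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}

def encode(doc):
    # prepend BOS so the model learns where a name begins
    return [BOS] + [stoi[c] for c in doc]

def decode(ids):
    return "".join(itos[i] for i in ids if i != BOS)
```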
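The from-scratch autograd engine works on scalar values, in the spirit of Karpathy's micrograd. A minimal sketch, showing only `+` and `*` (the real engine supports more operations), with backpropagation over a topological ordering of the computation graph:

```python
# Sketch of a scalar autograd node: each Value stores its data, its gradient,
# and a closure that propagates gradients back to its inputs.

class Value:
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # product rule: each input's grad scales by the other input
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological order ensures a node's grad is complete before it is used
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()
```

For example, with `c = a * b + a`, calling `c.backward()` gives `a.grad = b + 1` and `b.grad = a`.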
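The Adam parameter update in the training loop can be sketched per scalar parameter. The hyperparameters below are the common defaults, not necessarily microgpt's exact values, and the flat-list layout is an illustrative simplification:

```python
import math

# Sketch of one Adam step: exponential moving averages of the gradient (m)
# and squared gradient (v), bias-corrected, then a scaled parameter update.

def adam_step(params, grads, m, v, t,
              lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    for i, g in enumerate(grads):
        m[i] = beta1 * m[i] + (1 - beta1) * g       # first-moment EMA
        v[i] = beta2 * v[i] + (1 - beta2) * g * g   # second-moment EMA
        m_hat = m[i] / (1 - beta1 ** t)             # bias correction
        v_hat = v[i] / (1 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params
```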
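Temperature-controlled sampling at inference time can be sketched as dividing the logits by a temperature before the softmax: lower temperatures sharpen the distribution toward the most likely token, higher ones flatten it. The function below is a generic illustration of the technique, not microgpt's exact code.

```python
import math
import random

# Sketch of temperature sampling: scale logits, softmax, then draw an index
# from the resulting categorical distribution.

def sample(logits, temperature=1.0, rng=random):
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1  # guard against floating-point rounding
```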