microgpt
12 days ago
- #python
- #machine-learning
- #gpt
- microgpt is a minimalistic GPT implementation in 200 lines of Python with no dependencies.
- The project includes dataset handling, tokenizer, autograd engine, GPT-2-like architecture, Adam optimizer, and training/inference loops.
- Dataset consists of 32,000 names, each treated as a document, with the model learning to generate similar names.
- The tokenizer maps characters to integer IDs and reserves a special BOS (Beginning of Sequence) token to mark the start of each document.
- Autograd is implemented from scratch with a `Value` class for gradient computation via backpropagation.
- Model architecture features embeddings, multi-head attention, MLP blocks, and residual connections.
- Training loop processes documents token by token, computes loss, backpropagates gradients, and updates parameters using Adam.
- Inference generates new names by sampling from the model's output distribution with temperature control.
- The script runs in about a minute on a MacBook, demonstrating the core algorithm of LLMs without scalability optimizations.
- Key differences from production models include dataset size, tokenizer complexity, tensor operations, and post-training fine-tuning.
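The character-level tokenizer described above can be sketched in a few lines. This is an illustrative version, not microgpt's exact code: the tiny document list, the choice of ID 0 for BOS, and the helper names are assumptions.

```python
# Minimal character-level tokenizer sketch: each unique character gets an
# integer ID, and a special BOS token marks the start of every document (name).
# The names, BOS placement, and ID assignment are illustrative only.

docs = ["emma", "olivia", "ava"]  # stand-in for the 32,000-name dataset
chars = sorted(set("".join(docs)))
BOS = 0  # reserve ID 0 for the Beginning-of-Sequence token
stoi = {c: i + 1 for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}

def encode(doc):
    # prepend BOS so the model learns where a name begins
    return [BOS] + [stoi[c] for c in doc]

def decode(ids):
    return "".join(itos[i] for i in ids if i != BOS)
```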
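The from-scratch autograd engine works on scalar values, in the spirit of Karpathy's micrograd. A minimal sketch, showing only `+` and `*` (the real engine supports more operations), with backpropagation over a topological ordering of the computation graph:

```python
# Sketch of a scalar autograd node: each Value stores its data, its gradient,
# and a closure that propagates gradients back to its inputs.

class Value:
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # product rule: each input's grad scales by the other input
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological order ensures a node's grad is complete before it is used
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()
```

For example, with `c = a * b + a`, calling `c.backward()` gives `a.grad = b + 1` and `b.grad = a`.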
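The Adam parameter update in the training loop can be sketched per scalar parameter. The hyperparameters below are the common defaults, not necessarily microgpt's exact values, and the flat-list layout is an illustrative simplification:

```python
import math

# Sketch of one Adam step: exponential moving averages of the gradient (m)
# and squared gradient (v), bias-corrected, then a scaled parameter update.

def adam_step(params, grads, m, v, t,
              lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    for i, g in enumerate(grads):
        m[i] = beta1 * m[i] + (1 - beta1) * g       # first-moment EMA
        v[i] = beta2 * v[i] + (1 - beta2) * g * g   # second-moment EMA
        m_hat = m[i] / (1 - beta1 ** t)             # bias correction
        v_hat = v[i] / (1 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params
```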
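Temperature-controlled sampling at inference time can be sketched as dividing the logits by a temperature before the softmax: lower temperatures sharpen the distribution toward the most likely token, higher ones flatten it. The function below is a generic illustration of the technique, not microgpt's exact code.

```python
import math
import random

# Sketch of temperature sampling: scale logits, softmax, then draw an index
# from the resulting categorical distribution.

def sample(logits, temperature=1.0, rng=random):
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1  # guard against floating-point rounding
```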