RustGPT: A pure-Rust transformer LLM built from scratch
- #Rust
- #LLM
- #Transformer
- A complete Large Language Model (LLM) implementation in pure Rust using only ndarray for matrix operations.
- Demonstrates building a transformer-based language model from scratch, including pre-training and instruction tuning.
- Features interactive chat mode, full backpropagation with gradient clipping, and modular architecture.
- Core files include main.rs for the training pipeline and llm.rs for the LLM implementation itself.
- Transformer architecture components: tokenization, embeddings, transformer blocks, and an output projection (a forward-pass sketch follows this list).
- Training phases: pre-training on factual statements, then instruction tuning for conversational AI (see the two-phase training sketch after this list).
- Model specifications: a dynamically built vocabulary, an embedding dimension of 128, a hidden dimension of 256, and a maximum sequence length of 80 tokens.
- Optimizer and loss: Adam with gradient clipping, cross-entropy loss, and separate learning rates for pre-training and instruction tuning (an optimizer sketch follows this list).
- Includes comprehensive test coverage for all components and supports optimized release builds.
- Future improvements: model persistence, performance optimizations, better sampling, and advanced architectures.
- Encourages contributions with beginner, intermediate, and advanced tasks.
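
A few sketches follow to make the bullets above concrete. First, the forward pass: tokenization, embeddings, a transformer block, and the output projection. All names, weights, and the whitespace tokenizer here are illustrative assumptions, not the repository's actual code; only the dimensions (128, 256, 80 tokens) come from the summary, and a real block would also include attention and layer normalization.

```rust
// A minimal, hypothetical sketch of the summarized pipeline:
// tokenization -> embeddings -> transformer blocks -> output projection.
// Names and weights are illustrative; only the dimensions echo the summary.
use ndarray::Array2;
use std::collections::HashMap;

const EMBED_DIM: usize = 128;  // embedding dimension from the summary
const HIDDEN_DIM: usize = 256; // feed-forward hidden dimension
const MAX_SEQ_LEN: usize = 80; // maximum sequence length in tokens

/// Vocabulary grown dynamically as new words are seen.
struct Tokenizer {
    word_to_id: HashMap<String, usize>,
}

impl Tokenizer {
    fn new() -> Self {
        Tokenizer { word_to_id: HashMap::new() }
    }

    /// Whitespace tokenization that assigns fresh ids to unseen words.
    fn encode(&mut self, text: &str) -> Vec<usize> {
        text.split_whitespace()
            .take(MAX_SEQ_LEN)
            .map(|w| {
                let next_id = self.word_to_id.len();
                *self.word_to_id.entry(w.to_lowercase()).or_insert(next_id)
            })
            .collect()
    }
}

/// One transformer block reduced to its shape-level skeleton: a two-layer
/// feed-forward net with a residual connection (attention and layer norm
/// are omitted for brevity).
struct TransformerBlock {
    w1: Array2<f32>, // (EMBED_DIM, HIDDEN_DIM)
    w2: Array2<f32>, // (HIDDEN_DIM, EMBED_DIM)
}

impl TransformerBlock {
    fn forward(&self, x: &Array2<f32>) -> Array2<f32> {
        let hidden = x.dot(&self.w1).mapv(|v| v.max(0.0)); // ReLU
        x + &hidden.dot(&self.w2) // residual add
    }
}

fn main() {
    let mut tok = Tokenizer::new();
    let ids = tok.encode("the sun rises in the east");
    let vocab = tok.word_to_id.len();

    // Embedding table: one row per vocabulary entry (placeholder values;
    // a real model learns these during training).
    let table = Array2::<f32>::from_elem((vocab, EMBED_DIM), 0.5);
    let x = Array2::from_shape_fn((ids.len(), EMBED_DIM), |(i, j)| table[(ids[i], j)]);

    let block = TransformerBlock {
        w1: Array2::from_elem((EMBED_DIM, HIDDEN_DIM), 0.01),
        w2: Array2::from_elem((HIDDEN_DIM, EMBED_DIM), 0.01),
    };
    let hidden = block.forward(&x);

    // Output projection back to vocabulary logits.
    let proj = Array2::<f32>::from_elem((EMBED_DIM, vocab), 0.01);
    let logits = hidden.dot(&proj);
    println!("logits shape: {:?}", logits.dim()); // (sequence length, vocabulary size)
}
```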
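Next, the optimizer details: one Adam update with global-norm gradient clipping, plus a one-line cross-entropy helper. The hyperparameter defaults, the clip threshold, and the `Adam` struct layout are assumptions; the repository may organize its optimizer state and clipping differently.

```rust
// A hedged sketch of an Adam update with gradient clipping by global L2 norm.
// Hyperparameters and the clip threshold are illustrative, not the repo's values.
use ndarray::{arr1, Array1, Array2};

struct Adam {
    lr: f32,
    beta1: f32,
    beta2: f32,
    eps: f32,
    t: i32,
    m: Array2<f32>, // first-moment estimate
    v: Array2<f32>, // second-moment estimate
}

impl Adam {
    fn new(lr: f32, shape: (usize, usize)) -> Self {
        Adam {
            lr,
            beta1: 0.9,
            beta2: 0.999,
            eps: 1e-8,
            t: 0,
            m: Array2::zeros(shape),
            v: Array2::zeros(shape),
        }
    }

    /// Clip the gradient to a maximum L2 norm, then apply one Adam update.
    fn step(&mut self, param: &mut Array2<f32>, grad: &Array2<f32>, max_norm: f32) {
        // Gradient clipping by global L2 norm.
        let norm = grad.mapv(|g| g * g).sum().sqrt();
        let grad = if norm > max_norm {
            grad * (max_norm / norm)
        } else {
            grad.clone()
        };

        // Standard Adam moment estimates with bias correction.
        self.t += 1;
        self.m = &self.m * self.beta1 + &grad * (1.0 - self.beta1);
        self.v = &self.v * self.beta2 + &grad.mapv(|g| g * g) * (1.0 - self.beta2);
        let m_hat = &self.m / (1.0 - self.beta1.powi(self.t));
        let v_hat = &self.v / (1.0 - self.beta2.powi(self.t));
        *param = &*param - &(m_hat * self.lr / (v_hat.mapv(f32::sqrt) + self.eps));
    }
}

/// Cross-entropy loss for one position given softmax probabilities (illustrative).
fn cross_entropy(probs: &Array1<f32>, target: usize) -> f32 {
    -probs[target].max(1e-12).ln()
}

fn main() {
    let mut w = Array2::<f32>::from_elem((2, 2), 1.0);
    let grad = Array2::<f32>::from_elem((2, 2), 10.0); // deliberately large gradient
    let mut opt = Adam::new(1e-3, (2, 2));
    opt.step(&mut w, &grad, 5.0); // clip global norm to 5.0, then update
    println!("updated weights:\n{w}");

    let probs = arr1(&[0.7f32, 0.2, 0.1]);
    println!("cross-entropy for target 0: {:.3}", cross_entropy(&probs, 0));
}
```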
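Finally, the two-phase training flow: pre-training on factual statements, then instruction tuning on conversational examples at a lower learning rate. The `TrainableModel` trait, the tiny datasets, the epoch counts, and the learning-rate values are hypothetical stand-ins rather than values taken from the repository.

```rust
// Hypothetical sketch of the two training phases: pre-training on factual
// statements, then instruction tuning at a lower learning rate. The trait,
// the data, the epoch counts, and the learning rates are illustrative only.
trait TrainableModel {
    /// One forward/backward/update step on a single example; returns the loss.
    fn train_step(&mut self, example: &str, lr: f32) -> f32;
}

fn run_phase(model: &mut impl TrainableModel, data: &[&str], epochs: usize, lr: f32) {
    for epoch in 0..epochs {
        let mut total = 0.0;
        for example in data {
            total += model.train_step(example, lr);
        }
        println!("epoch {epoch}: mean loss {:.4}", total / data.len() as f32);
    }
}

fn main() {
    // Stand-in model so the sketch runs end to end.
    struct Dummy;
    impl TrainableModel for Dummy {
        fn train_step(&mut self, _example: &str, lr: f32) -> f32 {
            lr // placeholder "loss"
        }
    }
    let mut model = Dummy;

    // Phase 1: pre-training on factual statements.
    let facts = ["the sun rises in the east", "water freezes at zero degrees celsius"];
    run_phase(&mut model, &facts, 3, 5e-4);

    // Phase 2: instruction tuning on conversational pairs, lower learning rate.
    let chats = ["User: how does rain form? Assistant: water vapor condenses into droplets"];
    run_phase(&mut model, &chats, 3, 1e-4);
}
```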