RustGPT: A pure-Rust transformer LLM built from scratch
- #Rust
- #LLM
- #Transformer
- A complete Large Language Model (LLM) implementation in pure Rust using only ndarray for matrix operations.
- Demonstrates building a transformer-based language model from scratch, including pre-training and instruction tuning.
- Features interactive chat mode, full backpropagation with gradient clipping, and modular architecture.
- Core files include main.rs for the training pipeline and llm.rs for the LLM implementation itself.
- Transformer architecture components: tokenization, embeddings, transformer blocks, and an output projection (a forward-pass sketch follows this list).
- Training phases: pre-training on factual statements, then instruction tuning for conversational AI (see the two-phase training sketch after this list).
- Model specifications: a dynamically built vocabulary, an embedding dimension of 128, a hidden dimension of 256, and a maximum sequence length of 80 tokens.
- Optimizer and loss: Adam with gradient clipping, cross-entropy loss, and separate learning rates for pre-training and instruction tuning (an optimizer sketch follows this list).
- Includes comprehensive test coverage for all components and supports optimized release builds.
- Future improvements: model persistence, performance optimizations, better sampling, and advanced architectures.
- Encourages contributions with beginner, intermediate, and advanced tasks.
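
A few sketches follow to make the bullets above concrete. First, the forward pass: tokenization, embeddings, a transformer block, and the output projection. All names, weights, and the whitespace tokenizer here are illustrative assumptions, not the repository's actual code; only the dimensions (128, 256, 80 tokens) come from the summary, and a real block would also include attention and layer normalization.

```rust
// A minimal, hypothetical sketch of the summarized pipeline:
// tokenization -> embeddings -> transformer blocks -> output projection.
// Names and weights are illustrative; only the dimensions echo the summary.
use ndarray::Array2;
use std::collections::HashMap;

const EMBED_DIM: usize = 128;  // embedding dimension from the summary
const HIDDEN_DIM: usize = 256; // feed-forward hidden dimension
const MAX_SEQ_LEN: usize = 80; // maximum sequence length in tokens

/// Vocabulary grown dynamically as new words are seen.
struct Tokenizer {
    word_to_id: HashMap<String, usize>,
}

impl Tokenizer {
    fn new() -> Self {
        Tokenizer { word_to_id: HashMap::new() }
    }

    /// Whitespace tokenization that assigns fresh ids to unseen words.
    fn encode(&mut self, text: &str) -> Vec<usize> {
        text.split_whitespace()
            .take(MAX_SEQ_LEN)
            .map(|w| {
                let next_id = self.word_to_id.len();
                *self.word_to_id.entry(w.to_lowercase()).or_insert(next_id)
            })
            .collect()
    }
}

/// One transformer block reduced to its shape-level skeleton: a two-layer
/// feed-forward net with a residual connection (attention and layer norm
/// are omitted for brevity).
struct TransformerBlock {
    w1: Array2<f32>, // (EMBED_DIM, HIDDEN_DIM)
    w2: Array2<f32>, // (HIDDEN_DIM, EMBED_DIM)
}

impl TransformerBlock {
    fn forward(&self, x: &Array2<f32>) -> Array2<f32> {
        let hidden = x.dot(&self.w1).mapv(|v| v.max(0.0)); // ReLU
        x + &hidden.dot(&self.w2) // residual add
    }
}

fn main() {
    let mut tok = Tokenizer::new();
    let ids = tok.encode("the sun rises in the east");
    let vocab = tok.word_to_id.len();

    // Embedding table: one row per vocabulary entry (placeholder values;
    // a real model learns these during training).
    let table = Array2::<f32>::from_elem((vocab, EMBED_DIM), 0.5);
    let x = Array2::from_shape_fn((ids.len(), EMBED_DIM), |(i, j)| table[(ids[i], j)]);

    let block = TransformerBlock {
        w1: Array2::from_elem((EMBED_DIM, HIDDEN_DIM), 0.01),
        w2: Array2::from_elem((HIDDEN_DIM, EMBED_DIM), 0.01),
    };
    let hidden = block.forward(&x);

    // Output projection back to vocabulary logits.
    let proj = Array2::<f32>::from_elem((EMBED_DIM, vocab), 0.01);
    let logits = hidden.dot(&proj);
    println!("logits shape: {:?}", logits.dim()); // (sequence length, vocabulary size)
}
```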
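Next, the optimizer details: one Adam update with global-norm gradient clipping, plus a one-line cross-entropy helper. The hyperparameter defaults, the clip threshold, and the `Adam` struct layout are assumptions; the repository may organize its optimizer state and clipping differently.

```rust
// A hedged sketch of an Adam update with gradient clipping by global L2 norm.
// Hyperparameters and the clip threshold are illustrative, not the repo's values.
use ndarray::{arr1, Array1, Array2};

struct Adam {
    lr: f32,
    beta1: f32,
    beta2: f32,
    eps: f32,
    t: i32,
    m: Array2<f32>, // first-moment estimate
    v: Array2<f32>, // second-moment estimate
}

impl Adam {
    fn new(lr: f32, shape: (usize, usize)) -> Self {
        Adam {
            lr,
            beta1: 0.9,
            beta2: 0.999,
            eps: 1e-8,
            t: 0,
            m: Array2::zeros(shape),
            v: Array2::zeros(shape),
        }
    }

    /// Clip the gradient to a maximum L2 norm, then apply one Adam update.
    fn step(&mut self, param: &mut Array2<f32>, grad: &Array2<f32>, max_norm: f32) {
        // Gradient clipping by global L2 norm.
        let norm = grad.mapv(|g| g * g).sum().sqrt();
        let grad = if norm > max_norm {
            grad * (max_norm / norm)
        } else {
            grad.clone()
        };

        // Standard Adam moment estimates with bias correction.
        self.t += 1;
        self.m = &self.m * self.beta1 + &grad * (1.0 - self.beta1);
        self.v = &self.v * self.beta2 + &grad.mapv(|g| g * g) * (1.0 - self.beta2);
        let m_hat = &self.m / (1.0 - self.beta1.powi(self.t));
        let v_hat = &self.v / (1.0 - self.beta2.powi(self.t));
        *param = &*param - &(m_hat * self.lr / (v_hat.mapv(f32::sqrt) + self.eps));
    }
}

/// Cross-entropy loss for one position given softmax probabilities (illustrative).
fn cross_entropy(probs: &Array1<f32>, target: usize) -> f32 {
    -probs[target].max(1e-12).ln()
}

fn main() {
    let mut w = Array2::<f32>::from_elem((2, 2), 1.0);
    let grad = Array2::<f32>::from_elem((2, 2), 10.0); // deliberately large gradient
    let mut opt = Adam::new(1e-3, (2, 2));
    opt.step(&mut w, &grad, 5.0); // clip global norm to 5.0, then update
    println!("updated weights:\n{w}");

    let probs = arr1(&[0.7f32, 0.2, 0.1]);
    println!("cross-entropy for target 0: {:.3}", cross_entropy(&probs, 0));
}
```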
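Finally, the two-phase training flow: pre-training on factual statements, then instruction tuning on conversational examples at a lower learning rate. The `TrainableModel` trait, the tiny datasets, the epoch counts, and the learning-rate values are hypothetical stand-ins rather than values taken from the repository.

```rust
// Hypothetical sketch of the two training phases: pre-training on factual
// statements, then instruction tuning at a lower learning rate. The trait,
// the data, the epoch counts, and the learning rates are illustrative only.
trait TrainableModel {
    /// One forward/backward/update step on a single example; returns the loss.
    fn train_step(&mut self, example: &str, lr: f32) -> f32;
}

fn run_phase(model: &mut impl TrainableModel, data: &[&str], epochs: usize, lr: f32) {
    for epoch in 0..epochs {
        let mut total = 0.0;
        for example in data {
            total += model.train_step(example, lr);
        }
        println!("epoch {epoch}: mean loss {:.4}", total / data.len() as f32);
    }
}

fn main() {
    // Stand-in model so the sketch runs end to end.
    struct Dummy;
    impl TrainableModel for Dummy {
        fn train_step(&mut self, _example: &str, lr: f32) -> f32 {
            lr // placeholder "loss"
        }
    }
    let mut model = Dummy;

    // Phase 1: pre-training on factual statements.
    let facts = ["the sun rises in the east", "water freezes at zero degrees celsius"];
    run_phase(&mut model, &facts, 3, 5e-4);

    // Phase 2: instruction tuning on conversational pairs, lower learning rate.
    let chats = ["User: how does rain form? Assistant: water vapor condenses into droplets"];
    run_phase(&mut model, &chats, 3, 1e-4);
}
```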