Hasty Briefs

RustGPT: A pure-Rust transformer LLM built from scratch

  • #Rust
  • #LLM
  • #Transformer
  • A complete Large Language Model (LLM) implementation in pure Rust that uses only the ndarray crate for matrix operations.
  • Demonstrates building a transformer-based language model from scratch, including pre-training and instruction tuning.
  • Features an interactive chat mode, full backpropagation with gradient clipping, and a modular architecture.
  • Core files include main.rs for the training pipeline and llm.rs for the LLM implementation.
  • Transformer architecture components: tokenization, embeddings, transformer blocks, and output projection (see the forward-pass sketch after this list).
  • Training phases: pre-training on factual statements, then instruction tuning for conversational AI (a two-phase sketch follows below).
  • Model specifications: a dynamically built vocabulary, embedding dimension of 128, hidden dimension of 256, and a maximum sequence length of 80 tokens (collected as constants below).
  • Optimizer: Adam with gradient clipping and cross-entropy loss, with separate learning rates for pre-training and instruction tuning (see the clipping and loss sketch below).
  • Includes comprehensive test coverage for all components and can be run as an optimized release build.
  • Future improvements: model persistence, performance optimizations, better sampling, and advanced architectures.
  • Encourages contributions, with suggested tasks at beginner, intermediate, and advanced levels.
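
How those components fit together, end to end: below is a minimal forward-pass sketch in the spirit of the project, using only ndarray. The type and field names are illustrative guesses rather than the repo's actual API, and the transformer block keeps only its feed-forward half (self-attention and normalization are reduced to a comment for brevity).

```rust
use ndarray::{stack, Array2, Axis};

// Hypothetical component layout; names are illustrative, not the repo's API.
struct TransformerBlock {
    w1: Array2<f32>, // (128, 256): embedding dim -> hidden dim
    w2: Array2<f32>, // (256, 128): hidden dim -> embedding dim
}

struct Model {
    embeddings: Array2<f32>,        // (vocab_size, 128) token embedding table
    blocks: Vec<TransformerBlock>,
    output_projection: Array2<f32>, // (128, vocab_size)
}

impl TransformerBlock {
    // Feed-forward sublayer with a residual connection; a real block would
    // also apply self-attention and normalization before this step.
    fn forward(&self, x: &Array2<f32>) -> Array2<f32> {
        let hidden = x.dot(&self.w1).mapv(|v| v.max(0.0)); // ReLU
        x + &hidden.dot(&self.w2)
    }
}

impl Model {
    // token ids -> embeddings -> transformer blocks -> vocabulary logits
    fn forward(&self, tokens: &[usize]) -> Array2<f32> {
        let rows: Vec<_> = tokens.iter().map(|&t| self.embeddings.row(t)).collect();
        let mut x = stack(Axis(0), &rows).expect("token ids within vocab");
        for block in &self.blocks {
            x = block.forward(&x);
        }
        x.dot(&self.output_projection) // (seq_len, vocab_size)
    }
}
```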
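The reported model specifications, pinned down as constants. The names are illustrative; the vocabulary size is deliberately absent, since it is built dynamically from the training data.

```rust
// Hyperparameters as listed above; constant names are illustrative.
const EMBEDDING_DIM: usize = 128; // width of each token embedding
const HIDDEN_DIM: usize = 256;    // feed-forward inner width
const MAX_SEQ_LEN: usize = 80;    // maximum tokens per sequence
// The vocabulary size is not a constant: it is derived from the training data.
```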
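Both training phases can be read as the same next-token loop run over two different corpora. A hedged sketch reusing the hypothetical `Model` above; the function names, epoch counts, and learning-rate values are placeholders, not values from the repo.

```rust
// Hypothetical two-phase driver; the per-phase learning rates below are
// placeholders standing in for the repo's actual values.
fn train_phase(model: &mut Model, corpus: &[Vec<usize>], lr: f32, epochs: usize) {
    for _ in 0..epochs {
        for tokens in corpus.iter().filter(|t| t.len() >= 2) {
            // Next-token objective: feed tokens[..n-1], score against tokens[1..].
            let logits = model.forward(&tokens[..tokens.len() - 1]);
            let targets = &tokens[1..];
            // Cross-entropy loss, backpropagation, gradient clipping, and an
            // Adam step with learning rate `lr` would follow (see next sketch).
            let _ = (logits, targets, lr);
        }
    }
}

fn run_training(model: &mut Model, factual: &[Vec<usize>], chat: &[Vec<usize>]) {
    train_phase(model, factual, 3e-4, 100); // phase 1: pre-train on factual statements
    train_phase(model, chat, 1e-4, 100);    // phase 2: instruction-tune on dialogue
}
```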
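Gradient clipping and cross-entropy loss are compact enough to show in full. A minimal ndarray sketch that assumes global-norm clipping and logits of shape (seq_len, vocab_size); the repo may clip per parameter or by value instead.

```rust
use ndarray::{Array2, Axis};

/// Scale all gradients down if their combined L2 norm exceeds `max_norm`.
fn clip_gradients(grads: &mut [Array2<f32>], max_norm: f32) {
    let total_norm = grads
        .iter()
        .map(|g| g.iter().map(|v| v * v).sum::<f32>())
        .sum::<f32>()
        .sqrt();
    if total_norm > max_norm {
        let scale = max_norm / total_norm;
        for g in grads.iter_mut() {
            g.mapv_inplace(|v| v * scale);
        }
    }
}

/// Mean cross-entropy between per-position logits and target token ids.
fn cross_entropy(logits: &Array2<f32>, targets: &[usize]) -> f32 {
    let mut total = 0.0;
    for (row, &target) in logits.axis_iter(Axis(0)).zip(targets) {
        // log-sum-exp with a max shift for numerical stability
        let max = row.fold(f32::NEG_INFINITY, |a, &b| a.max(b));
        let log_sum_exp = row.mapv(|v| (v - max).exp()).sum().ln() + max;
        total += log_sum_exp - row[target]; // -log p(target)
    }
    total / targets.len() as f32
}
```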