Hasty Briefs

Pretraining an LLM that outperforms Google BERT on a budget under $50

9 days ago
  • #Spiking Neural Networks
  • #LLM
  • #Liquid Time Constants
  • Harish SG, a security researcher, built an LLM named Arthemis with a budget under $50 using a single A100-40GB GPU from Google Colab.
  • Arthemis incorporates Spiking Neural Networks (SNNs) and Liquid Time-Constant networks (LTCs) to improve efficiency and performance.
  • SNNs mimic biological neural communication with event-driven processing and temporal dynamics, while LTCs adapt time constants dynamically for varied processing speeds.
  • The Arthemis model modifies the LLaMA architecture by replacing standard attention with SpikingAttention and feed-forward networks with LTCFeedForward.
  • Pretraining used 100M tokens from the vesteinn/babylm dataset, achieving coherent sentence generation despite the limited dataset size.
  • Post-training, the model was fine-tuned on the Alpaca dataset to create an instruct model, though some responses were inaccurate due to limited training data.
  • Evaluation showed Arthemis outperformed Google BERT on tasks such as HellaSwag and ARC-e, despite being trained on significantly fewer tokens and with far fewer resources.
  • An embedding model was also developed, performing on par with Jina-embeddings-v2-base on the MTEB benchmark across various tasks.
  • The project demonstrates the potential of bio-inspired neural networks to create efficient LLMs with limited resources, though not suitable for production use.
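The summary says SNNs mimic biological neurons with event-driven processing but does not show how. A minimal sketch of a leaky integrate-and-fire (LIF) neuron, the standard SNN building block, illustrates the idea: the membrane potential leaks toward the input, and the neuron emits a binary spike (an "event") only when a threshold is crossed. All names and constants here are illustrative, not taken from the Arthemis code.

```python
import numpy as np

def lif_step(v, inp, tau=10.0, v_th=1.0, dt=1.0):
    """One Euler step of a leaky integrate-and-fire neuron.

    v   -- membrane potential (array, one entry per neuron)
    inp -- input current at this time step
    Returns the updated potential and a 0/1 spike vector.
    """
    # Leaky integration: potential decays toward the input with time constant tau
    v = v + dt / tau * (inp - v)
    # Event-driven output: a spike is emitted only when the threshold is crossed
    spikes = (v >= v_th).astype(np.float64)
    # Hard reset after a spike, mimicking a biological refractory reset
    v = np.where(spikes > 0, 0.0, v)
    return v, spikes
```

Driving a few neurons with a constant super-threshold current produces periodic spikes, which is why downstream layers can operate on sparse binary events instead of dense activations.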
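The "time constants adapt dynamically" property of LTCs can also be sketched concretely. In the published LTC formulation, a learned nonlinearity f both drives the state and adds to the leak rate, so the effective time constant 1/(1/τ + f) shrinks when f is large and the cell responds faster. Below is a hedged, simplified fused-Euler step in that spirit; the parameter names (W, U, b, A, tau) are assumptions for illustration, not the Arthemis implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ltc_fused_step(x, inp, W, U, b, A, tau, dt=0.05):
    """One fused semi-implicit Euler step of a simplified LTC cell.

    x   -- hidden state, inp -- external input
    W, U, b -- weights/bias of the gating nonlinearity
    A   -- per-neuron bias the state is pulled toward
    tau -- base time constant
    """
    # Input- and state-dependent gate: its output modulates the leak rate
    f = sigmoid(W @ x + U @ inp + b)
    # Effective time constant is 1/(1/tau + f): large f => faster dynamics.
    # Semi-implicit update keeps the step numerically stable.
    return (x + dt * f * A) / (1.0 + dt * (1.0 / tau + f))
```

Because f depends on both the current state and the input, each neuron's processing speed varies over the sequence, which is the "varied processing speeds" behavior the summary describes.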
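The original post's SpikingAttention code is not reproduced here, but the general idea behind spike-based attention variants can be sketched: queries and keys are binarized into 0/1 spike trains so the score matrix becomes a cheap integer accumulation, and an inexpensive sum normalization stands in for softmax. Everything below (function name, threshold, normalization choice) is a hypothetical illustration, not the author's method.

```python
import numpy as np

def spiking_attention(Q, K, V, v_th=0.5):
    """Toy spike-based attention over one head.

    Q: (n, d) queries, K: (m, d) keys, V: (m, dv) values.
    """
    # Binarize activations into spike trains; the matmul below then
    # reduces to counting coincident spikes (addition, no multiplies)
    sq = (Q > v_th).astype(np.float64)
    sk = (K > v_th).astype(np.float64)
    scores = (sq @ sk.T) / np.sqrt(Q.shape[-1])
    # Cheap sum normalization in place of softmax; scores are non-negative
    w = scores / np.maximum(scores.sum(axis=-1, keepdims=True), 1e-9)
    return w @ V
```

The efficiency claim rests on the binarization: once Q and K are spikes, the score computation needs only accumulations, which is what makes spiking layers attractive on a tight compute budget.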