Pretraining an LLM that outperforms Google BERT on a budget of under $50
- #Spiking Neural Networks
- #LLM
- #Liquid Time Constants
- Harish SG, a security researcher, built an LLM named Arthemis on a budget of under $50, using a single A100-40GB GPU from Google Colab.
- Arthemis LLM incorporates Spiking Neural Networks (SNNs) and Liquid Time Constant Neural Networks (LTCs) to enhance efficiency and performance.
- SNNs mimic biological neurons with event-driven, spike-based processing and explicit temporal dynamics, while LTCs adapt each unit's time constant to its input so different units integrate at different speeds; a minimal sketch of both mechanisms follows this list.
- The Arthemis model modifies the LLaMA architecture, replacing the standard attention blocks with SpikingAttention and the feed-forward networks with LTCFeedForward (an illustrative block sketch also appears after this list).
- Pretraining used 100M tokens from the vesteinn/babylm dataset, achieving coherent sentence generation despite the limited dataset size (a data-loading sketch appears below).
- After pretraining, the model was fine-tuned on the Alpaca dataset to produce an instruct model, though some responses were inaccurate given the limited training data (the Alpaca prompt format is shown below).
- Evaluation showed Arthemis outperformed Google BERT on benchmarks such as HellaSwag and ARC-Easy, despite being trained on far fewer tokens and with far fewer resources.
- An embedding model was also developed, performing on par with jina-embeddings-v2-base across various MTEB leaderboard tasks.
- The project demonstrates the potential of bio-inspired neural networks to produce efficient LLMs with limited resources, though the author notes the model is not suitable for production use.
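
To make the two mechanisms concrete, here is a minimal PyTorch sketch, not the author's code: `LIFNeuron`, `LTCCell`, and the hyperparameters (`threshold`, `decay`, `dt`) are illustrative assumptions, chosen only to show event-driven spiking and an input-dependent time constant.

```python
import torch
import torch.nn as nn

class LIFNeuron(nn.Module):
    """Leaky integrate-and-fire unit (sketch): accumulates input into a
    membrane potential and emits a binary spike when it crosses a threshold."""
    def __init__(self, threshold: float = 1.0, decay: float = 0.9):
        super().__init__()
        self.threshold = threshold
        self.decay = decay

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (time, batch, features); processed step by step (event-driven).
        membrane = torch.zeros_like(x[0])
        spikes_out = []
        for x_t in x:
            membrane = self.decay * membrane + x_t          # leaky integration
            spikes = (membrane >= self.threshold).float()   # fire on threshold crossing
            membrane = membrane - spikes * self.threshold   # soft reset after a spike
            spikes_out.append(spikes)
        return torch.stack(spikes_out)

class LTCCell(nn.Module):
    """Liquid time-constant cell (sketch): the effective time constant tau is
    computed from the current state and input, so the state is integrated
    faster or slower depending on what the unit is seeing."""
    def __init__(self, dim: int, dt: float = 0.1):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)
        self.dt = dt

    def forward(self, h: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        f = torch.sigmoid(self.gate(torch.cat([h, x], dim=-1)))
        tau = 1.0 / (0.1 + f)        # input-dependent time constant
        dh = (x - h) / tau           # dh/dt = (x - h) / tau(h, x)
        return h + self.dt * dh      # one explicit Euler step

# Usage: 20 time steps, batch of 2, 8 features.
seq = torch.randn(20, 2, 8)
spikes = LIFNeuron()(seq)
cell, h = LTCCell(8), torch.zeros(2, 8)
for x_t in seq:
    h = cell(h, x_t)
```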
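And a hedged sketch of how such modules might slot into a LLaMA-style block. The post names `SpikingAttention` and `LTCFeedForward` but does not publish their internals, so everything below (the surrogate spiking threshold, the single Euler step in the feed-forward path, `LayerNorm` in place of LLaMA's RMSNorm) is an assumption for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpikingAttention(nn.Module):
    """Illustrative: multi-head attention whose output passes through a hard
    threshold gate, so downstream activity is sparse and spike-like."""
    def __init__(self, dim: int, n_heads: int, threshold: float = 0.5):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.threshold = threshold

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        # Surrogate spiking: keep values only where they exceed the threshold
        # (the mask is treated as a constant in the backward pass).
        return out * (out.abs() >= self.threshold).float()

class LTCFeedForward(nn.Module):
    """Illustrative: a feed-forward path whose output is integrated toward a
    target with an input-dependent (liquid) time constant, rather than a
    plain MLP transform."""
    def __init__(self, dim: int, mult: int = 4, dt: float = 0.1):
        super().__init__()
        self.in_proj = nn.Linear(dim, mult * dim)
        self.out_proj = nn.Linear(mult * dim, dim)
        self.tau_gate = nn.Linear(dim, dim)
        self.dt = dt

    def forward(self, x):
        target = self.out_proj(F.silu(self.in_proj(x)))
        tau = 1.0 + F.softplus(self.tau_gate(x))   # per-feature time constant > 1
        return x + self.dt * (target - x) / tau    # single Euler step toward target

class ArthemisBlock(nn.Module):
    """Pre-norm transformer block with the two swapped-in modules."""
    def __init__(self, dim: int = 256, n_heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)   # LLaMA proper uses RMSNorm
        self.norm2 = nn.LayerNorm(dim)
        self.attn = SpikingAttention(dim, n_heads)
        self.ffn = LTCFeedForward(dim)

    def forward(self, x):
        x = x + self.attn(self.norm1(x))
        x = x + self.ffn(self.norm2(x))
        return x

# Usage: a (batch, seq, dim) tensor flows through one block unchanged in shape.
y = ArthemisBlock()(torch.randn(2, 16, 256))
print(y.shape)  # torch.Size([2, 16, 256])
```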
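Loading the pretraining corpus might look roughly like this; the dataset name comes from the post, but the split, the `text` field, and the GPT-2 stand-in tokenizer are assumptions.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Dataset name is from the post; split and column names are assumptions.
ds = load_dataset("vesteinn/babylm", split="train")
tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in tokenizer

def count_tokens(batch):
    return {"n_tokens": [len(ids) for ids in tok(batch["text"])["input_ids"]]}

counted = ds.map(count_tokens, batched=True)
print(sum(counted["n_tokens"]))  # sample/stream until roughly 100M tokens
```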
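For the instruct-tuning step, the standard Alpaca prompt template (the no-input variant from the stanford_alpaca release) is shown below; whether Arthemis used exactly this formatting is not stated in the post.

```python
# Canonical Alpaca no-input prompt template; the field names
# "instruction" and "output" match the Alpaca dataset schema.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n{output}"
)

example = {"instruction": "Name three primary colors.",
           "output": "Red, blue, and yellow."}
print(ALPACA_TEMPLATE.format(**example))
```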