Pretraining an LLM that outperforms Google BERT on a budget of under $50
- #Spiking Neural Networks
- #LLM
- #Liquid Time Constants
- Harish SG, a security researcher, built an LLM named Arthemis on a budget of under $50, using a single A100-40GB GPU from Google Colab.
- Arthemis LLM incorporates Spiking Neural Networks (SNNs) and Liquid Time Constant Neural Networks (LTCs) to enhance efficiency and performance.
- SNNs mimic biological neurons with event-driven, spike-based processing and explicit temporal dynamics, while LTCs adapt each unit's time constant to its input so different units integrate at different speeds; a minimal sketch of both mechanisms follows this list.
- The Arthemis model modifies the LLaMA architecture, replacing the standard attention blocks with SpikingAttention and the feed-forward networks with LTCFeedForward (an illustrative block sketch also appears after this list).
- Pretraining used 100M tokens from the vesteinn/babylm dataset, achieving coherent sentence generation despite the limited dataset size (a data-loading sketch appears below).
- After pretraining, the model was fine-tuned on the Alpaca dataset to produce an instruct model, though some responses were inaccurate given the limited training data (the Alpaca prompt format is shown below).
- Evaluation showed Arthemis outperformed Google BERT on benchmarks such as HellaSwag and ARC-Easy, despite being trained on far fewer tokens and with far fewer resources.
- An embedding model was also developed, performing on par with jina-embeddings-v2-base across various MTEB leaderboard tasks.
- The project demonstrates the potential of bio-inspired neural networks to produce efficient LLMs with limited resources, though the author notes the model is not suitable for production use.
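
To make the two mechanisms concrete, here is a minimal PyTorch sketch, not the author's code: `LIFNeuron`, `LTCCell`, and the hyperparameters (`threshold`, `decay`, `dt`) are illustrative assumptions, chosen only to show event-driven spiking and an input-dependent time constant.

```python
import torch
import torch.nn as nn

class LIFNeuron(nn.Module):
    """Leaky integrate-and-fire unit (sketch): accumulates input into a
    membrane potential and emits a binary spike when it crosses a threshold."""
    def __init__(self, threshold: float = 1.0, decay: float = 0.9):
        super().__init__()
        self.threshold = threshold
        self.decay = decay

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (time, batch, features); processed step by step (event-driven).
        membrane = torch.zeros_like(x[0])
        spikes_out = []
        for x_t in x:
            membrane = self.decay * membrane + x_t          # leaky integration
            spikes = (membrane >= self.threshold).float()   # fire on threshold crossing
            membrane = membrane - spikes * self.threshold   # soft reset after a spike
            spikes_out.append(spikes)
        return torch.stack(spikes_out)

class LTCCell(nn.Module):
    """Liquid time-constant cell (sketch): the effective time constant tau is
    computed from the current state and input, so the state is integrated
    faster or slower depending on what the unit is seeing."""
    def __init__(self, dim: int, dt: float = 0.1):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)
        self.dt = dt

    def forward(self, h: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        f = torch.sigmoid(self.gate(torch.cat([h, x], dim=-1)))
        tau = 1.0 / (0.1 + f)        # input-dependent time constant
        dh = (x - h) / tau           # dh/dt = (x - h) / tau(h, x)
        return h + self.dt * dh      # one explicit Euler step

# Usage: 20 time steps, batch of 2, 8 features.
seq = torch.randn(20, 2, 8)
spikes = LIFNeuron()(seq)
cell, h = LTCCell(8), torch.zeros(2, 8)
for x_t in seq:
    h = cell(h, x_t)
```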
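And a hedged sketch of how such modules might slot into a LLaMA-style block. The post names `SpikingAttention` and `LTCFeedForward` but does not publish their internals, so everything below (the surrogate spiking threshold, the single Euler step in the feed-forward path, `LayerNorm` in place of LLaMA's RMSNorm) is an assumption for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpikingAttention(nn.Module):
    """Illustrative: multi-head attention whose output passes through a hard
    threshold gate, so downstream activity is sparse and spike-like."""
    def __init__(self, dim: int, n_heads: int, threshold: float = 0.5):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.threshold = threshold

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        # Surrogate spiking: keep values only where they exceed the threshold
        # (the mask is treated as a constant in the backward pass).
        return out * (out.abs() >= self.threshold).float()

class LTCFeedForward(nn.Module):
    """Illustrative: a feed-forward path whose output is integrated toward a
    target with an input-dependent (liquid) time constant, rather than a
    plain MLP transform."""
    def __init__(self, dim: int, mult: int = 4, dt: float = 0.1):
        super().__init__()
        self.in_proj = nn.Linear(dim, mult * dim)
        self.out_proj = nn.Linear(mult * dim, dim)
        self.tau_gate = nn.Linear(dim, dim)
        self.dt = dt

    def forward(self, x):
        target = self.out_proj(F.silu(self.in_proj(x)))
        tau = 1.0 + F.softplus(self.tau_gate(x))   # per-feature time constant > 1
        return x + self.dt * (target - x) / tau    # single Euler step toward target

class ArthemisBlock(nn.Module):
    """Pre-norm transformer block with the two swapped-in modules."""
    def __init__(self, dim: int = 256, n_heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)   # LLaMA proper uses RMSNorm
        self.norm2 = nn.LayerNorm(dim)
        self.attn = SpikingAttention(dim, n_heads)
        self.ffn = LTCFeedForward(dim)

    def forward(self, x):
        x = x + self.attn(self.norm1(x))
        x = x + self.ffn(self.norm2(x))
        return x

# Usage: a (batch, seq, dim) tensor flows through one block unchanged in shape.
y = ArthemisBlock()(torch.randn(2, 16, 256))
print(y.shape)  # torch.Size([2, 16, 256])
```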
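Loading the pretraining corpus might look roughly like this; the dataset name comes from the post, but the split, the `text` field, and the GPT-2 stand-in tokenizer are assumptions.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Dataset name is from the post; split and column names are assumptions.
ds = load_dataset("vesteinn/babylm", split="train")
tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in tokenizer

def count_tokens(batch):
    return {"n_tokens": [len(ids) for ids in tok(batch["text"])["input_ids"]]}

counted = ds.map(count_tokens, batched=True)
print(sum(counted["n_tokens"]))  # sample/stream until roughly 100M tokens
```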
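For the instruct-tuning step, the standard Alpaca prompt template (the no-input variant from the stanford_alpaca release) is shown below; whether Arthemis used exactly this formatting is not stated in the post.

```python
# Canonical Alpaca no-input prompt template; the field names
# "instruction" and "output" match the Alpaca dataset schema.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n{output}"
)

example = {"instruction": "Name three primary colors.",
           "output": "Red, blue, and yellow."}
print(ALPACA_TEMPLATE.format(**example))
```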