You can now train a 70B language model at home
9 months ago
- #AI
- #Open Source
- #Machine Learning
- Answer.AI releases a fully open-source system to train 70B-parameter large language models on desktop computers with gaming GPUs (RTX 3090 or 4090).
- The system combines FSDP (Fully Sharded Data Parallel) and QLoRA (Quantized Low-Rank Adaptation) to enable efficient training on consumer hardware.
- QLoRA quantizes the frozen model weights to 4 bits and trains small LoRA adapters on top of them, sharply reducing memory usage while largely preserving performance (see the QLoRA sketch after this list).
- FSDP shards the model's parameters, gradients, and optimizer state across multiple GPUs so that all of them train in parallel, avoiding the inefficiency of naive model parallelism, where only one GPU is active at a time (see the FSDP sketch after this list).
- The project aims to democratize AI by making large model training accessible without expensive data center hardware.
- Key collaborators include Tim Dettmers, Hugging Face, and Answer.AI; the system builds on open-source tools such as bitsandbytes, PEFT, and Transformers.
- The system supports techniques like gradient checkpointing, CPU offloading, and Flash Attention 2 to optimize memory and performance.
- HQQ (Half-Quadratic Quantization) is introduced as an alternative to bitsandbytes, offering faster and more accurate quantization.
- Practical steps for using FSDP/QLoRA are provided, including installation and running training scripts on multi-GPU setups.
- The project is a first step toward more accessible AI model training, with future improvements and community contributions expected.
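
A minimal sketch of the QLoRA side of the recipe, using Hugging Face Transformers, PEFT, and bitsandbytes. The model name, LoRA hyperparameters, and 4-bit settings here are illustrative assumptions rather than the exact configuration the fsdp_qlora project uses, and this single-process snippet alone will not fit a 70B model on one 24 GB card; combining it with FSDP sharding (next sketch) is the point of the article.

```python
# Sketch: 4-bit base model + trainable LoRA adapters (QLoRA-style).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Quantize the frozen base weights to 4-bit NF4; compute runs in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",              # illustrative; any causal LM works
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # optional, requires flash-attn
    torch_dtype=torch.bfloat16,
)
model.gradient_checkpointing_enable()          # trade compute for activation memory

# Attach small trainable LoRA adapters on top of the frozen 4-bit weights.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()             # only the adapters require gradients
```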
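
And a minimal sketch of the FSDP side in plain PyTorch, assuming a torchrun launch with one process per GPU. The tiny stand-in model and hyperparameters are placeholders; the hard part the fsdp_qlora project solves, sharding quantized 4-bit weights correctly, is not shown here.

```python
# Sketch: shard a model across GPUs with FSDP (launch via torchrun).
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import CPUOffload, ShardingStrategy

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Stand-in for a large transformer; real use would wrap per-layer as well.
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 4096),
)

model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.FULL_SHARD,   # shard params, grads, optimizer state
    # cpu_offload=CPUOffload(offload_params=True),   # optional: spill shards to CPU RAM
    device_id=torch.cuda.current_device(),
)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(8, 4096, device="cuda")
loss = model(x).square().mean()                      # dummy loss to drive one step
loss.backward()
optimizer.step()

dist.destroy_process_group()
```

Run with, for example, `torchrun --nproc_per_node=2 fsdp_sketch.py` on a two-GPU machine.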