Thinking Machines – LoRA Without Regret
- #Parameter-Efficient Fine-Tuning
- #LoRA
- #Machine Learning
- LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that freezes the weight matrices of a large language model and adds a trainable low-rank update, W' = W + γBA, greatly reducing the number of trainable parameters (a minimal layer sketch follows this list).
- LoRA offers practical advantages over full fine-tuning (FullFT): cheaper multi-tenant serving (many adapters can share one base model), a smaller memory footprint during training (fewer optimizer states), and faster loading and transfer of adapters.
- LoRA performs comparably to FullFT in supervised fine-tuning on small-to-medium datasets, but underperforms once the dataset's information content exceeds the adapter's capacity, which scales with rank.
- LoRA is less tolerant of large batch sizes than FullFT: the loss gap grows with batch size and is not closed by increasing the rank.
- Applying LoRA to all layers, especially the MLP/MoE layers, outperforms attention-only LoRA, even when the number of trainable parameters is matched (an example configuration follows this list).
- In reinforcement learning, LoRA matches FullFT performance even with very low ranks (e.g., rank=1), as RL requires less capacity due to limited information per episode.
- Optimal learning rates for LoRA are consistently ~10x higher than for FullFT, and LoRA is also slightly cheaper per training pass, needing roughly 2/3 the FLOPs of FullFT since frozen weights skip the weight-gradient computation (the accounting is sketched after this list).
- Key hyperparameters for LoRA are the rank, the learning rate, and the initialization scales of the A and B matrices; rescaling invariances between these parameters shrink the effective space that needs tuning (one invariance is checked numerically after this list).
- LoRA's performance is similar to FullFT when applied to all layers and when not capacity-constrained, making it suitable for most post-training scenarios.
- Open questions remain about sharpening performance predictions, theoretical understanding of LoRA dynamics, and evaluating LoRA variants like PiSSA.
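A minimal sketch of the low-rank update in PyTorch, assuming the common scaling γ = α/r and zero-initialization of B; the class name `LoRALinear` and all dimensions are illustrative, not the post's reference code:

```python
import torch

class LoRALinear(torch.nn.Module):
    """A LoRA-adapted linear layer: the frozen base weight W is augmented
    with a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 32.0):
        super().__init__()
        self.base = torch.nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)  # W stays frozen
        # B starts at zero so training begins exactly at the base model.
        self.A = torch.nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(d_out, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only A and B receive gradients; the update has rank <= r.
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T
```

Only `r * (d_in + d_out)` parameters are trained here instead of `d_in * d_out`.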
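One concrete way to apply LoRA to every layer rather than attention only is via HuggingFace PEFT's `target_modules`; the model id and module names (`q_proj`, `gate_proj`, ...) below assume a Llama-style architecture and will differ for other model families:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

config = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
```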
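The ~2/3 FLOP ratio follows from a rough per-weight accounting of the three matmuls in a training step, sketched below (this ignores the small extra cost of the adapter itself):

```python
# Per token and per weight, each matmul costs ~2 FLOPs.
forward, grad_input, grad_weight = 2, 2, 2

fullft_flops = forward + grad_input + grad_weight  # W is trained: all three
lora_flops = forward + grad_input                  # W frozen: no weight gradient

print(lora_flops / fullft_flops)  # 0.666... -> the ~2/3 ratio
```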
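A quick numerical check of one such invariance: scaling B up and A down by the same factor leaves the update (α/r)·BA unchanged, so the two initialization scales are not independent knobs (the dimensions below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 128, 8, 32.0
A = rng.normal(size=(r, d_in))
B = rng.normal(size=(d_out, r))

c = 10.0  # any nonzero rescaling factor
update = (alpha / r) * B @ A
update_rescaled = (alpha / r) * (c * B) @ (A / c)
assert np.allclose(update, update_rescaled)
```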