TinyLoRA – Learning to Reason in 13 Parameters
- #Reasoning in Language Models
- #Low-Rank Adaptation
- #Reinforcement Learning
- The paper introduces TinyLoRA, a method for training low-rank adapters with as few as a single trainable parameter.
- TinyLoRA trains an 8B-parameter Qwen2.5 model to 91% accuracy on GSM8K while updating only 13 parameters (26 bytes in bf16).
- Across benchmarks including AIME, AMC, and MATH500, TinyLoRA reaches about 90% of baseline performance with roughly 1000x fewer trainable parameters.
- Reinforcement learning (RL) is essential; supervised fine-tuning (SFT) requires 100-1000x larger updates for similar results.
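To make the "adapter with one trainable parameter" idea concrete, here is a minimal sketch, not the paper's actual method: the low-rank directions `u` and `v` are frozen at random initialization, and only a single scalar `alpha` scaling the rank-1 update is trained. All names and the specific construction are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 64, 64
W = rng.standard_normal((d_out, d_in))  # frozen pretrained weight (illustrative)

# Frozen random rank-1 directions; only the scalar `alpha` would be trained.
u = rng.standard_normal(d_out)
v = rng.standard_normal(d_in)
alpha = 0.0  # the single trainable parameter in this sketch

def forward(x, alpha):
    # Adapted layer: W @ x + alpha * u * (v . x), i.e. a rank-1 update
    # alpha * (u v^T) added to the frozen weight W.
    return W @ x + alpha * u * (v @ x)

x = rng.standard_normal(d_in)
# With alpha = 0 the adapter is inactive and the output equals the base layer.
assert np.allclose(forward(x, 0.0), W @ x)

n_trainable = 1  # alpha only; W, u, and v stay frozen
```

Under this construction the trainable-parameter count is independent of the layer's width, which is what makes budgets like 13 parameters for an 8B model arithmetically possible (e.g. one scalar per adapted layer or group of layers).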