Knowledge Infusion Scaling Law for Pre-Training Large Language Models
- #Scaling Law
- #Knowledge Infusion
- #Large Language Models
- Large language models (LLMs) show impressive general capabilities but often underperform on specialized knowledge tasks.
- Strategic infusion of domain knowledge during pretraining can improve downstream performance, but balancing this infusion is challenging.
- Over-infusion of domain knowledge can lead to memory collapse, where the model's knowledge retention sharply degrades.
- Two key observations: (1) each model has a critical collapse point beyond which knowledge retention sharply degrades, and (2) these collapse points scale with model size.
- A knowledge infusion scaling law is proposed that predicts the optimal amount of domain knowledge to inject into a large model by analyzing smaller models.
- Experiments across different model sizes and pretraining token budgets validate the effectiveness and generalizability of the scaling law.
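The extrapolation idea behind the scaling law can be sketched in a few lines: measure the collapse point on several small models, fit a power law in log-log space, and extrapolate to a larger model. All numbers and names below are illustrative assumptions, not values from the paper.

```python
import math

# Hypothetical measured collapse points (the domain-knowledge fraction at
# which retention starts to degrade) for a few small models; values are
# illustrative, not from the paper.
model_params = [1e8, 3e8, 1e9, 3e9]          # model sizes in parameters
collapse_points = [0.02, 0.035, 0.06, 0.10]  # infusion fraction at collapse

# Assume a power law collapse_point = a * N^b, which is linear in
# log-log space, and fit it by ordinary least squares.
xs = [math.log(n) for n in model_params]
ys = [math.log(c) for c in collapse_points]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
log_a = mean_y - b * mean_x

def predict_collapse(n_params: float) -> float:
    """Extrapolate the fitted power law to a (larger) target model size."""
    return math.exp(log_a + b * math.log(n_params))

print(f"fitted exponent b = {b:.3f}")
print(f"predicted collapse point at 70B params: {predict_collapse(7e10):.3f}")
```

The predicted collapse point would then cap the domain-knowledge fraction in the large model's pretraining mixture, avoiding the over-infusion regime.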