Hasty Briefs (beta)

Knowledge Infusion Scaling Law for Pre-Training Large Language Models

  • #Scaling Law
  • #Knowledge Infusion
  • #Large Language Models
  • Large language models (LLMs) show impressive general capabilities but often underperform on specialized knowledge tasks.
  • Strategic infusion of domain knowledge during pretraining can improve downstream performance, but balancing this infusion is challenging.
  • Over-infusion of domain knowledge can lead to memory collapse, where the model's knowledge retention sharply degrades.
  • Two key observations: 1) Each model has a critical collapse point beyond which knowledge retention degrades. 2) These collapse points scale with model size.
  • A knowledge infusion scaling law is proposed: by analyzing smaller models, it predicts the optimal amount of domain knowledge to inject when pre-training large LLMs (a sketch of this fitting procedure follows the list).
  • Experiments across different model sizes and pretraining token budgets validate the effectiveness and generalizability of the scaling law.
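A minimal sketch of how such a scaling law could be fit in practice: measure the critical collapse point on several small models, fit a power law in log-log space, and extrapolate to a larger target model. The model sizes, collapse-point values, and the power-law functional form below are illustrative assumptions for exposition, not figures or equations from the paper.

```python
import numpy as np

# Hypothetical measurements: for several small models, the parameter count and the
# measured critical collapse point (domain-knowledge tokens absorbed before retention
# sharply degrades). All numbers are illustrative, not results from the paper.
model_sizes = np.array([125e6, 350e6, 760e6, 1.3e9])      # parameters
collapse_points = np.array([0.8e9, 2.1e9, 4.5e9, 7.6e9])  # domain tokens at collapse

# Assume a power-law relationship: collapse_point ~ a * size^b.
# Fit it as a linear regression in log-log space.
b, log_a = np.polyfit(np.log(model_sizes), np.log(collapse_points), 1)
a = np.exp(log_a)

def predict_collapse_point(size: float) -> float:
    """Extrapolate the critical collapse point for a larger model under the fitted law."""
    return a * size ** b

# Predict how much domain knowledge a 7B-parameter model could absorb before
# memory collapse, according to the hypothetical fitted law.
target_size = 7e9
print(f"Fitted law: collapse_point ~ {a:.3g} * size^{b:.2f}")
print(f"Predicted collapse point at 7B params: {predict_collapse_point(target_size):.3g} tokens")
```

The design choice mirrored here is the one the summary describes: because collapse points scale with model size, a law fitted on cheap small-model runs can set the domain-knowledge budget for an expensive large-model pre-training run before it starts.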