Knowledge Infusion Scaling Law for Pre-Training Large Language Models
- #Scaling Law
- #Knowledge Infusion
- #Large Language Models
- Large language models (LLMs) show impressive general capabilities but often underperform on specialized knowledge tasks.
- Strategic infusion of domain knowledge during pretraining can improve downstream performance, but balancing this infusion is challenging.
- Over-infusion of domain knowledge can lead to memory collapse, where the model's knowledge retention sharply degrades.
- Two key observations: (1) each model has a critical collapse point beyond which knowledge retention sharply degrades, and (2) these collapse points scale with model size.
- A knowledge infusion scaling law is proposed that predicts the optimal amount of domain knowledge to inject into a large model by analyzing smaller models.
- Experiments across different model sizes and pretraining token budgets validate the effectiveness and generalizability of the scaling law.
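The extrapolation idea behind the scaling law can be sketched in a few lines: measure the collapse point on several small models, fit a power law in log-log space, and extrapolate to a larger model. All numbers and names below are illustrative assumptions, not values from the paper.

```python
import math

# Hypothetical measured collapse points (the domain-knowledge fraction at
# which retention starts to degrade) for a few small models; values are
# illustrative, not from the paper.
model_params = [1e8, 3e8, 1e9, 3e9]          # model sizes in parameters
collapse_points = [0.02, 0.035, 0.06, 0.10]  # infusion fraction at collapse

# Assume a power law collapse_point = a * N^b, which is linear in
# log-log space, and fit it by ordinary least squares.
xs = [math.log(n) for n in model_params]
ys = [math.log(c) for c in collapse_points]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
log_a = mean_y - b * mean_x

def predict_collapse(n_params: float) -> float:
    """Extrapolate the fitted power law to a (larger) target model size."""
    return math.exp(log_a + b * math.log(n_params))

print(f"fitted exponent b = {b:.3f}")
print(f"predicted collapse point at 70B params: {predict_collapse(7e10):.3f}")
```

The predicted collapse point would then cap the domain-knowledge fraction in the large model's pretraining mixture, avoiding the over-infusion regime.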