Self-Distillation Enables Continual Learning [pdf]
- #self-distillation
- #foundation-models
- #continual-learning
- Self-Distillation Fine-Tuning (SDFT) enables continual learning by converting expert demonstrations into on-policy training data.
- SDFT conditions the model on a demonstration and uses that demonstration-conditioned model as its own teacher, generating on-policy training signals (a minimal sketch follows this list).
- The method outperforms supervised fine-tuning (SFT) on both fronts: higher accuracy on the new task and less catastrophic forgetting of prior capabilities.
- In sequential learning, SDFT allows a single model to accumulate multiple skills over time without performance regression.
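A minimal sketch of what one SDFT update might look like, assuming a KL-distillation objective and a simple demonstration-in-context prompt template. The prompt format, sampling settings, and loss below are illustrative assumptions, not the paper's exact recipe; `response_logprobs` and `sdft_step` are hypothetical helper names.

```python
# Hedged sketch of one SDFT update step: the demonstration-conditioned model
# acts as teacher for the demonstration-free student (same weights).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # small stand-in; the paper fine-tunes larger models
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)


def response_logprobs(prefix_ids, response_ids):
    """Log-probs the model assigns to each response token after the prefix."""
    ids = torch.cat([prefix_ids, response_ids], dim=-1)
    logits = model(ids).logits
    start = prefix_ids.shape[-1] - 1  # logits at position i predict token i+1
    pred = logits[:, start : start + response_ids.shape[-1], :]
    return F.log_softmax(pred, dim=-1)


def sdft_step(prompt, demonstration):
    # Teacher context: the current model conditioned on the expert demo.
    # (This prompt template is an assumption for illustration.)
    teacher_prefix = tok(
        f"{prompt}\nReference answer: {demonstration}\nYour answer:",
        return_tensors="pt",
    ).input_ids
    # Student context: the same prompt without the demonstration.
    student_prefix = tok(f"{prompt}\nYour answer:", return_tensors="pt").input_ids

    # On-policy sample: the demo-conditioned model restates the target
    # in its own token distribution.
    with torch.no_grad():
        out = model.generate(
            teacher_prefix,
            max_new_tokens=64,
            do_sample=True,
            pad_token_id=tok.eos_token_id,
        )
    response_ids = out[:, teacher_prefix.shape[-1]:]

    # Distill: pull the student's (demo-free) distribution toward the
    # teacher's on the sampled response. The teacher pass is a frozen
    # per-step snapshot of the same weights, i.e. the model is its own teacher.
    with torch.no_grad():
        teacher_lp = response_logprobs(teacher_prefix, response_ids)
    student_lp = response_logprobs(student_prefix, response_ids)
    loss = F.kl_div(student_lp, teacher_lp, log_target=True,
                    reduction="batchmean")

    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


# Illustrative usage:
# sdft_step("Translate 'bonjour' to English.", "hello")
```

Because the training targets are sampled from the model itself (merely guided by the demonstration), the update stays close to the model's own distribution, which is the intuition behind the reduced catastrophic forgetting relative to SFT.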