Utility-Preserving, Robust, and Almost Irreversible Forgetting in LLMs
- #Data Privacy
- #Machine Learning
- #Large Language Models
- Introduction of JensUn, a new LLM unlearning method that uses the Jensen-Shannon divergence as its forget objective, yielding more stable training and more effective forgetting (a minimal sketch follows this list).
- JensUn achieves a better forget-utility trade-off than existing methods and is resilient to benign relearning.
- Creation of LKF, a dataset of lesser-known facts, to provide a realistic scenario for evaluating unlearning methods.
- Proposal of a stricter evaluation framework that uses an LLM as a semantic judge and scores worst-case behavior over paraphrases and input formats (sketched after this list).
- Findings that many existing unlearning methods are less effective than previously thought under the new evaluation framework.
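To make the Jensen-Shannon objective concrete, here is a minimal PyTorch sketch of a JSD-based forget loss combined with a standard retain loss. This is an illustration under stated assumptions, not the paper's exact formulation: the fixed target distribution (e.g. an "I don't know" answer), the `target_logits` and `beta` names, and the HuggingFace-style model interface are all hypothetical.

```python
import torch
import torch.nn.functional as F


def js_divergence(p_logits: torch.Tensor, q_logits: torch.Tensor) -> torch.Tensor:
    """Jensen-Shannon divergence between two next-token distributions.

    Unlike the (asymmetric, unbounded) KL divergence used by many
    unlearning losses, the JSD is symmetric and bounded, which is the
    property the summary credits for JensUn's stability.
    """
    p = F.softmax(p_logits, dim=-1)
    q = F.softmax(q_logits, dim=-1)
    m = 0.5 * (p + q)
    log_m = m.clamp_min(1e-12).log()
    # F.kl_div(input, target) expects log-probs as input, probs as target,
    # and computes KL(target || exp(input)).
    kl_pm = F.kl_div(log_m, p, reduction="batchmean")
    kl_qm = F.kl_div(log_m, q, reduction="batchmean")
    return 0.5 * (kl_pm + kl_qm)


def unlearning_loss(model, forget_batch, retain_batch, target_logits, beta=1.0):
    """Hypothetical combined objective: pull the model's distribution on
    forget data toward a fixed target answer via JSD, while keeping the
    usual language-modeling loss on retain data to preserve utility."""
    forget_logits = model(**forget_batch).logits
    forget_loss = js_divergence(forget_logits, target_logits)
    retain_out = model(**retain_batch, labels=retain_batch["input_ids"])
    return forget_loss + beta * retain_out.loss
```

Because the JSD is bounded above, the forget gradient cannot blow up the way an unbounded ascent-style loss can, which is one plausible reading of the stability claim.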
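The worst-case evaluation can likewise be sketched in a few lines. The `judge_is_forgotten` callable stands in for whatever LLM judge and prompt the paper actually uses, and the data layout is an assumption for illustration.

```python
from typing import Callable


def worst_case_forget_score(
    answers_per_fact: dict[str, list[str]],
    judge_is_forgotten: Callable[[str, str], bool],
) -> float:
    """Fraction of facts forgotten under *every* paraphrase and input
    format. A single paraphrase that still elicits the fact counts the
    whole fact as remembered (worst case)."""
    forgotten = 0
    for fact, answers in answers_per_fact.items():
        if all(judge_is_forgotten(fact, ans) for ans in answers):
            forgotten += 1
    return forgotten / max(len(answers_per_fact), 1)
```

Averaging over paraphrases would credit a method for forgetting a fact under one phrasing while still revealing it under another; the `all` aggregation is what makes the evaluation worst-case, and it is why many methods look weaker under this framework than under earlier averaged metrics.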