Utility-Preserving, Robust, and Almost Irreversible Forgetting in LLMs
- #Data Privacy
- #Machine Learning
- #Large Language Models
- Introduction of JensUn, a new LLM unlearning method that uses the Jensen-Shannon divergence as its forget objective, yielding more stable training and more effective forgetting (a minimal sketch follows this list).
- JensUn achieves a better forget-utility trade-off than existing methods and is resilient to benign relearning.
- Creation of LKF, a dataset of lesser-known facts, to provide a realistic scenario for evaluating unlearning methods.
- Proposal of a stricter evaluation framework that uses an LLM as a semantic judge and scores worst-case behavior over paraphrases and input formats (sketched after this list).
- Findings that many existing unlearning methods are less effective than previously thought under the new evaluation framework.
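To make the Jensen-Shannon objective concrete, here is a minimal PyTorch sketch of a JSD-based forget loss combined with a standard retain loss. This is an illustration under stated assumptions, not the paper's exact formulation: the fixed target distribution (e.g. an "I don't know" answer), the `target_logits` and `beta` names, and the HuggingFace-style model interface are all hypothetical.

```python
import torch
import torch.nn.functional as F


def js_divergence(p_logits: torch.Tensor, q_logits: torch.Tensor) -> torch.Tensor:
    """Jensen-Shannon divergence between two next-token distributions.

    Unlike the (asymmetric, unbounded) KL divergence used by many
    unlearning losses, the JSD is symmetric and bounded, which is the
    property the summary credits for JensUn's stability.
    """
    p = F.softmax(p_logits, dim=-1)
    q = F.softmax(q_logits, dim=-1)
    m = 0.5 * (p + q)
    log_m = m.clamp_min(1e-12).log()
    # F.kl_div(input, target) expects log-probs as input, probs as target,
    # and computes KL(target || exp(input)).
    kl_pm = F.kl_div(log_m, p, reduction="batchmean")
    kl_qm = F.kl_div(log_m, q, reduction="batchmean")
    return 0.5 * (kl_pm + kl_qm)


def unlearning_loss(model, forget_batch, retain_batch, target_logits, beta=1.0):
    """Hypothetical combined objective: pull the model's distribution on
    forget data toward a fixed target answer via JSD, while keeping the
    usual language-modeling loss on retain data to preserve utility."""
    forget_logits = model(**forget_batch).logits
    forget_loss = js_divergence(forget_logits, target_logits)
    retain_out = model(**retain_batch, labels=retain_batch["input_ids"])
    return forget_loss + beta * retain_out.loss
```

Because the JSD is bounded above, the forget gradient cannot blow up the way an unbounded ascent-style loss can, which is one plausible reading of the stability claim.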
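The worst-case evaluation can likewise be sketched in a few lines. The `judge_is_forgotten` callable stands in for whatever LLM judge and prompt the paper actually uses, and the data layout is an assumption for illustration.

```python
from typing import Callable


def worst_case_forget_score(
    answers_per_fact: dict[str, list[str]],
    judge_is_forgotten: Callable[[str, str], bool],
) -> float:
    """Fraction of facts forgotten under *every* paraphrase and input
    format. A single paraphrase that still elicits the fact counts the
    whole fact as remembered (worst case)."""
    forgotten = 0
    for fact, answers in answers_per_fact.items():
        if all(judge_is_forgotten(fact, ans) for ans in answers):
            forgotten += 1
    return forgotten / max(len(answers_per_fact), 1)
```

Averaging over paraphrases would credit a method for forgetting a fact under one phrasing while still revealing it under another; the `all` aggregation is what makes the evaluation worst-case, and it is why many methods look weaker under this framework than under earlier averaged metrics.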