Not All Tokens Are Meant to Be Forgotten
- #Privacy
- #Machine Learning
- #Large Language Models
- Large Language Models (LLMs) exhibit strong language understanding but can memorize unwanted information, such as private or copyrighted content.
- Existing unlearning methods suffer from over-forgetting: they suppress all tokens in forget samples, which degrades model utility.
- The Targeted Information Forgetting (TIF) framework differentiates unwanted words (UW) from general words (GW) to improve unlearning.
- TIF uses Targeted Preference Optimization, combining a Logit Preference Loss to unlearn UW with a Preservation Loss to retain GW.
- Experiments on the TOFU and MUSE benchmarks show that TIF improves unlearning effectiveness while preserving model utility.
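The token-level idea above can be sketched as a masked objective: a standard preservation (NLL) loss on general words plus a separate term that pushes down the model's probability on unwanted tokens. This is a minimal NumPy illustration, not the paper's exact losses; the `-log(1 - p)` suppression surrogate, the `beta` weight, and the function names are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the vocabulary axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def tif_style_loss(logits, targets, unwanted_mask, beta=1.0):
    """Sketch of a token-level targeted-unlearning objective (assumed form).

    logits:        (T, V) per-token vocabulary logits
    targets:       (T,)   gold token ids
    unwanted_mask: (T,)   True where the token is an unwanted word (UW)
    """
    probs = softmax(logits)
    p_target = probs[np.arange(len(targets)), targets]

    # Preservation loss: ordinary negative log-likelihood on general
    # words (GW) only, to retain model utility.
    gw = ~unwanted_mask
    preserve = -np.log(p_target[gw] + 1e-12).mean() if gw.any() else 0.0

    # Suppression term on UW: a simple -log(1 - p) surrogate (an
    # assumption) that grows as the model assigns UW more probability.
    forget = -np.log(1.0 - p_target[unwanted_mask] + 1e-12).mean() \
        if unwanted_mask.any() else 0.0

    return preserve + beta * forget
```

Raising the model's probability on an unwanted token increases the loss, while the preservation term is untouched by UW positions, which is the separation the framework's UW/GW split is meant to achieve.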