Hasty Briefs

Not All Tokens Are Meant to Be Forgotten

a year ago
  • #Privacy
  • #Machine Learning
  • #Large Language Models
  • Large Language Models (LLMs) achieve strong language understanding, but they also memorize unwanted information such as private or copyrighted content.
  • Existing unlearning methods suffer from over-forgetting: they suppress every token in a forget sample, which destroys model utility.
  • The Targeted Information Forgetting (TIF) framework improves unlearning by distinguishing unwanted words (UW) from general words (GW) within each forget sample.
  • TIF applies Targeted Preference Optimization, pairing a Logit Preference Loss that unlearns UW with a Preservation Loss that retains GW.
  • Experiments on TOFU and MUSE benchmarks show TIF enhances unlearning effectiveness while preserving model utility.
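The core idea in the bullets above, splitting the per-token loss by whether a token is an unwanted word or a general word, can be sketched numerically. This is a hypothetical illustration, not the paper's actual formulation: the function name `tif_style_loss`, the use of `-log(1 - p_target)` as a stand-in for the Logit Preference Loss, and plain cross-entropy as the Preservation Loss are all assumptions made for clarity.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def tif_style_loss(logits, targets, unwanted_mask, beta=1.0):
    """Token-level loss split, sketching the TIF idea (names/forms assumed):
    - unwanted tokens (mask True): push the memorized target's probability
      DOWN via a suppression term, -log(1 - p_target)
    - general tokens (mask False): ordinary cross-entropy, -log(p_target),
      keeps the target UP (preservation)
    """
    probs = softmax(logits)                       # (T, V) token distributions
    p_target = probs[np.arange(len(targets)), targets]
    forget_term = -np.log(np.clip(1.0 - p_target, 1e-9, 1.0))  # unlearn UW
    preserve_term = -np.log(np.clip(p_target, 1e-9, 1.0))      # retain GW
    per_token = np.where(unwanted_mask, beta * forget_term, preserve_term)
    return per_token.mean()
```

A model that confidently predicts its memorized targets incurs a high loss on tokens flagged as unwanted and a near-zero loss on general tokens, so gradient descent suppresses only the UW tokens instead of the whole forget sample.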