Hasty Briefsbeta

A small number of samples can poison LLMs of any size

5 hours ago
  • #AI Security
  • #Data Poisoning
  • #LLM Vulnerabilities
  • A joint study found that as few as 250 malicious documents can create a 'backdoor' vulnerability in large language models (LLMs), regardless of model size or training data volume.
  • The study challenges the assumption that attackers need to control a percentage of training data, showing instead that a small, fixed number of poisoned documents can be effective.
  • Backdoors in LLMs can be triggered by specific phrases, causing the model to exhibit undesirable behaviors, such as producing gibberish text or exfiltrating sensitive data.
  • The research is the largest poisoning investigation to date, revealing that poisoning attacks require a near-constant number of documents across different model sizes.
  • Creating 250 malicious documents is significantly easier than creating millions, making poisoning attacks more feasible for potential attackers.
  • The study tested a 'denial-of-service' attack where models were trained to produce gibberish text upon encountering a specific trigger phrase, demonstrating clear and measurable success.
  • Results showed that model size does not affect poisoning success; larger models trained on more data were just as vulnerable as smaller ones when exposed to the same number of poisoned documents.
  • The findings suggest that absolute count, not relative proportion, of poisoned documents is key to attack effectiveness, with 250 documents being sufficient to backdoor models.
  • The study raises concerns about the practicality of poisoning attacks and encourages further research into understanding and mitigating these vulnerabilities.
  • Despite the risks of publicizing these findings, the benefits of raising awareness and motivating defensive measures were deemed to outweigh potential misuse by attackers.