A small number of samples can poison LLMs of any size
5 hours ago
- #AI Security
- #Data Poisoning
- #LLM Vulnerabilities
- A joint study found that as few as 250 malicious documents can create a 'backdoor' vulnerability in large language models (LLMs), regardless of model size or training data volume.
- The study challenges the assumption that attackers need to control a percentage of training data, showing instead that a small, fixed number of poisoned documents can be effective.
- Backdoors in LLMs can be triggered by specific phrases, causing the model to exhibit undesirable behaviors, such as producing gibberish text or exfiltrating sensitive data.
- The research is the largest poisoning investigation to date, revealing that poisoning attacks require a near-constant number of documents across different model sizes.
- Creating 250 malicious documents is significantly easier than creating millions, making poisoning attacks more feasible for potential attackers.
- The study tested a 'denial-of-service' attack where models were trained to produce gibberish text upon encountering a specific trigger phrase, demonstrating clear and measurable success.
- Results showed that model size does not affect poisoning success; larger models trained on more data were just as vulnerable as smaller ones when exposed to the same number of poisoned documents.
- The findings suggest that absolute count, not relative proportion, of poisoned documents is key to attack effectiveness, with 250 documents being sufficient to backdoor models.
- The study raises concerns about the practicality of poisoning attacks and encourages further research into understanding and mitigating these vulnerabilities.
- Despite the risks of publicizing these findings, the benefits of raising awareness and motivating defensive measures were deemed to outweigh potential misuse by attackers.