Document poisoning in RAG systems: How attackers corrupt AI's sources
2 days ago
- #AI Security
- #Document Poisoning
- #RAG Systems
- Document poisoning in RAG systems allows attackers to corrupt AI knowledge bases by injecting fabricated documents.
- Attackers can manipulate RAG systems to report false information, such as incorrect financial figures, without exploiting software vulnerabilities.
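- The PoisonedRAG attack requires two conditions: the poisoned documents must score a higher cosine similarity to the target query than the legitimate documents, so that they are the ones retrieved, and their content must then steer the LLM's output.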
- Three types of poisoned documents were injected to dominate retrieval results: a CFO-approved correction, a regulatory notice, and board meeting notes.
- Defenses tested include ingestion sanitization, access control, prompt hardening, output monitoring, and embedding anomaly detection, with the last being the most effective.
- Embedding anomaly detection reduced attack success from 95% to 20% by flagging documents with suspicious embedding similarity and clustering.
- Even with all defenses enabled, a residual attack success rate of 10% remains; it is influenced by temperature settings and collection maturity.
- Key recommendations for defense include mapping all write paths, adding embedding anomaly detection at ingestion, and verifying output monitoring criteria.
- Knowledge base poisoning is a persistent and invisible threat, which makes proactive defense strategies necessary.
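
The embedding anomaly detection idea summarized above can be illustrated with a minimal sketch of an ingestion-time check. This is not the article's implementation. The function names (`cosine_similarity`, `screen_batch`), both thresholds (0.85 and 0.90), and the toy three-dimensional vectors are assumptions chosen for illustration; a real system would use embeddings from its own model and thresholds tuned on its own collection. The sketch flags an incoming document if it sits very close to an existing document in embedding space, or if several documents in the same batch form a tight cluster.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def screen_batch(
    batch: List[Vector],
    collection: List[Vector],
    override_threshold: float = 0.85,  # hypothetical value, tune per embedding model
    cluster_threshold: float = 0.90,   # hypothetical value, tune per embedding model
) -> Dict[int, List[str]]:
    """Return {batch index: reasons} for incoming embeddings that look suspicious."""
    flags: Dict[int, List[str]] = {}

    # Check 1: the candidate is very close to a document already in the
    # collection, so it could compete with that document at retrieval time.
    for i, candidate in enumerate(batch):
        closest = max(
            (cosine_similarity(candidate, doc) for doc in collection),
            default=0.0,
        )
        if closest >= override_threshold:
            flags.setdefault(i, []).append("very close to an existing document")

    # Check 2: documents arriving together are nearly identical to each other,
    # which may indicate a coordinated injection aimed at one query.
    reason = "tight cluster within batch"
    for i, j in combinations(range(len(batch)), 2):
        if cosine_similarity(batch[i], batch[j]) >= cluster_threshold:
            for idx in (i, j):
                if reason not in flags.setdefault(idx, []):
                    flags[idx].append(reason)

    return flags


# Toy data: two existing documents and a batch of three incoming ones.
collection = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
batch = [[0.9, 0.1, 0.0], [0.88, 0.12, 0.02], [0.0, 0.0, 1.0]]

flags = screen_batch(batch, collection)
for index, reasons in sorted(flags.items()):
    print(index, reasons)
```

In this toy run the first two incoming vectors are flagged and the third is not. In practice, flagged documents would more plausibly be routed to human review than rejected outright, since legitimate corrections can also resemble existing documents.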