Document poisoning in RAG systems: How attackers corrupt AI's sources

2 days ago

Document poisoning in RAG systems allows attackers to corrupt AI knowledge bases by injecting fabricated documents.
Attackers can manipulate RAG systems to report false information, such as incorrect financial figures, without exploiting software vulnerabilities.
PoisonedRAG attack requires two conditions: higher cosine similarity of poisoned documents to queries and the ability to influence LLM outputs.
Three types of poisoned documents were used: a CFO-approved correction, a regulatory notice, and board meeting notes to dominate retrieval results.
Defenses tested include ingestion sanitization, access control, prompt hardening, output monitoring, and embedding anomaly detection, with the latter being most effective.
Embedding anomaly detection reduced attack success from 95% to 20% by identifying suspicious document similarities and clustering.
Even with all defenses, a 10% residual attack success rate remains, influenced by temperature settings and collection maturity.
Key recommendations for defense include mapping all write paths, adding embedding anomaly detection at ingestion, and verifying output monitoring criteria.
Knowledge base poisoning is a persistent and invisible threat, emphasizing the need for proactive defense strategies.

Hasty Briefsbeta