New Prompt Injection Papers: Agents Rule of Two and the Attacker Moves Second

7 months ago

Two new papers on LLM security and prompt injection were discussed.
Agents Rule of Two: A Practical Approach to AI Agent Security proposes a 'Rule of Two' inspired by the lethal trifecta and Google Chrome's Rule of 2.
The rule states that agents must satisfy no more than two of three properties to avoid high-impact consequences of prompt injection.
The three properties are: processing untrustworthy inputs, access to sensitive systems/data, and changing state or communicating externally.
The lethal trifecta model is limited to data exfiltration risks, while the Rule of Two includes changing state, covering more risks.
The Attacker Moves Second paper evaluates 12 defenses against prompt injection and jailbreaking using adaptive attacks.
Adaptive attacks, including gradient-based, reinforcement learning, and search-based methods, defeated most defenses with over 90% success rate.
Human red-teaming achieved a 100% success rate against all defenses.
The paper emphasizes the importance of adaptive evaluations for defense development.
The conclusion suggests that reliable defenses against prompt injection are not yet available, supporting the Agents Rule of Two as current best practice.

Hasty Briefsbeta