Hasty Briefs (beta)

New Prompt Injection Papers: Agents Rule of Two and the Attacker Moves Second

6 months ago
  • #LLM Security
  • #AI Agents
  • #Prompt Injection
  • Two new papers on LLM security and prompt injection were discussed.
  • Agents Rule of Two: A Practical Approach to AI Agent Security proposes a 'Rule of Two' inspired by the lethal trifecta and Google Chrome's Rule of 2.
  • The rule states that, within a single session, an agent should satisfy at most two of three properties; an agent with all three is exposed to high-impact prompt-injection attacks.
  • The three properties are: processing untrustworthy inputs, access to sensitive systems/data, and changing state or communicating externally.
  • The lethal trifecta model covers only data-exfiltration risks; by also counting state changes, the Rule of Two covers a broader set of harms.
  • The Attacker Moves Second paper evaluates 12 defenses against prompt injection and jailbreaking using adaptive attacks.
  • Adaptive attacks, including gradient-based, reinforcement-learning, and search-based methods, defeated most defenses with attack success rates above 90%.
  • Human red-teaming achieved a 100% success rate against all defenses.
  • The paper emphasizes the importance of adaptive evaluations for defense development.
  • The conclusion suggests that reliable defenses against prompt injection are not yet available, supporting the Agents Rule of Two as current best practice.
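The Rule of Two summarized above can be read as a simple configuration check: count how many of the three risky properties an agent holds and refuse deployments that hold all three. The sketch below is illustrative only; the `AgentConfig` class and its flag names are assumptions, not an API from either paper.

```python
from dataclasses import dataclass


@dataclass
class AgentConfig:
    """Hypothetical capability flags for one agent session."""
    processes_untrusted_input: bool       # e.g. reads web pages or emails
    accesses_sensitive_data: bool         # e.g. private files or credentials
    changes_state_or_communicates: bool   # e.g. writes files or sends requests

    def risky_property_count(self) -> int:
        # Count how many of the three properties this configuration holds.
        return sum([
            self.processes_untrusted_input,
            self.accesses_sensitive_data,
            self.changes_state_or_communicates,
        ])

    def satisfies_rule_of_two(self) -> bool:
        # The rule: hold no more than two of the three properties per session.
        return self.risky_property_count() <= 2


# An agent that browses the web, reads private data, and can exfiltrate
# it violates the rule; dropping any one property brings it back in bounds.
risky = AgentConfig(True, True, True)
safe = AgentConfig(True, True, False)
print(risky.satisfies_rule_of_two())  # False
print(safe.satisfies_rule_of_two())   # True
```

In practice the check would gate which tools an agent session is granted, e.g. disabling outbound network tools once untrusted input and sensitive data are both in play.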