Show HN: Declaw Arena – a CTF-style challenge to break an AI agent in a microVM

7 hours ago
  • #Sandbox Challenges
  • #AI Security
  • #Data Privacy
  • A real AI agent protects secrets within an isolated Declaw sandbox.
  • The attacker's objective is to bypass security policies to extract a secret.
  • Attacker success rates depend on the policy level: 43% with no policies, 42% with partial policies, and 0% with full Declaw policies.
  • Challenges range from talking your way past an AI agent to escaping shell restrictions.
  • One scenario: an AI analyst guards a PII database, and the attacker's goal is to leak customer SSNs, credit card numbers, or emails.
  • No signup needed; sessions run in isolated sandboxes with 10-minute time limits.
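
The summary does not say how Declaw policies work internally. To make the PII scenario concrete, here is a minimal sketch of an output filter that redacts SSNs, card numbers, and emails from an agent's reply before it leaves the sandbox. It is an assumption for illustration only, not Declaw's method: the function names (`redact_pii`, `luhn_ok`, `redact_card`) and the regex patterns are all hypothetical.

```python
import re

# Hypothetical patterns for illustration; none of this comes from Declaw.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_card(match: re.Match) -> str:
    """Redact a digit run only if it looks like a valid card number."""
    digits = re.sub(r"\D", "", match.group())
    return "[REDACTED CARD]" if luhn_ok(digits) else match.group()


def redact_pii(text: str) -> str:
    """Replace SSNs, Luhn-valid card numbers, and emails with placeholders."""
    text = SSN_RE.sub("[REDACTED SSN]", text)
    text = CARD_RE.sub(redact_card, text)
    text = EMAIL_RE.sub("[REDACTED EMAIL]", text)
    return text


if __name__ == "__main__":
    reply = "Customer alice@example.com, SSN 123-45-6789, card 4111 1111 1111 1111."
    print(redact_pii(reply))
```

A pattern filter like this is easy to evade on its own, for example by asking the agent to spell out digits or encode its output, which is the kind of bypass a challenge like this invites attackers to try.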