Show HN: Declaw Arena – a CTF-style challenge to break an AI agent in a microVM
7 hours ago
- #Sandbox Challenges
- #AI Security
- #Data Privacy
- A real AI agent protects secrets within an isolated Declaw sandbox.
- The attacker's objective is to bypass security policies to extract a secret.
- Attacker success rates depend on the policy level: 43% with no policies, 42% with partial policies, and 0% with full Declaw policies.
- Challenges vary, from getting past an AI agent through chat to escaping shell restrictions.
- Example scenario: an AI analyst guards a PII database, and the attacker's goal is to leak customer SSNs, credit card numbers, or emails.
- No signup needed; sessions run in isolated sandboxes with 10-minute time limits.