Show HN: Declaw Arena – a CTF-style challenge to break an AI agent in a microVM

7 hours ago

A real AI agent protects secrets within an isolated Declaw sandbox.
The attacker's objective is to bypass security policies to extract a secret.
Reported attacker success rates depend on the policy level: 43% with no policies, 42% with partial policies, and 0% with full Declaw policies.
Challenges range from talking an AI agent into revealing its secret to escaping shell restrictions.
One scenario: an AI analyst guards a PII database, and the attacker's goal is to make it leak customer SSNs, credit card numbers, or emails.
No signup needed; sessions run in isolated sandboxes with 10-minute time limits.
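
To make the PII scenario concrete, here is a minimal sketch of one kind of output policy: redacting SSNs, emails, and card numbers from an agent's reply before it leaves the sandbox. This is an illustration only, not Declaw's implementation; the function names, regex patterns, and redaction labels are all assumptions. A pattern filter like this is also easy to evade (for example by asking the agent to spell digits out), so treat it as a baseline rather than a full policy.

```python
import re

# Hypothetical patterns for illustration; not taken from Declaw.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact(text: str) -> str:
    """Replace SSN-, email-, and card-shaped substrings with labels."""
    text = SSN_RE.sub("[REDACTED-SSN]", text)
    text = EMAIL_RE.sub("[REDACTED-EMAIL]", text)

    def _card(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        # Only redact digit runs that pass Luhn, to limit false positives.
        return "[REDACTED-CARD]" if luhn_valid(digits) else match.group()

    return CARD_RE.sub(_card, text)
```

A reply would pass through `redact(...)` on its way out of the sandbox; digit runs that fail the Luhn check are left untouched.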
