Hasty Briefsbeta

Bilingual

What happened after 2k people tried to hack my AI assistant

6 hours ago
  • #AI Security
  • #Email Challenge
  • #Prompt Injection
  • Built hackmyclaw.com, an email-based challenge where users tried to trick AI assistant Fiu into leaking a secrets.env file.
  • Fiu received over 6,000 emails from 2,000+ people, including sophisticated attacks like impersonation and multi-language social engineering, but the secret never leaked.
  • The project was powered by a simple security prompt on a VPS, with rules against revealing secrets, file modifications, or executing commands from emails.
  • Experiment faced challenges: Gmail suspension due to high email volume, over $500 in API costs, and batch processing affecting agent's suspicion levels.
  • Key findings: Prompt injection was harder than expected; robust instruction-following with a powerful model like Opus 4.6 provided strong resistance.
  • Despite the success, concerns remain about AI assistants' security due to their access to sensitive data and the potential risks of repeated interactions.
  • Unexpected outcome: Sponsors reached out to support the project, covering increased prizes and API costs.
  • Future considerations: Testing with infinite credits for ongoing conversations and weaker models to explore vulnerabilities, especially in non-English languages.
  • Overall, the experiment increased optimism about AI security but highlighted that prompt injection remains a real threat.