Hasty Briefs (beta)

Finding vulnerabilities in Python web apps using Claude Code and OpenAI Codex

8 days ago
  • #Python Web Applications
  • #AI Security
  • #Vulnerability Detection
  • AI coding agents (Claude Code and OpenAI Codex) were tested for finding vulnerabilities in 11 large Python web applications.
  • Claude Code found 46 vulnerabilities (14% true positive rate, 86% false positive rate).
  • OpenAI Codex found 21 vulnerabilities (18% true positive rate, 82% false positive rate).
  • Claude Code performed best at finding insecure direct object reference (IDOR) bugs (22% true positive rate) but struggled with SQL injection (5% true positive rate) and cross-site scripting (XSS) (16% true positive rate).
  • OpenAI Codex had a 0% true positive rate on IDOR, SQL injection, and XSS, but did better on path traversal (47% true positive rate).
  • Non-determinism was observed: identical runs on the same codebase yielded different results.
  • The study highlighted the high false positive rates and the challenges of using AI for vulnerability detection in real-world applications.
  • The research emphasized the need for better benchmarks and scaffolding to improve AI-based vulnerability detection.
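
For each tool, the true positive and false positive percentages above sum to 100%, which suggests they are shares of that tool's reported findings (usually called precision) rather than rates measured against every vulnerability present. The sketch below works through that arithmetic and shows one possible way to quantify run-to-run variation. The counts and finding identifiers are made up for illustration and are not figures from the study; `true_positive_share` and `run_overlap` are hypothetical helper names.

```python
def true_positive_share(true_positives: int, false_positives: int) -> float:
    """Fraction of reported findings that were confirmed as real."""
    reported = true_positives + false_positives
    if reported == 0:
        raise ValueError("no findings reported")
    return true_positives / reported


def run_overlap(run_a: set[str], run_b: set[str]) -> float:
    """Jaccard similarity between the findings of two runs (1.0 = identical)."""
    if not run_a and not run_b:
        return 1.0
    return len(run_a & run_b) / len(run_a | run_b)


# Hypothetical triage result: 7 confirmed findings out of 50 reported.
share = true_positive_share(true_positives=7, false_positives=43)
print(f"true positive share:  {share:.0%}")      # 14%
print(f"false positive share: {1 - share:.0%}")  # 86%

# Hypothetical finding IDs from two runs on the same codebase.
first_run = {"idor:orders/view", "xss:profile/bio", "sqli:search"}
second_run = {"idor:orders/view", "path:download", "sqli:search"}
print(f"overlap between runs: {run_overlap(first_run, second_run):.2f}")  # 0.50
```

An overlap well below 1.0 between otherwise identical runs is the kind of non-determinism the summary describes. Jaccard similarity is only one possible measure and is an assumption here, not the study's method.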