Semgrep: GLM 5.2 beats Claude in our Cyber Benchmarks
6 hours ago
- #AI Security
- #Vulnerability Detection
- #Open-Weight Models
- GLM 5.2, an open-weight model, achieved 39% F1 on IDOR detection, surpassing Claude Code (32%) at a lower cost of about $0.17 per vulnerability.
- Semgrep's multimodal pipeline with a custom harness led with 53–61% F1, indicating the importance of the harness in performance.
- The experiment compared models with and without scaffolding. GLM 5.2 performed well with only a prompt, which suggests its result reflects model capability rather than harness engineering.
- GLM 5.2 is open-weight (MIT licensed), competitive on coding benchmarks, and offers a 1M-token context window, making it suitable for security tasks.
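- IDOR (insecure direct object reference) vulnerabilities involve missing access checks. They are challenging for both static analysis and LLMs because whether an access is legitimate depends on the application's business logic, not on a fixed code pattern.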
- Key metrics used were precision, recall, F1 score, and cost per true positive to evaluate detection effectiveness and economic viability.
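The four metrics named above can be computed from a confusion count plus a total spend. The following is a minimal sketch, not Semgrep's evaluation code: the `DetectionRun` class and the counts in the usage lines are hypothetical and do not come from the benchmark.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DetectionRun:
    """Hypothetical summary of one detector run against labeled findings."""
    true_positives: int    # reported findings that are real vulnerabilities
    false_positives: int   # reported findings that are not real
    false_negatives: int   # real vulnerabilities the detector missed
    total_cost_usd: float  # total spend for the run

    def precision(self) -> float:
        reported = self.true_positives + self.false_positives
        return self.true_positives / reported if reported else 0.0

    def recall(self) -> float:
        actual = self.true_positives + self.false_negatives
        return self.true_positives / actual if actual else 0.0

    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0

    def cost_per_true_positive(self) -> Optional[float]:
        # Undefined when nothing real was found.
        if self.true_positives == 0:
            return None
        return self.total_cost_usd / self.true_positives


# Illustrative numbers only (assumed, not benchmark data).
run = DetectionRun(true_positives=30, false_positives=20,
                   false_negatives=45, total_cost_usd=6.00)
print(f"precision={run.precision():.2f} recall={run.recall():.2f} "
      f"f1={run.f1():.2f} cost/TP=${run.cost_per_true_positive():.2f}")
```

F1 penalizes a detector that trades one of precision or recall for the other, while cost per true positive puts runs with different spend on a comparable footing.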
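To make the missing-access-check pattern concrete, here is a minimal, framework-free sketch of an IDOR and its fix. The `Invoice` type, the in-memory `INVOICES` store, and both handler functions are hypothetical illustrations, not code from the benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    owner_id: int
    amount_usd: float


class Forbidden(Exception):
    """Raised when the caller does not own the requested object."""


# Hypothetical in-memory store standing in for a database.
INVOICES = {
    1: Invoice(invoice_id=1, owner_id=100, amount_usd=25.0),
    2: Invoice(invoice_id=2, owner_id=200, amount_usd=990.0),
}


def get_invoice_vulnerable(current_user_id: int, invoice_id: int) -> Invoice:
    # IDOR: the object is fetched by a client-supplied ID and returned
    # without checking that it belongs to the authenticated user.
    return INVOICES[invoice_id]


def get_invoice_checked(current_user_id: int, invoice_id: int) -> Invoice:
    invoice = INVOICES[invoice_id]
    # The fix is one ownership comparison. Which rule is correct here
    # (owner, team member, admin) depends on the application.
    if invoice.owner_id != current_user_id:
        raise Forbidden(f"user {current_user_id} may not read invoice {invoice_id}")
    return invoice


# User 100 requests invoice 2, which belongs to user 200.
leaked = get_invoice_vulnerable(current_user_id=100, invoice_id=2)
print("vulnerable handler returned owner", leaked.owner_id)
try:
    get_invoice_checked(current_user_id=100, invoice_id=2)
except Forbidden as exc:
    print("checked handler refused:", exc)
```

Both handlers are syntactically ordinary, which is why pattern-based tools struggle here: the bug is the absence of a check whose correct form is defined by the application's rules.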