- GLM 5.2, an open-weight model from Zhipu AI, achieved a 39% F1 score on IDOR detection, surpassing Claude Code (32%) and costing about $0.17 per vulnerability found.
- The experiment compared models with and without a custom harness (scaffolding); GLM 5.2 performed well with only a prompt, while Semgrep's multimodal pipeline with a harness scored higher (53–61% F1).
- Key advantages of GLM 5.2 include being open-weight (MIT license), competitive coding performance (e.g., 81.0 on Terminal-Bench 2.1), and low cost relative to frontier models.
- IDOR (Insecure Direct Object Reference) vulnerabilities are common and hard to detect because they are business-logic flaws: finding one requires reasoning across files about whether access to an object is actually authorized, and there is no obviously dangerous function call to flag.
- The study highlights the importance of harnesses in vulnerability detection, showing that performance depends on both model capabilities and the supporting infrastructure.
- Economic factors like cost per bug are crucial for scalability, with GLM 5.2 offering a cost-effective solution for security tasks.
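
The two metrics these points rely on, F1 score and cost per vulnerability found, can be sketched as follows. This is a minimal illustration, not the study's evaluation code: the `ScanResult` type, the function names, and every count and dollar figure in the example are hypothetical and were chosen only to make the arithmetic easy to follow. It also assumes that "cost per vulnerability found" means total spend divided by true positives, which the summary does not state explicitly.

```python
import math
from dataclasses import dataclass


@dataclass
class ScanResult:
    """Outcome of one detection run (hypothetical structure for illustration)."""
    true_positives: int    # real IDORs the tool reported
    false_positives: int   # reported findings that were not real IDORs
    false_negatives: int   # real IDORs the tool missed
    total_cost_usd: float  # total spend for the run


def precision(r: ScanResult) -> float:
    """Share of reported findings that were real."""
    reported = r.true_positives + r.false_positives
    return r.true_positives / reported if reported else 0.0


def recall(r: ScanResult) -> float:
    """Share of real vulnerabilities that were reported."""
    actual = r.true_positives + r.false_negatives
    return r.true_positives / actual if actual else 0.0


def f1_score(r: ScanResult) -> float:
    """Harmonic mean of precision and recall."""
    p, rec = precision(r), recall(r)
    return 2 * p * rec / (p + rec) if (p + rec) else 0.0


def cost_per_true_positive(r: ScanResult) -> float:
    """Total spend divided by confirmed findings (assumed definition)."""
    if r.true_positives == 0:
        return math.inf  # nothing found, so the cost per bug is unbounded
    return r.total_cost_usd / r.true_positives


if __name__ == "__main__":
    # Made-up numbers, NOT results from the study.
    example = ScanResult(true_positives=30, false_positives=20,
                         false_negatives=30, total_cost_usd=6.00)
    print(f"precision = {precision(example):.2f}")
    print(f"recall    = {recall(example):.2f}")
    print(f"F1        = {f1_score(example):.3f}")
    print(f"cost/bug  = ${cost_per_true_positive(example):.2f}")
```

Because F1 penalizes both missed vulnerabilities and false alarms, a tool cannot raise its score simply by reporting more findings. Cost per true positive adds the economic view: two tools with similar F1 can differ widely in what each confirmed bug costs.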