GLM 5.2 beats Claude in our benchmarks

23 days ago

GLM 5.2, an open-weight model from Zhipu AI, achieved a 39% F1 score on IDOR detection, surpassing Claude Code (32%) and costing about $0.17 per vulnerability found.
The experiment compared models with and without a custom harness (scaffolding); GLM 5.2 performed well with only a prompt, while Semgrep's multimodal pipeline with a harness scored higher (53–61% F1).
Key advantages of GLM 5.2 include being open-weight (MIT license), competitive coding performance (e.g., 81.0 on Terminal-Bench 2.1), and low cost relative to frontier models.
IDOR (Insecure Direct Object Reference) vulnerabilities are common and hard to detect: they are business-logic flaws, so finding them requires reasoning across files rather than spotting a call to an obviously dangerous function.
The study shows that harnesses matter in vulnerability detection: Semgrep's harnessed pipeline (53–61% F1) outscored prompt-only GLM 5.2 (39% F1), so performance depends on both the model's capabilities and the supporting infrastructure.
Cost per bug found determines whether automated detection can scale; at about $0.17 per vulnerability found, GLM 5.2 is a cost-effective option for security tasks.
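
To make the IDOR point concrete, here is a minimal sketch of what such a flaw can look like. Everything in it is hypothetical (the `Invoice` type, the in-memory `INVOICES` store, and both lookup functions are invented for illustration and do not come from the benchmark). Note that the vulnerable version calls nothing that looks dangerous; the bug is only the missing ownership check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    owner_id: int
    amount_cents: int


# Hypothetical in-memory store standing in for a database table.
INVOICES = {
    1: Invoice(1, owner_id=100, amount_cents=2500),
    2: Invoice(2, owner_id=200, amount_cents=9900),
}


def get_invoice_vulnerable(current_user_id: int, invoice_id: int) -> Invoice:
    """IDOR: trusts the client-supplied ID and never checks ownership."""
    return INVOICES[invoice_id]


def get_invoice_checked(current_user_id: int, invoice_id: int) -> Invoice:
    """Same lookup, but rejects objects the caller does not own."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice.owner_id != current_user_id:
        # Same error for "missing" and "not yours" to avoid leaking existence.
        raise PermissionError("invoice not found or not accessible")
    return invoice
```

In a real codebase the lookup, the authentication layer, and the ownership rule often live in different files, which is why detection requires cross-file reasoning.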
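
The two metrics quoted above, F1 and cost per vulnerability found, can be sketched as follows. The formulas are the standard definitions; the function names and the counts in the usage note are illustrative assumptions, not figures from the study, which does not state here how its totals were tallied.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    flagged = true_pos + false_pos      # everything the detector reported
    actual = true_pos + false_neg       # everything that was really there
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def cost_per_finding(total_cost_usd: float, true_pos: int) -> float:
    """Total spend divided by the number of real vulnerabilities found."""
    if true_pos <= 0:
        raise ValueError("no true positives, cost per finding is undefined")
    return total_cost_usd / true_pos
```

As a made-up example, a run with 30 true positives, 40 false positives, and 50 missed vulnerabilities gives `f1_score(30, 40, 50)` of roughly 0.40, and if that run cost $6.00 in total, `cost_per_finding(6.00, 30)` works out to $0.20 per real finding. Dividing by true positives rather than by all reports means false positives raise the cost figure instead of flattering it.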
