Hasty Briefs (beta)

Semgrep: GLM 5.2 beats Claude in our Cyber Benchmarks

6 hours ago
  • #AI Security
  • #Vulnerability Detection
  • #Open-Weight Models
  • GLM 5.2, an open-weight model, scored 39% F1 on IDOR detection, ahead of Claude Code (32%), and at a lower cost: about $0.17 per vulnerability found.
  • Semgrep's multimodal pipeline with a custom harness led with 53–61% F1, which indicates how much the harness contributes to performance.
  • The experiment compared models with and without scaffolding. GLM 5.2 performed well with only a prompt, which points to the capability of the model itself rather than the harness.
  • GLM 5.2 is open-weight (MIT licensed), competitive on coding benchmarks, and offers a 1M-token context window, making it suitable for security tasks.
  • IDOR (insecure direct object reference) vulnerabilities stem from missing access checks. They are hard for both static analysis and LLMs because whether an access is legitimate depends on business logic.
  • The key metrics were precision, recall, and F1 score for detection effectiveness, and cost per true positive for economic viability.
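
To make the IDOR bullet concrete, here is a minimal sketch of the bug class and its fix. It is an illustration only and is not taken from Semgrep's benchmark: the `INVOICES` store and both function names are hypothetical, and a real application would look up records in a database and take the user from an authenticated session.

```python
# Hypothetical in-memory data store: invoice ID -> record with an owner.
INVOICES = {
    101: {"owner_id": 1, "amount": 250},
    102: {"owner_id": 2, "amount": 990},
}


def get_invoice_vulnerable(current_user_id: int, invoice_id: int) -> dict:
    """IDOR: the caller-supplied ID is used directly and ownership is never checked."""
    return INVOICES[invoice_id]


def get_invoice_checked(current_user_id: int, invoice_id: int) -> dict:
    """Same lookup, with the access check that the vulnerable version omits."""
    invoice = INVOICES[invoice_id]
    if invoice["owner_id"] != current_user_id:
        raise PermissionError("invoice does not belong to the current user")
    return invoice


if __name__ == "__main__":
    # User 1 requests invoice 102, which belongs to user 2.
    print(get_invoice_vulnerable(1, 102))  # leaks user 2's invoice
    try:
        get_invoice_checked(1, 102)
    except PermissionError as exc:
        print("blocked:", exc)
```

Both functions are syntactically ordinary, which is why this class of bug is hard to catch: nothing in the vulnerable version is wrong except for the check that is absent, and whether a check is required depends on who is supposed to own the record.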
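
The four metrics named above can be computed from a detector's tallies as in the sketch below. The `DetectionRun` class and the counts in the example are hypothetical and chosen for easy arithmetic; they are not figures from the benchmark.

```python
from dataclasses import dataclass


@dataclass
class DetectionRun:
    """Tallies for one detector on one benchmark (all values here are hypothetical)."""
    true_positives: int    # real vulnerabilities that were reported
    false_positives: int   # reported findings that were not real
    false_negatives: int   # real vulnerabilities that were missed
    total_cost_usd: float  # total spend for the run

    def precision(self) -> float:
        reported = self.true_positives + self.false_positives
        return self.true_positives / reported if reported else 0.0

    def recall(self) -> float:
        actual = self.true_positives + self.false_negatives
        return self.true_positives / actual if actual else 0.0

    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0

    def cost_per_true_positive(self) -> float:
        if self.true_positives == 0:
            return float("inf")  # money spent, nothing real found
        return self.total_cost_usd / self.true_positives


run = DetectionRun(true_positives=30, false_positives=20,
                   false_negatives=45, total_cost_usd=6.00)
print(f"precision={run.precision():.2f} recall={run.recall():.2f} "
      f"f1={run.f1():.2f} cost/TP=${run.cost_per_true_positive():.2f}")
# prints: precision=0.60 recall=0.40 f1=0.48 cost/TP=$0.20
```

Dividing cost by true positives rather than by all reported findings means a detector that floods the output with false positives gets no credit for them.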