Hasty Briefs (beta)

Parametric CAD Bench

5 hours ago
  • #AI Agents
  • #Benchmark
  • #CAD
  • Parametric CAD Bench is a benchmark evaluating AI agents' ability to author editable FreeCAD models from natural language using a multi-step agentic loop.
  • It scores each model with the harmonic mean of geometry similarity and CAD spec consistency; since a harmonic mean collapses to zero when either component is zero, a submission cannot score well on geometry alone — editability is enforced because both scores must be nonzero.
  • Key differences from other benchmarks include a focus on parametric editability, agentic loop evaluation, and scoring native FCStd files with feature trees.
  • Leaderboard results show GPT-5.5 via Codex leading with a score of 0.832, followed by other models like Gemini Pro and Claude Opus in various agent frameworks.
  • Harness effect is significant: swapping agent frameworks can shift scores by up to 10%, with mini-swe-agent often performing better due to verification rhythm.
  • Costs across 100 trials range from $3 to $170, revealing trade-offs between cost and quality.
  • The benchmark is open for third-party submissions through a structured process involving Harbor tasks and Hugging Face datasets.
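The harmonic-mean scoring described above can be sketched in a few lines. This is a minimal illustration, not the benchmark's actual implementation; the function name and the two input scores are assumptions for the example.

```python
def combined_score(geometry_similarity: float, spec_consistency: float) -> float:
    """Harmonic mean of two [0, 1] scores (hypothetical sketch of the
    benchmark's scoring rule, not its real code).

    The harmonic mean is zero whenever either input is zero, which is
    what enforces editability: a geometrically perfect but
    non-parametric model still scores 0.
    """
    if geometry_similarity == 0 or spec_consistency == 0:
        return 0.0
    return (2 * geometry_similarity * spec_consistency
            / (geometry_similarity + spec_consistency))


# A model with strong geometry but no valid feature tree gets 0,
# while balanced scores are rewarded over lopsided ones.
print(combined_score(0.9, 0.0))  # 0.0
print(combined_score(0.8, 0.8))  # 0.8
```

Compared with an arithmetic mean (which would give 0.45 for the first case), the harmonic mean heavily penalizes any submission that fails one of the two criteria.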