Parametric CAD Bench
- #AI Agents
- #Benchmark
- #CAD
- Parametric CAD Bench is a benchmark that evaluates AI agents' ability to author editable FreeCAD models from natural-language prompts via a multi-step agentic loop (a hypothetical loop sketch follows this list).
- Scoring is the harmonic mean of geometry similarity and CAD spec consistency; because the harmonic mean is zero whenever either component is zero, a submission must be both geometrically accurate and parametrically editable to score at all (see the scoring sketch below).
- Key differences from other benchmarks include a focus on parametric editability, evaluation of the full agentic loop, and scoring of native FCStd files with their feature trees (a feature-tree inspection sketch follows).
- Leaderboard results show GPT-5.5 via Codex in the lead at 0.832, followed by models such as Gemini Pro and Claude Opus running under various agent frameworks.
- The harness effect is significant: swapping agent frameworks can shift scores by up to 10%, with mini-swe-agent often performing better due to its verification rhythm.
- Costs per trial range from $3 to $170 across the 100 trials, exposing the trade-off between cost and quality.
- The benchmark accepts third-party submissions through a structured process built on Harbor tasks and Hugging Face datasets (a task-loading sketch closes this post).
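
The agentic loop itself isn't specified here, so the following is a hypothetical sketch of how such a harness might work: the agent drafts a FreeCAD Python script, the harness executes it headlessly with FreeCADCmd (FreeCAD's console-mode binary), and any error output is fed back as context for the next attempt. The `agent.write_script` call, step budget, and file names are illustrative assumptions, not the benchmark's actual API.

```python
# Hypothetical sketch of a generate -> execute -> feedback loop.
# All harness-facing names here are assumptions for illustration.
import subprocess

MAX_STEPS = 5

def run_freecad_script(script_path: str) -> subprocess.CompletedProcess:
    # FreeCADCmd runs a Python script headlessly, without the GUI.
    return subprocess.run(
        ["FreeCADCmd", script_path],
        capture_output=True, text=True, timeout=300,
    )

def agentic_loop(agent, prompt: str) -> str | None:
    """Let the agent retry until the script runs cleanly or steps run out."""
    feedback = ""
    for _ in range(MAX_STEPS):
        script = agent.write_script(prompt, feedback)  # hypothetical agent call
        with open("attempt.py", "w") as f:
            f.write(script)
        result = run_freecad_script("attempt.py")
        if result.returncode == 0:
            return "model.FCStd"  # the script is assumed to save its document here
        feedback = result.stderr  # surface the traceback to the agent
    return None
```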
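
The scoring rule reduces to a few lines; below is a minimal sketch assuming both component scores live in [0, 1]. The key property is that the harmonic mean collapses to zero whenever either component is zero, so a geometrically perfect but non-editable mesh dump still scores 0 rather than 0.5.

```python
# Minimal sketch of harmonic-mean scoring over two components:
# geometry similarity (g) and CAD spec consistency (s), both in [0, 1].
def bench_score(geometry_similarity: float, spec_consistency: float) -> float:
    if geometry_similarity == 0 or spec_consistency == 0:
        return 0.0  # either component at zero zeroes the whole score
    return (2 * geometry_similarity * spec_consistency
            / (geometry_similarity + spec_consistency))

print(bench_score(1.0, 0.0))   # 0.0: perfect mesh, no editable spec
print(bench_score(0.9, 0.75))  # ~0.818: both components must be strong
```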
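
Scoring native FCStd files implies walking the document's feature tree. Here is a sketch of what that inspection might look like, assuming the script runs under FreeCAD's bundled Python (e.g. launched through FreeCADCmd) so the `FreeCAD` module is importable; the traversal uses the standard document API (`Objects`, `TypeId`, `OutList`).

```python
# Sketch: open a native FCStd file and print its feature tree.
import FreeCAD

doc = FreeCAD.open("model.FCStd")
for obj in doc.Objects:
    # TypeId distinguishes parametric features (Pad, Pocket, Sketch, ...)
    # from dead geometry such as imported meshes.
    deps = ", ".join(d.Name for d in obj.OutList)  # upstream dependencies
    print(f"{obj.Name}  type={obj.TypeId}  depends_on=[{deps}]")
FreeCAD.closeDocument(doc.Name)
```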
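
For submissions, tasks presumably ship as a Hugging Face dataset. A hedged sketch of pulling them with the `datasets` library follows; the repository id `parametric-cad-bench/tasks` and the `prompt` field are placeholders, not the benchmark's actual names.

```python
# Sketch: load benchmark tasks from a (hypothetical) Hugging Face dataset.
from datasets import load_dataset

tasks = load_dataset("parametric-cad-bench/tasks", split="test")  # placeholder repo id
for task in tasks.select(range(3)):
    print(task["prompt"])  # the natural-language modeling instruction (assumed field)
```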