Parametric CAD Bench
- #AI Agents
- #Benchmark
- #CAD
- Parametric CAD Bench is a benchmark that evaluates AI agents' ability to author editable FreeCAD models from natural-language prompts via a multi-step agentic loop (a hypothetical loop sketch follows this list).
- Scoring is the harmonic mean of geometry similarity and CAD spec consistency; because the harmonic mean is zero whenever either component is zero, a submission must be both geometrically accurate and parametrically editable to score at all (see the scoring sketch below).
- Key differences from other benchmarks include a focus on parametric editability, evaluation of the full agentic loop, and scoring of native FCStd files with their feature trees (a feature-tree inspection sketch follows).
- Leaderboard results show GPT-5.5 via Codex in the lead at 0.832, followed by models such as Gemini Pro and Claude Opus running under various agent frameworks.
- The harness effect is significant: swapping agent frameworks can shift scores by up to 10%, with mini-swe-agent often performing better due to its verification rhythm.
- Costs per trial range from $3 to $170 across the 100 trials, exposing the trade-off between cost and quality.
- The benchmark accepts third-party submissions through a structured process built on Harbor tasks and Hugging Face datasets (a task-loading sketch closes this post).
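
The agentic loop itself isn't specified here, so the following is a hypothetical sketch of how such a harness might work: the agent drafts a FreeCAD Python script, the harness executes it headlessly with FreeCADCmd (FreeCAD's console-mode binary), and any error output is fed back as context for the next attempt. The `agent.write_script` call, step budget, and file names are illustrative assumptions, not the benchmark's actual API.

```python
# Hypothetical sketch of a generate -> execute -> feedback loop.
# All harness-facing names here are assumptions for illustration.
import subprocess

MAX_STEPS = 5

def run_freecad_script(script_path: str) -> subprocess.CompletedProcess:
    # FreeCADCmd runs a Python script headlessly, without the GUI.
    return subprocess.run(
        ["FreeCADCmd", script_path],
        capture_output=True, text=True, timeout=300,
    )

def agentic_loop(agent, prompt: str) -> str | None:
    """Let the agent retry until the script runs cleanly or steps run out."""
    feedback = ""
    for _ in range(MAX_STEPS):
        script = agent.write_script(prompt, feedback)  # hypothetical agent call
        with open("attempt.py", "w") as f:
            f.write(script)
        result = run_freecad_script("attempt.py")
        if result.returncode == 0:
            return "model.FCStd"  # the script is assumed to save its document here
        feedback = result.stderr  # surface the traceback to the agent
    return None
```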
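
The scoring rule reduces to a few lines; below is a minimal sketch assuming both component scores live in [0, 1]. The key property is that the harmonic mean collapses to zero whenever either component is zero, so a geometrically perfect but non-editable mesh dump still scores 0 rather than 0.5.

```python
# Minimal sketch of harmonic-mean scoring over two components:
# geometry similarity (g) and CAD spec consistency (s), both in [0, 1].
def bench_score(geometry_similarity: float, spec_consistency: float) -> float:
    if geometry_similarity == 0 or spec_consistency == 0:
        return 0.0  # either component at zero zeroes the whole score
    return (2 * geometry_similarity * spec_consistency
            / (geometry_similarity + spec_consistency))

print(bench_score(1.0, 0.0))   # 0.0: perfect mesh, no editable spec
print(bench_score(0.9, 0.75))  # ~0.818: both components must be strong
```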
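
Scoring native FCStd files implies walking the document's feature tree. Here is a sketch of what that inspection might look like, assuming the script runs under FreeCAD's bundled Python (e.g. launched through FreeCADCmd) so the `FreeCAD` module is importable; the traversal uses the standard document API (`Objects`, `TypeId`, `OutList`).

```python
# Sketch: open a native FCStd file and print its feature tree.
import FreeCAD

doc = FreeCAD.open("model.FCStd")
for obj in doc.Objects:
    # TypeId distinguishes parametric features (Pad, Pocket, Sketch, ...)
    # from dead geometry such as imported meshes.
    deps = ", ".join(d.Name for d in obj.OutList)  # upstream dependencies
    print(f"{obj.Name}  type={obj.TypeId}  depends_on=[{deps}]")
FreeCAD.closeDocument(doc.Name)
```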
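
For submissions, tasks presumably ship as a Hugging Face dataset. A hedged sketch of pulling them with the `datasets` library follows; the repository id `parametric-cad-bench/tasks` and the `prompt` field are placeholders, not the benchmark's actual names.

```python
# Sketch: load benchmark tasks from a (hypothetical) Hugging Face dataset.
from datasets import load_dataset

tasks = load_dataset("parametric-cad-bench/tasks", split="test")  # placeholder repo id
for task in tasks.select(range(3)):
    print(task["prompt"])  # the natural-language modeling instruction (assumed field)
```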