Can frontier LLMs solve CAD tasks?
- #LLMs
- #CAD
- #3D-printing
- Frontier LLMs like GPT-5.3-Codex, Gemini 3.1 Pro, and Claude Opus 4.6 vary widely in capability on CAD tasks: some produce workable designs while others fail outright.
- LLMs are trained primarily on text, so they lack the visual, spatial, and motor experience humans acquire naturally, which makes spatial-reasoning tasks like CAD harder for them.
- The experiment tested LLMs on designing a 3D-printable wall mount for a bike pump using OpenSCAD, with simulations in MuJoCo to validate designs.
- Claude Opus 4.6 performed best with a 100% pass rate, though designs often needed refinement. GPT-5.2 had a good pass rate but produced flawed designs.
- Gemini 3.1 Pro and Gemini 3 Flash showed potential but were inconsistent, sometimes producing strong designs and other times failing outright or looping.
- Open-weight models like GLM-4.6V, Kimi K2.5, and Qwen 3.5 397B performed poorly, with simplistic or non-functional designs.
- The project highlighted challenges like convex decomposition in MuJoCo (MuJoCo collides each mesh as its convex hull, so concave shapes like a hook must be split into convex pieces) and the complexity of building an agentic harness for LLMs.
- Future improvements could include better grading rubrics, more objects for testing, and integrating off-the-shelf agent harnesses for better performance.
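The evaluation loop described above — an LLM emits OpenSCAD source, the harness compiles it headlessly and checks the result — can be sketched as follows. The `openscad -o` export invocation is real CLI behavior; the function name, paths, and pass/fail logic are illustrative assumptions, not the author's actual harness.

```python
import subprocess
from pathlib import Path

def compile_scad(scad_path: str, stl_path: str, timeout: int = 120) -> bool:
    """Headless OpenSCAD export: `openscad -o out.stl design.scad`.

    A non-zero exit code (syntax or CSG errors) is the first automatic
    fail signal a harness can use. Assumes `openscad` is on PATH.
    """
    try:
        result = subprocess.run(
            ["openscad", "-o", stl_path, scad_path],
            capture_output=True, text=True, timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # Missing binary or a hung compile both count as a failed attempt.
        return False
    return result.returncode == 0 and Path(stl_path).exists()
```

A harness would then feed the compiler's stderr back to the model on failure, giving it a chance to repair its own script before the simulation step.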
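To make the convex-decomposition challenge concrete: MuJoCo collides each mesh geom as its convex hull, so a concave wall mount would collide as if it were filled solid unless it is pre-split into convex pieces, each attached as its own geom. A minimal sketch of building such an MJCF scene — the file names and scene layout are assumptions, not the article's setup:

```python
def mjcf_scene(piece_files: list[str]) -> str:
    """Build a minimal MJCF scene from pre-decomposed convex mesh pieces.

    Each STL piece becomes a separate <geom> on one free-floating body,
    so the simulated collision shape matches the concave design.
    """
    assets = "\n    ".join(
        f'<mesh name="p{i}" file="{f}"/>' for i, f in enumerate(piece_files)
    )
    geoms = "\n      ".join(
        f'<geom type="mesh" mesh="p{i}"/>' for i in range(len(piece_files))
    )
    return f"""<mujoco>
  <asset>
    {assets}
  </asset>
  <worldbody>
    <geom type="plane" size="1 1 0.1"/>
    <body pos="0 0 0.2">
      <freejoint/>
      {geoms}
    </body>
  </worldbody>
</mujoco>"""
```

The decomposition itself (e.g. via a tool like V-HACD) is the hard part the bullet above alludes to; this sketch only shows how the resulting pieces would be assembled for simulation.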