Hasty Briefs

Can frontier LLMs solve CAD tasks?

21 hours ago
  • #LLMs
  • #CAD
  • #3D-printing
  • Frontier LLMs such as GPT-5.3-Codex, Gemini 3.1 Pro, and Claude Opus 4.6 show uneven capabilities, excelling at some tasks while struggling with others.
  • LLMs are trained primarily on text and lack the visual, spatial, and motor experience that humans acquire naturally, which makes them less adept at tasks such as CAD.
  • The experiment asked LLMs to design a 3D-printable wall mount for a bike pump in OpenSCAD, then validated each design with a physics simulation in MuJoCo.
  • Claude Opus 4.6 performed best, with a 100% pass rate, though its designs often needed refinement. GPT-5.2 also had a good pass rate but produced flawed designs.
  • Gemini 3.1 Pro and Gemini 3 Flash showed potential but were inconsistent, sometimes producing great designs and other times failing outright or getting stuck in loops.
  • Open-weight models like GLM-4.6V, Kimi K2.5, and Qwen 3.5 397B performed poorly, with simplistic or non-functional designs.
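  • The project highlighted challenges such as convex decomposition in MuJoCo (MuJoCo collides mesh geoms using their convex hulls, so a non-convex part must be split into convex pieces to behave correctly) and the complexity of building an agentic harness for LLMs.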
  • Future improvements could include better grading rubrics, more objects for testing, and integrating off-the-shelf agent harnesses for better performance.
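MuJoCo's collision detection treats each mesh geom as its convex hull, which is why convex decomposition matters here: a concave printed part collides as if its hollows were filled in. The sketch below is a 2D illustration under stated assumptions, not the project's code. It uses a hypothetical U-shaped cradle profile with invented dimensions, computes the convex hull with Andrew's monotone chain algorithm, and shows that the hull fills the slot where the pump would sit. Splitting the profile into convex pieces, each of which could become a separate geom, recovers the true shape.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a left (CCW) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def area(poly: List[Point]) -> float:
    """Shoelace formula for a simple polygon with vertices in order."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2


def is_convex(poly: List[Point]) -> bool:
    """True if every non-degenerate turn along the boundary has the same sign."""
    n = len(poly)
    turns = set()
    for i in range(n):
        c = cross(poly[i], poly[(i + 1) % n], poly[(i + 2) % n])
        if c != 0:
            turns.add(c > 0)
    return len(turns) <= 1


def rect(x0: float, y0: float, x1: float, y1: float) -> List[Point]:
    """Axis-aligned rectangle, counter-clockwise."""
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


# Hypothetical U-shaped cradle profile (mm): a 30x20 block with a 10 mm wide,
# 15 mm deep slot for the pump barrel. Dimensions are invented for illustration.
cradle = [(0, 0), (30, 0), (30, 20), (20, 20), (20, 5), (10, 5), (10, 20), (0, 20)]

hull = convex_hull(cradle)
print(area(cradle))               # 450.0
print(area(hull))                 # 600.0: the hull fills the slot
print(area(hull) - area(cradle))  # 150.0 mm^2 of phantom collision material

# Split into three convex pieces (base plus two arms); each could be its own geom.
pieces = [rect(0, 0, 30, 5), rect(0, 5, 10, 20), rect(20, 5, 30, 20)]
print(all(is_convex(p) for p in pieces))  # True
print(sum(area(p) for p in pieces))       # 450.0, matching the original profile
```

In 3D the same idea applies, though the split is usually computed by a decomposition tool (V-HACD and CoACD are two examples) rather than by hand.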
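The agentic harness is described only at a high level, so what follows is a minimal, hypothetical sketch of one possible iteration loop, not the author's harness. The model (`generate`) receives the history of earlier attempts and returns OpenSCAD source, a checker (`evaluate`) returns pass/fail plus feedback, and the loop stops on a pass, on an iteration cap, or when the model repeats an earlier design verbatim, which is one crude way to catch the looping behavior some models showed. All names (`run_design_loop`, `Attempt`, `RunResult`) are invented for illustration. In a real harness, `evaluate` might export an STL with `openscad -o mount.stl mount.scad` and then run the MuJoCo simulation; here both sides are stubs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    scad: str       # OpenSCAD source produced by the model
    passed: bool    # did the (hypothetical) checker accept it?
    feedback: str   # message fed back to the model on the next turn


@dataclass
class RunResult:
    status: str = "max_iters"  # "pass", "loop", or "max_iters"
    attempts: List[Attempt] = field(default_factory=list)


def run_design_loop(
    generate: Callable[[List[Attempt]], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    max_iters: int = 5,
) -> RunResult:
    """Generate -> evaluate -> feed back, until a pass, a repeat, or the cap."""
    result = RunResult()
    seen = set()
    for _ in range(max_iters):
        scad = generate(result.attempts)
        if scad in seen:
            # Same source emitted twice: treat it as a loop instead of re-simulating.
            result.status = "loop"
            return result
        seen.add(scad)
        passed, feedback = evaluate(scad)
        result.attempts.append(Attempt(scad, passed, feedback))
        if passed:
            result.status = "pass"
            return result
    return result


# Usage with stand-ins: a fake "model" that improves on its second try and a
# fake checker that only accepts designs with a cut-out slot.
designs = [
    "cube([30, 20, 20]);",
    "difference() { cube([30, 20, 20]); translate([10, 5, -1]) cube([10, 16, 22]); }",
]


def fake_generate(history: List[Attempt]) -> str:
    return designs[min(len(history), len(designs) - 1)]


def fake_evaluate(scad: str) -> Tuple[bool, str]:
    ok = "difference" in scad
    return ok, ("pump seated" if ok else "no slot for the pump")


res = run_design_loop(fake_generate, fake_evaluate)
print(res.status, len(res.attempts))  # pass 2
```

Passing the full attempt history to `generate` keeps the loop agnostic about how feedback is turned into a prompt; a real harness would also need timeouts and error handling around rendering and simulation.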