MiniMax M3 vs. GLM 5.2: Codegen comparison across autonomous coding tasks

10 hours ago

Thinkbench was used to evaluate GLM 5.2 and MiniMax M3 in autonomous coding tasks, including greenfield builds, bug fixes, feature additions, and repair-to-green tasks.
GLM 5.2 achieved a 92% full-pass rate and a 0.976 mean score, while MiniMax M3 achieved an 84% full-pass rate and a 0.961 mean score.
MiniMax M3 was cheaper ($6.67 vs. $18.47) and faster (45 s vs. 80 s average latency) than GLM 5.2.
Performance differences were concentrated in greenfield builds, with GLM being steadier in packaging and complete delivery, while MiniMax sometimes excelled in individual hard builds.
In tasks with ambiguous instructions, MiniMax tended to add more production-shaped machinery, while GLM stayed closer to the plain reading of the brief.
For existing-code work, both models performed similarly, with mean scores ranging from 0.999 to 1.000.
GLM is recommended for hard from-scratch builds requiring complete, runnable projects, while MiniMax is a value pick for bug fixes, feature additions, or repair-to-green tasks under review.
Neither model is recommended as a top-level coordinator; a frontier coordinator like GPT-5.5 or Claude Opus is suggested for delegating and checking work.
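
To put the trade-off in concrete terms, the reported figures and the routing recommendation can be sketched in a few lines of Python. This is only an illustration: the `ModelStats` class, the `compare` and `pick_worker` helpers, and the task-kind labels are hypothetical names introduced here, and the cost values are taken as reported without assuming whether they are per-run or per-task totals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelStats:
    """Headline numbers as reported in the summary (hypothetical container)."""
    name: str
    full_pass_rate: float  # fraction of tasks fully passed
    mean_score: float
    cost_usd: float        # cost as reported; scope (run vs. task) is not stated
    avg_latency_s: float


GLM = ModelStats("GLM 5.2", 0.92, 0.976, 18.47, 80.0)
MINIMAX = ModelStats("MiniMax M3", 0.84, 0.961, 6.67, 45.0)


def compare(a: ModelStats, b: ModelStats) -> dict[str, float]:
    """Express model `a` relative to model `b`."""
    return {
        "cost_ratio": a.cost_usd / b.cost_usd,
        "latency_ratio": a.avg_latency_s / b.avg_latency_s,
        "pass_rate_gap_points": (a.full_pass_rate - b.full_pass_rate) * 100,
        "mean_score_gap": a.mean_score - b.mean_score,
    }


# Task-kind labels are assumptions chosen to mirror the categories in the summary.
EXISTING_CODE_TASKS = {"bug_fix", "feature_addition", "repair_to_green"}


def pick_worker(task_kind: str) -> str:
    """Route a task to a worker model following the summary's recommendation."""
    if task_kind == "greenfield":
        return GLM.name      # steadier at complete, runnable from-scratch builds
    if task_kind in EXISTING_CODE_TASKS:
        return MINIMAX.name  # near-identical scores on existing code, lower cost
    raise ValueError(f"unknown task kind: {task_kind!r}")


if __name__ == "__main__":
    for key, value in compare(GLM, MINIMAX).items():
        print(f"{key}: {value:.3f}")
    print(pick_worker("greenfield"), "/", pick_worker("bug_fix"))
```

On these numbers GLM 5.2 costs roughly 2.8 times as much and takes roughly 1.8 times as long as MiniMax M3, in exchange for an 8-point higher full-pass rate. The router covers only the worker choice; per the summary, delegation and checking would sit with a separate frontier coordinator.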
