Square Minus Square – A coding agent benchmark
7 days ago
- #feedback loop
- #coding benchmark
- #LLM testing
- A coding benchmark asked agents to implement a function that computes the area of one square minus its intersection with another square on a 2D plane, using the fewest triangles possible.
- Several coding agents, including top models like Opus, Gemini 3 Pro, and GPT 5.2, were tested, but none solved the task without issues.
- Agents were encouraged to generate screenshots and examine them to fix bugs, demonstrating the importance of a feedback loop in coding tasks.
- Results varied between test runs, with no consistent winner among the top models; some agents produced code that crashed or was inefficient.
- The full code and results are available on GitHub, including video captures of the outcomes.
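
To make the geometric task concrete, here is a minimal Python sketch of the area part only: the area of square A minus its intersection with square B. It is not the benchmark's reference solution. The `square` helper, the corner-list representation, and the choice of Sutherland-Hodgman clipping are assumptions made for illustration, and the sketch does not attempt the minimal triangulation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def square(cx: float, cy: float, side: float, angle: float = 0.0) -> List[Point]:
    """Hypothetical helper: corners of a square, counter-clockwise."""
    h = side / 2.0
    c, s = math.cos(angle), math.sin(angle)
    corners = [(-h, -h), (h, -h), (h, h), (-h, h)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in corners]


def polygon_area(poly: List[Point]) -> float:
    """Unsigned polygon area via the shoelace formula."""
    if len(poly) < 3:
        return 0.0
    total = 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip_convex(subject: List[Point], clip: List[Point]) -> List[Point]:
    """Sutherland-Hodgman: intersect `subject` with a convex, CCW `clip` polygon."""
    output = list(subject)
    for i in range(len(clip)):
        if not output:
            break  # nothing left to clip, the polygons do not overlap
        a, b = clip[i], clip[(i + 1) % len(clip)]

        def side(p: Point) -> float:
            # >= 0 means p lies on the inner (left) side of edge a -> b.
            return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

        points, output = output, []
        prev, prev_side = points[-1], side(points[-1])
        for cur in points:
            cur_side = side(cur)
            if (prev_side >= 0) != (cur_side >= 0):
                # The segment prev -> cur crosses the clip edge: add the crossing.
                t = prev_side / (prev_side - cur_side)
                output.append((prev[0] + t * (cur[0] - prev[0]),
                               prev[1] + t * (cur[1] - prev[1])))
            if cur_side >= 0:
                output.append(cur)
            prev, prev_side = cur, cur_side
    return output


def area_a_minus_b(a: List[Point], b: List[Point]) -> float:
    """Area of square `a` with its overlap with square `b` removed."""
    return polygon_area(a) - polygon_area(clip_convex(a, b))


if __name__ == "__main__":
    a = square(0.0, 0.0, 2.0)
    b = square(1.0, 1.0, 2.0)  # overlaps the top-right quarter of `a`
    print(area_a_minus_b(a, b))
```

The area alone is the easy half. The remaining region can be non-convex, or can contain a hole when the second square lies entirely inside the first, which is what makes covering it with the fewest triangles the harder part of the task.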