Agentic Pelican on a Bicycle

6 months ago

The agentic loop (generate, assess, improve) is applied to iteratively refine an SVG of a pelican riding a bicycle.
Simon Willison's benchmark prompt, 'Generate an SVG of a pelican riding a bicycle', is used to test model creativity and improvement capabilities.
Models are given tools such as Chrome DevTools to convert the SVG to a JPG, and they use their own vision capabilities to assess the rendered image and iterate.
Six multimodal models were tested: Claude Opus 4.1, Claude Sonnet 4.5, Claude Haiku 4.5, GPT-5 Medium, GPT-5-Codex Medium, and Gemini 2.5 Pro.
Results varied: Claude Opus 4.1 added realistic details like a bicycle chain, while GPT-5-Codex made the image more complex but not necessarily better.
Gemini 2.5 Pro showed the most significant changes in composition across iterations.
The experiment reveals that models differ in their ability to self-critique and improve, with some excelling in mechanical reasoning and others struggling with aesthetic judgment.
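
The generate, assess, improve loop described above can be sketched roughly as follows. This is a minimal illustration, not the experiment's actual harness: the function `agentic_refine`, the `Iteration` record, and the four callables (`generate`, `render`, `assess`, `improve`) are hypothetical names introduced here. In a real setup, `render` would wrap an SVG-to-JPG conversion step and `assess` would wrap a vision-capable model call; here they are left as injected functions so the control flow stands on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Iteration:
    """One pass of the loop: the SVG that was rendered and the critique it received."""
    svg: str
    critique: str


def agentic_refine(
    prompt: str,
    generate: Callable[[str], str],      # hypothetical: prompt -> initial SVG text
    render: Callable[[str], bytes],      # hypothetical: SVG text -> raster image bytes
    assess: Callable[[bytes], str],      # hypothetical: image -> critique ("" means satisfied)
    improve: Callable[[str, str], str],  # hypothetical: (SVG, critique) -> revised SVG
    max_rounds: int = 3,
) -> Tuple[str, List[Iteration]]:
    """Run generate -> assess -> improve until the critique is empty or rounds run out."""
    history: List[Iteration] = []
    svg = generate(prompt)
    for _ in range(max_rounds):
        image = render(svg)        # the model looks at pixels, not at SVG source
        critique = assess(image)   # self-assessment of the rendered result
        history.append(Iteration(svg, critique))
        if not critique:           # nothing left to fix: stop early
            break
        svg = improve(svg, critique)
    return svg, history
```

Capping the loop with `max_rounds` is an assumption made for the sketch. It guards against a model that never declares itself satisfied, which matters given that some models kept adding complexity without clearly improving the image.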
