Hasty Briefs (beta)

Benchmarking leading AI agents against Google reCAPTCHA v2

12 days ago
  • #AI Performance
  • #Machine Learning
  • #CAPTCHA Testing
  • Claude Sonnet 4.5 solved Google reCAPTCHA v2 challenges with a 60% success rate, outperforming both Gemini 2.5 Pro and GPT-5.
  • GPT-5 performed significantly worse, with a 28% success rate: excessive reasoning and poor planning led to timeouts.
  • All models performed best on Static CAPTCHAs and worst on Cross-tile challenges, highlighting perceptual weaknesses in AI.
  • Reload challenges were difficult because of the agents' reasoning-action loop: agents misinterpreted tile refreshes as errors.
  • Cross-tile challenges exposed AI's inability to handle partial, occluded, and boundary-spanning objects effectively.
  • The study suggests that more reasoning isn't always better; quick, confident decisions are crucial for real-time tasks.
  • The evaluation was conducted using Browser Use, an open-source framework for browser-based AI tasks.
  • Agents often exceeded the instructed limit of five CAPTCHA attempts due to unclear challenge boundaries and lack of state tracking.
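
The last point, about agents overshooting the five-attempt limit for lack of state tracking, can be illustrated with a small sketch. This is not the study's code and not part of Browser Use; `AttemptTracker`, `start_challenge`, and the `challenge_id` strings are hypothetical names introduced here. It assumes the caller can tell one challenge from another, which is exactly what the agents reportedly struggled with.

```python
from dataclasses import dataclass, field


@dataclass
class AttemptTracker:
    """Hypothetical helper: counts distinct CAPTCHA challenges against a limit."""

    max_attempts: int = 5  # the instructed limit mentioned in the summary
    attempts: int = 0
    seen: set = field(default_factory=set)

    def start_challenge(self, challenge_id: str) -> bool:
        """Return True if the agent may work on this challenge."""
        if challenge_id in self.seen:
            # Same challenge seen again (for example after tiles refresh):
            # it was already counted, so it does not use up another attempt.
            return True
        if self.attempts >= self.max_attempts:
            # Budget spent: a new challenge must not be started.
            return False
        self.seen.add(challenge_id)
        self.attempts += 1
        return True


tracker = AttemptTracker()
# Seven distinct (made-up) challenges: only the first five are allowed.
results = [tracker.start_challenge(f"challenge-{i}") for i in range(7)]
print(results)
```

Keeping the counter outside the model's reasoning loop means the limit holds even if the agent loses track of where one challenge ends and the next begins, provided the identifiers themselves are reliable.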