Hasty Briefs

The famous o3 "GeoGuessr" prompt did not work

3 hours ago
  Tags: #geolocation, #AI benchmarking, #prompt engineering

  • The o3 model demonstrated surprising geolocation abilities, similar to human GeoGuessr experts.
  • A complex prompt believed to enhance o3's geolocation performance was tested against a basic prompt.
  • Benchmarking with 200 images showed the basic prompt performed slightly better on average.
  • The results suggest that elaborate prompts may not improve performance when models are already capable.
  • Models can mislead users by generating plausible stories about their own reasoning and by claiming that a prompt improves their performance.
  • o3's geolocation capabilities did not carry over to newer models such as GPT-5.4 and GPT-5.5.
  • Benchmarks are essential for evaluating AI performance objectively rather than relying on subjective impressions.
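
To make the benchmarking point concrete, here is a minimal sketch of how two prompts could be compared on the same set of images. It is not the method used in the original benchmark: it assumes each guess is scored by great-circle distance to the true location, and the names `haversine_km` and `paired_bootstrap` are hypothetical helpers introduced only for illustration.

```python
import math
import random
from statistics import mean

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def paired_bootstrap(errors_a, errors_b, n_resamples=10_000, seed=0):
    """Compare two prompts scored on the SAME images.

    errors_a[i] and errors_b[i] are the distance errors (km) of prompt A and
    prompt B on image i. Returns (mean difference A - B, 95% CI low, 95% CI high).
    A confidence interval that contains 0 means the data do not show a clear
    winner.
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("both prompts must be scored on the same images")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    rng = random.Random(seed)
    n = len(diffs)
    # Resample the per-image differences with replacement and collect the means.
    means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    low = means[int(0.025 * n_resamples)]
    high = means[int(0.975 * n_resamples) - 1]
    return mean(diffs), low, high
```

In use, one would compute `haversine_km` for each guess from the elaborate prompt and from the basic prompt on every image, then pass the two error lists to `paired_bootstrap`. Pairing by image matters because some images are far harder than others, and comparing per-image differences removes that shared difficulty from the comparison.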