The famous o3 "GeoGuessr" prompt did not work
2 hours ago
- #geolocation
- #AI benchmarking
- #prompt engineering
- The o3 model demonstrated surprising geolocation abilities, similar to human GeoGuessr experts.
- A complex prompt believed to enhance o3's geolocation performance was tested against a basic prompt.
- Benchmarking with 200 images showed the basic prompt performed slightly better on average.
- The results suggest that elaborate prompts may not improve performance when models are already capable.
- Models can mislead users by narrating plausible-sounding stories about their own reasoning and by claiming that a prompt improves their performance.
- Geolocation capabilities from o3 did not transfer to newer models like GPT-5.4 and GPT-5.5.
- Benchmarks are essential for evaluating AI performance objectively rather than relying on subjective impressions.
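
The prompt comparison summarized above can be sketched as a paired benchmark: both prompts are run on the same images, each guess is scored, and the per-image differences are compared. The summary does not say how guesses were scored, so this minimal sketch assumes great-circle distance error in kilometres. The names `haversine_km` and `paired_bootstrap` are hypothetical helpers for illustration, not anything taken from the original benchmark.

```python
import math
import random
from statistics import mean


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def paired_bootstrap(errors_a, errors_b, n_resamples=2000, seed=0):
    """Compare two prompts scored on the same images.

    errors_a[i] and errors_b[i] are the errors of prompt A and prompt B on
    image i (assumed metric: km from the true location). Returns the mean of
    (A - B) and an approximate 95% percentile interval from resampling images
    with replacement. A negative mean means prompt A had the lower error.
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("both prompts must be scored on the same images")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    # Resample the per-image differences, keeping each A/B pair together.
    means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lo = means[int(0.025 * n_resamples)]
    hi = means[int(0.975 * n_resamples) - 1]
    return mean(diffs), (lo, hi)
```

If the interval returned by `paired_bootstrap` straddles zero, a "slightly better on average" result is within the noise of the image sample, which is the kind of check that subjective impressions cannot provide.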