Hasty Briefs (beta)

Evaluating GPT-5's reasoning ability using the Only Connect game show

12 days ago
  • #Only Connect
  • #Reasoning Benchmark
  • #GPT-5
  • Evaluating GPT-5's reasoning abilities beyond knowledge-based benchmarks, focusing on pattern recognition, lateral thinking, and contextual reasoning.
  • Assessing decision-making in models, especially when choosing between making an educated guess and retrieving additional information.
  • Comparing GPT-5's performance with previous models while varying the reasoning effort and verbosity parameters.
  • The Only Connect game show serves as the benchmark for testing LLMs' reasoning capabilities because it emphasizes lateral thinking and pattern recognition.
  • The methodology involved sourcing questions from Only Connect, using structured output parameters, and simulating episodes for evaluation.
  • Results showed that GPT-5 and reasoning-optimized models performed best, with higher reasoning parameter settings leading to better accuracy.
  • The Missing Vowels round was the easiest for models, while the Wall round was the most challenging due to prompt complexity.
  • Future steps include publishing the dataset, granular analysis of challenging questions, and implementing competitive model pairings.
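
To make the evaluation idea above concrete, here is a minimal sketch of a per-round scoring harness in Python. It is not the article's actual code: the `Question` type, the `evaluate` function, the exact-match grading rule, and the toy questions are all assumptions made for illustration, and the model is represented by a plain callable instead of a real API client. The only game-specific logic is the Missing Vowels check, which confirms that a guess has the same consonant skeleton as the clue.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable

VOWELS = set("AEIOU")


@dataclass(frozen=True)
class Question:
    """Hypothetical record for one question; field names are assumptions."""
    round_name: str  # e.g. "Connections", "Sequences", "Wall", "Missing Vowels"
    prompt: str
    answer: str


def normalize(text: str) -> str:
    """Upper-case the text and drop everything except letters and digits."""
    return "".join(ch for ch in text.upper() if ch.isalnum())


def consonant_skeleton(text: str) -> str:
    """Return the normalized text with the vowels A, E, I, O, U removed."""
    return "".join(ch for ch in normalize(text) if ch not in VOWELS)


def is_correct(question: Question, guess: str) -> bool:
    """Assumed grading rule: exact match after normalization.

    For Missing Vowels, the guess must also fit the clue's consonants.
    """
    if normalize(guess) != normalize(question.answer):
        return False
    if question.round_name == "Missing Vowels":
        return consonant_skeleton(guess) == consonant_skeleton(question.prompt)
    return True


def evaluate(
    questions: Iterable[Question],
    model: Callable[[Question], str],
) -> Dict[str, float]:
    """Run the model on every question and return accuracy per round."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for question in questions:
        total[question.round_name] += 1
        if is_correct(question, model(question)):
            correct[question.round_name] += 1
    return {name: correct[name] / total[name] for name in total}


# Toy data invented for this sketch; not taken from the article's dataset.
TOY_QUESTIONS = [
    Question("Missing Vowels", "NLY CNN CT", "Only Connect"),
    Question("Sequences", "1, 2, 4, 8, ?", "16"),
]


def stub_model(question: Question) -> str:
    """Stand-in for a real model call; answers one toy question correctly."""
    if question.round_name == "Missing Vowels":
        return "only connect"
    return "15"


if __name__ == "__main__":
    print(evaluate(TOY_QUESTIONS, stub_model))
```

In a real run, `stub_model` would be replaced by a function that sends the prompt to the model under test with the chosen reasoning effort and verbosity settings and parses its structured output into a single answer string. Exact-match grading is also the simplest possible choice; rounds such as Connections, where many phrasings of the same link are acceptable, would likely need a more lenient grader.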