Hasty Briefs (beta)

Insights into Claude Opus 4.5 from Pokémon

4 months ago
  • #AI Cognition
  • #LLMs
  • #Pokémon
  • LLMs like Claude Opus 4.5 show improved vision and spatial awareness in playing Pokémon, but still struggle with attention and cognitive biases.
  • Claude's performance depends heavily on note-keeping and context-window usage, which simulate memory but still fall short of human-like recall.
  • Despite improvements, Claude gets stuck in loops and shows poor long-term planning, often ignoring obvious solutions because it fixates on a goal.
  • Comparisons with human players highlight Claude's lack of exploratory behavior and reliance on pre-existing knowledge rather than in-game experimentation.
  • GPT-5.1 and Gemini models show faster progress in Pokémon gameplay, suggesting that harness optimizations and raw intelligence both contribute to performance.
  • Claude's limitations are likened to anterograde amnesia: without constant note-taking, the inability to form new memories hampers progress.
  • The discussion underscores the challenges in LLM cognition, including vision, memory, and planning, while also noting the potential for future improvements.