Insights into Claude Opus 4.5 from Pokémon
4 months ago
- #AI Cognition
- #LLMs
- #Pokémon
- LLMs such as Claude Opus 4.5 show improved vision and spatial awareness when playing Pokémon but still struggle with attention and cognitive biases.
- Claude's performance depends heavily on note-taking and context-window usage, which simulate memory but still fall short of human-like recall.
- Despite these improvements, Claude gets stuck in loops and plans poorly over the long term, often overlooking obvious solutions because it fixates on its current goals.
- Comparisons with human players highlight Claude's lack of exploratory behavior and its reliance on pre-existing knowledge rather than in-game experimentation.
- GPT-5.1 and Gemini models progress faster through Pokémon, suggesting that both harness optimizations and raw intelligence contribute to performance.
- Claude's limitations are likened to anterograde amnesia: because it cannot form new memories, it makes little progress without constant note-taking.
- The discussion underscores the challenges of LLM cognition, including vision, memory, and planning, while also noting the potential for future improvement.