Insights into Claude Opus 4.5 from Pokémon

4 months ago

LLMs like Claude Opus 4.5 show improved vision and spatial awareness in playing Pokémon, but still struggle with attention and cognitive biases.
Claude's performance depends heavily on note-keeping and context-window usage, which simulate memory but still fall short of human-like recall.
Despite these improvements, Claude still gets stuck in loops and plans poorly over the long term, often ignoring obvious solutions because it fixates on its goals.
Comparisons with human players highlight Claude's lack of exploratory behavior and reliance on pre-existing knowledge rather than in-game experimentation.
GPT-5.1 and Gemini models show faster progress in Pokémon gameplay, suggesting that harness optimizations and raw intelligence both contribute to performance.
Claude's limitations are likened to anterograde amnesia: because it cannot form new memories, progress stalls unless it takes notes constantly.
The discussion underscores the challenges in LLM cognition, including vision, memory, and planning, while also noting the potential for future improvements.

Hasty Briefs (beta)