Letting Claude Play Text Adventures

4 months ago

Attended an AI hackathon focused on mechanistic interpretability but worked at the API layer because of limited PyTorch knowledge.
Explored cognitive architectures (Soar, ACT-R) and their potential to scaffold LLMs for better performance.
Chose text adventures as an evaluation task due to their structured, long-horizon nature, using Anchorhead as a test case.
Developed a Python wrapper to interact with the dfrotz interpreter for text adventures.
Implemented a simple LLM agent (SimplePlayer) that plays the game through an accumulating chat history, which made token costs high.
Experimented with memory harnesses to reduce token usage but observed degraded performance in task completion.
Created smaller, custom games to test agent performance but found them less effective than complex games like Anchorhead.
Proposed future improvements like domain-specific memories, automatic/manual geography tracking, and episodic memory.
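
The trade-off between a full chat history and a memory harness can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the original post's code: ToyGame is a hypothetical two-room stand-in for a dfrotz-backed game, the policy callable stands in for the LLM call, the window parameter is one simple way to bound memory (keep only the last few turns), and prompt length in characters is used as a rough proxy for token cost.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Optional, Tuple

Turn = Tuple[str, str]  # (command sent, text the game printed back)


class ToyGame:
    """Hypothetical two-room game standing in for a real interpreter."""

    def __init__(self) -> None:
        self.room = "hall"
        self.done = False

    def step(self, command: str) -> str:
        if self.room == "hall" and command == "north":
            self.room = "study"
            return "You are in the study. A key lies here."
        if self.room == "study" and command == "take key":
            self.done = True
            return "Taken. You win."
        return "Nothing happens."


def render_history(history: Iterable[Turn]) -> str:
    """Flatten the remembered turns into the text shown to the policy."""
    return "\n".join(f"> {cmd}\n{out}" for cmd, out in history)


def play(
    game: ToyGame,
    policy: Callable[[str], str],
    max_turns: int = 10,
    window: Optional[int] = None,
) -> Tuple[int, int]:
    """Run the agent loop; return (turns used, total prompt characters).

    window=None keeps the whole transcript (the full-history approach);
    window=k keeps only the last k turns (a crude memory harness).
    """
    history: Deque[Turn] = deque(maxlen=window)
    prompt_chars = 0
    turns = 0
    while not game.done and turns < max_turns:
        prompt = render_history(history)
        prompt_chars += len(prompt)  # rough proxy for token cost
        command = policy(prompt)     # an LLM call would go here
        history.append((command, game.step(command)))
        turns += 1
    return turns, prompt_chars


def scripted_policy(prompt: str) -> str:
    """Stand-in for the LLM: go north, then take the key once in the study."""
    return "take key" if "study" in prompt else "north"
```

Calling play(ToyGame(), scripted_policy) finishes the toy game, while running a policy that never makes progress with window=None versus window=1 shows the cost difference: the full transcript grows every turn, the windowed one stays flat. The sketch also hints at why a bounded memory can hurt task completion, since anything outside the window (for example, where the key was seen) is simply gone.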
