Getting an LLM to Play Text Adventures
9 months ago
- #Text-Adventure
- #LLM
- #AI-Gaming
- Research investigates LLMs playing text adventures, with mixed results.
- ChatGPT 3.5 and GPT-4o-mini show limited capability in text adventure games.
- LLMs struggle to track state transitions in text adventures, getting them wrong 40% of the time.
- Prompt engineering helps guide the LLMs, but they remain prone to failure modes such as context poisoning.
- LLMs often get stuck in loops or obsess over irrelevant details.
- Examples include failing to place a gold watch on the floor and misusing commands.
- LLMs sometimes ignore hints and revert to previous obsessions.
- Performance varies by model: Claude 3.5 Haiku shows some promise but is still flawed.
- Cost is a significant barrier: completing a single easy text adventure took $1 in spend.
- Future work includes benchmarking different LLMs on text adventure performance.
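
The loop and state-transition failures summarized above, and the proposed benchmarking, could be measured with a small harness. What follows is only a minimal sketch under stated assumptions, not the original author's setup: the toy `WORLD` map, the `GOAL` room, the loop thresholds, and the stand-in `policy` and `predict` callables (which take the place of real LLM calls) are all hypothetical names introduced for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str]  # (room, command)

# Hypothetical toy world: (room, command) -> next room. Not from the original post.
WORLD: Dict[Turn, str] = {
    ("hall", "north"): "library",
    ("library", "south"): "hall",
    ("library", "east"): "vault",
}
GOAL = "vault"


def step(room: str, command: str) -> str:
    """Apply a command; unrecognized commands leave the player in place."""
    return WORLD.get((room, command), room)


def is_looping(history: List[Turn], window: int = 6, limit: int = 3) -> bool:
    """Flag a loop when one (room, command) pair occurs `limit` times
    within the last `window` turns. Both thresholds are arbitrary choices."""
    recent = Counter(history[-window:])
    return any(count >= limit for count in recent.values())


def run_episode(policy: Callable[[str, List[Turn]], str], max_turns: int = 20) -> dict:
    """Play one game with `policy` choosing commands; stop on goal, loop, or turn limit."""
    room = "hall"
    history: List[Turn] = []
    for turn in range(1, max_turns + 1):
        command = policy(room, history)
        history.append((room, command))
        room = step(room, command)
        if room == GOAL:
            return {"solved": True, "turns": turn, "looped": False}
        if is_looping(history):
            return {"solved": False, "turns": turn, "looped": True}
    return {"solved": False, "turns": max_turns, "looped": False}


def transition_error_rate(predict: Callable[[str, str], str], cases: List[Turn]) -> float:
    """Fraction of cases where the predicted next room differs from the real one."""
    wrong = sum(1 for room, cmd in cases if predict(room, cmd) != step(room, cmd))
    return wrong / len(cases)


# Stand-in policies that imitate the behaviors described in the summary.
def stuck_policy(room: str, history: List[Turn]) -> str:
    return "examine watch"  # fixates on one irrelevant action


def scripted_policy(room: str, history: List[Turn]) -> str:
    return {"hall": "north", "library": "east"}.get(room, "look")


if __name__ == "__main__":
    print(run_episode(scripted_policy))
    print(run_episode(stuck_policy))
```

To compare models, each one would be wrapped as a `policy` (command selection) and a `predict` (next-state guess) callable, and the solved rate, loop rate, and transition error rate would be averaged over many episodes. Real games would need a richer state than a single room name.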