Hasty Briefs (beta)

Getting an LLM to Play Text Adventures

9 months ago
  • #Text-Adventure
  • #LLM
  • #AI-Gaming
  • Research investigates LLMs playing text adventures, with mixed results.
  • ChatGPT 3.5 and GPT-4o-mini show limited capability in text adventure games.
  • LLMs struggle with state transitions in text adventures, getting them wrong 40% of the time.
  • Prompt engineering is used to guide LLMs, but errors persist, including context poisoning.
  • LLMs often get stuck in loops or obsess over irrelevant details.
  • Examples include failing to place a gold watch on the floor and misusing commands.
  • LLMs sometimes ignore hints and revert to previous obsessions.
  • Performance varies by model: Claude 3.5 Haiku shows some promise but is still flawed.
  • Cost is a significant barrier, with $1 spent to complete an easy text adventure.
  • Future work includes benchmarking different LLMs on text adventure performance.
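
The loop and hint behaviour summarized above can be illustrated with a minimal sketch of a play loop that notices when a player repeats the same command and injects a hint into the transcript. This is not the original author's harness: `play`, `is_stuck`, `make_toy_game`, and `scripted_player` are hypothetical names, the toy game is invented for illustration, and `scripted_player` is a stand-in for a real model call.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Step = Callable[[str], Tuple[str, bool]]  # command -> (observation, game_over)
Choose = Callable[[List[str]], str]       # transcript -> next command


def is_stuck(recent: Deque[str], window: int) -> bool:
    """Return True when the last `window` commands are all the same."""
    return len(recent) == window and len(set(recent)) == 1


def play(step: Step, choose: Choose, max_turns: int = 20, window: int = 3):
    """Run one episode; return (turns_used, finished, times_stuck)."""
    transcript: List[str] = []
    recent: Deque[str] = deque(maxlen=window)  # only the newest commands
    times_stuck = 0
    for turn in range(1, max_turns + 1):
        command = choose(transcript)
        recent.append(command)
        observation, done = step(command)
        transcript.append(f"> {command}\n{observation}")
        if done:
            return turn, True, times_stuck
        if is_stuck(recent, window):
            times_stuck += 1
            recent.clear()  # start counting repeats afresh after the hint
            transcript.append("HINT: you are repeating yourself; try something else.")
    return max_turns, False, times_stuck


def make_toy_game() -> Step:
    """Hypothetical two-step game: take the watch, then drop it to win."""
    holding = False

    def step(command: str) -> Tuple[str, bool]:
        nonlocal holding
        if command == "take watch" and not holding:
            holding = True
            return "Taken.", False
        if command == "drop watch" and holding:
            return "The watch is on the floor. You win.", True
        return "Nothing happens.", False

    return step


def scripted_player(transcript: List[str]) -> str:
    """Stand-in for a model call: fixates on one command until hinted."""
    if not transcript:
        return "take watch"
    if transcript[-1].startswith("HINT"):
        return "drop watch"
    return "examine watch"


if __name__ == "__main__":
    print(play(make_toy_game(), scripted_player))
```

With the scripted player, the episode finishes on turn 5 after one hint. A real harness would replace `scripted_player` with a model call and `make_toy_game` with an actual interpreter; counting turns and hints per episode is one possible basis for comparing models.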