LLMs Get Lost in Multi-Turn Conversation
a year ago
- #Multi-turn Conversations
- #LLMs
- #Conversational AI
- Large Language Models (LLMs) are conversational interfaces that assist users in defining, exploring, and refining tasks through multi-turn conversations.
- LLM evaluation has predominantly focused on single-turn, fully-specified instructions, even though real user instructions are frequently underspecified.
- Experiments show that LLMs perform significantly worse in multi-turn conversations than in single-turn ones, with an average performance drop of 39% across six tasks.
- The performance degradation in multi-turn conversations breaks down into two components: a minor loss in aptitude and a significant increase in unreliability.
- LLMs often make assumptions early in conversations and prematurely generate final solutions, leading to errors they do not recover from.
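
The split between aptitude and unreliability can be made concrete with a small sketch. The summary above does not define either metric, so the code assumes one common formulation: the same task is run several times, aptitude is taken as a high percentile (here the 90th) of the scores, and unreliability as the gap between the 90th and 10th percentiles. The function names, the percentile choices, and the sample scores are all hypothetical and serve only as illustration.

```python
from statistics import mean


def percentile(scores, p):
    """Percentile with linear interpolation; p is in [0, 100]."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def summarize(scores):
    """Summarize repeated runs of one task (assumed metric definitions)."""
    best_case = percentile(scores, 90)   # assumed proxy for aptitude
    worst_case = percentile(scores, 10)
    return {
        "average": mean(scores),
        "aptitude": best_case,
        "unreliability": best_case - worst_case,  # spread between good and bad runs
    }


def relative_drop(single_turn_avg, multi_turn_avg):
    """Relative performance drop from single-turn to multi-turn, in percent."""
    return 100 * (single_turn_avg - multi_turn_avg) / single_turn_avg


if __name__ == "__main__":
    # Hypothetical scores (0-100) from ten repeated runs of the same task.
    single_turn = [90, 92, 88, 91, 89, 93, 90, 87, 92, 88]
    multi_turn = [85, 20, 60, 88, 35, 70, 15, 90, 50, 40]

    single = summarize(single_turn)
    multi = summarize(multi_turn)
    print("single-turn:", single)
    print("multi-turn: ", multi)
    print("relative drop (%):", relative_drop(single["average"], multi["average"]))
```

With made-up data of this shape, the best multi-turn runs stay close to the single-turn ones while the worst runs fall far behind, which is the pattern the summary describes: aptitude changes little, but the spread between runs grows sharply.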