Learning to Model the World with Language
16 days ago
- #AI
- #Reinforcement Learning
- #Language Understanding
- Dynalang is an agent that learns to understand and use diverse types of language to predict future observations, world behavior, and rewards.
- It learns a multimodal world model that predicts future text and image representations, and it learns to act from imagined model rollouts.
- Dynalang can be pretrained on text or video datasets without actions or rewards, enabling it to benefit from large-scale offline data.
- The agent outperforms state-of-the-art RL algorithms and task-specific architectures on tasks ranging from grid worlds to photorealistic home navigation.
- Dynalang unifies language understanding with future prediction, so it can use environment descriptions and game rules as well as instructions.
- It models video and text as a single multimodal sequence, similar to how humans perceive the world, which improves both pretraining and RL performance.
- Because its world model also predicts text, the agent can generate language grounded in the environment, which is demonstrated on embodied question answering.
- Pretraining Dynalang on general text data enhances downstream task performance, demonstrating the versatility of its architecture.
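Treating video and text as one sequence can be illustrated with a toy sketch. It assumes one image frame and one text token arrive at each timestep. `Step`, `interleave`, `text_only`, and the `<pad>` filler are hypothetical names, and plain strings stand in for image arrays and token IDs. The padding rule (a filler token once text runs out, `None` once frames run out) is an illustrative assumption, not the paper's scheme.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical filler token used after the text stream is exhausted (assumption).
PAD_TOKEN = "<pad>"


@dataclass
class Step:
    """One timestep of agent input: an image frame paired with a single text token."""
    frame: Optional[str]  # stand-in for an image array; None when no frame exists
    token: str            # stand-in for a token ID


def interleave(frames: List[Optional[str]], tokens: List[str]) -> List[Step]:
    """Pair the t-th frame with the t-th token, padding whichever stream is shorter."""
    length = max(len(frames), len(tokens))
    steps = []
    for t in range(length):
        frame = frames[t] if t < len(frames) else None
        token = tokens[t] if t < len(tokens) else PAD_TOKEN
        steps.append(Step(frame, token))
    return steps


def text_only(tokens: List[str]) -> List[Step]:
    """Text data with no frames, actions, or rewards fits the same per-step format."""
    return interleave([], tokens)


if __name__ == "__main__":
    for step in interleave(["img0", "img1", "img2", "img3"], ["get", "the", "key"]):
        print(step.frame, step.token)
```

Because text-only data maps onto the same per-step format as video-plus-text, a single sequence model can consume both. That shared format is what makes action-free, reward-free pretraining possible in this kind of design.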
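Learning to act from imagined rollouts means the policy trains on trajectories generated by the world model rather than on fresh environment steps. The sketch below shows this general technique. It uses the λ-return target common in Dreamer-style actor-critic agents and is not Dynalang's training code. The scalar `dynamics`, `policy`, and `critic` functions are hypothetical stand-ins for learned networks.

```python
from typing import Callable, List, Tuple


def imagine(state: float,
            policy: Callable[[float], float],
            dynamics: Callable[[float, float], Tuple[float, float]],
            horizon: int) -> Tuple[List[float], List[float]]:
    """Roll the model forward without touching the real environment.

    Returns visited states (horizon + 1 entries) and predicted rewards (horizon entries).
    """
    states, rewards = [state], []
    for _ in range(horizon):
        action = policy(state)
        state, reward = dynamics(state, action)  # model predicts next state and reward
        states.append(state)
        rewards.append(reward)
    return states, rewards


def lambda_returns(rewards: List[float], values: List[float],
                   gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Compute R_t = r_t + gamma * ((1 - lam) * v_{t+1} + lam * R_{t+1}), with R_H = v_H.

    `values[t]` is the critic's estimate at states[t], so it has one more entry than `rewards`.
    """
    assert len(values) == len(rewards) + 1
    ret = values[-1]  # bootstrap from the critic at the end of the imagined horizon
    out = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        ret = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * ret)
        out[t] = ret
    return out


if __name__ == "__main__":
    # Hypothetical toy model: the action shifts the state; reward penalizes distance from 0.
    dynamics = lambda s, a: (s + a, -abs(s + a))
    policy = lambda s: -0.5 * s
    critic = lambda s: -2.0 * abs(s)

    states, rewards = imagine(4.0, policy, dynamics, horizon=3)
    targets = lambda_returns(rewards, [critic(s) for s in states])
    print(states, rewards, targets)
```

The targets computed this way would train the critic, and the actor would be updated to increase them. Every step of this loop runs inside the model, so no further environment interaction is needed.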