TinyWorlds: A minimal implementation of DeepMind's Genie world model
- #deep-learning
- #autoregressive
- #world-models
- TinyWorlds is a minimal autoregressive world model based on Google DeepMind's Genie architecture.
- It aims to make scalable world models easier to understand by using an autoregressive, unsupervised approach.
- Installation involves cloning the repository, installing dependencies, and setting up environment variables.
- Training requires downloading datasets and running a training script with a configuration file.
- Inference involves pulling pretrained checkpoints and running an inference script.
- World models map the current state of the environment to the next state, compressing the environment's behavior into learned laws.
- TinyWorlds operates on discrete tokens to make dynamics prediction easier, and consists of three modules: a Video Tokenizer, an Action Tokenizer, and a Dynamics Model.
- A Space-Time Transformer (STT) handles video processing with spatial and temporal attention layers (a minimal sketch follows this list).
- Variational autoencoders (VAEs) are used for quantization and tokenization (see the quantizer sketch below).
- The Action Tokenizer infers the actions taken between frames without any ground-truth action data (sketched below).
- The Dynamics Model predicts future frame tokens from past frame tokens and action tokens (sketched below).
- Data is preprocessed into .h5 files; available datasets include PicoDoom, Pong, Zelda, and more (a loading example follows).
- Supports PyTorch features such as torch.compile, DDP, AMP, and TF32 for accelerated training and inference (an example appears at the end of this post).
- Future improvements include Mixture of Experts, new optimizers, and scaling to more GPUs.
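To make the space-time attention idea concrete, here is a minimal PyTorch sketch of one factorized attention block: patch tokens attend within their frame, then each spatial position attends causally across frames. The class name, dimensions, and head counts are illustrative assumptions, not the repository's actual code.

```python
import torch
import torch.nn as nn

class SpaceTimeBlock(nn.Module):
    """Sketch of one factorized space-time attention block (assumed names/sizes)."""
    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_s = nn.LayerNorm(d_model)
        self.norm_t = nn.LayerNorm(d_model)
        self.norm_m = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, space, d_model) -- T frames, S patch tokens per frame
        B, T, S, D = x.shape
        # Spatial attention: each patch attends to the other patches in its frame.
        h = x.reshape(B * T, S, D)
        q = self.norm_s(h)
        h = h + self.spatial_attn(q, q, q, need_weights=False)[0]
        h = h.reshape(B, T, S, D)
        # Temporal attention: each spatial position attends causally across frames.
        h = h.permute(0, 2, 1, 3).reshape(B * S, T, D)
        q = self.norm_t(h)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = h + self.temporal_attn(q, q, q, attn_mask=causal, need_weights=False)[0]
        h = h.reshape(B, S, T, D).permute(0, 2, 1, 3)
        # Standard transformer MLP.
        return h + self.mlp(self.norm_m(h))
```

Stacking several such blocks, with positional embeddings for both space and time, gives an STT-style backbone.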
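The quantization step can be pictured as a nearest-neighbour lookup into a learned codebook with a straight-through gradient, in the style of a VQ-VAE. The repository's actual quantizer may use a different scheme, so treat this as a sketch; the class name and sizes are assumptions.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """VQ-VAE-style codebook quantizer sketch (assumed sizes, not the repo's code)."""
    def __init__(self, codebook_size: int = 1024, dim: int = 256):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, z: torch.Tensor):
        # z: (..., dim) continuous encoder outputs
        flat = z.reshape(-1, z.shape[-1])
        # Distance to every codebook vector; the nearest one becomes the token.
        dists = torch.cdist(flat, self.codebook.weight)   # (N, codebook_size)
        indices = dists.argmin(dim=-1)                    # discrete token ids
        quantized = self.codebook(indices).reshape(z.shape)
        # Straight-through estimator: copy gradients past the discretization.
        quantized = z + (quantized - z).detach()
        return quantized, indices.reshape(z.shape[:-1])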
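For the action tokenizer, the core idea is to infer a small discrete action code from a pair of consecutive frames, with no action labels at all. The sketch below only shows the inference direction; in training, a quantization bottleneck plus a decoder that must reconstruct the next frame from the current frame and the action would give the encoder its learning signal. Names, shapes, and the action vocabulary size are assumptions.

```python
import torch
import torch.nn as nn

class LatentActionEncoder(nn.Module):
    """Sketch: infer a discrete latent action from two consecutive frame embeddings."""
    def __init__(self, frame_dim: int = 256, n_actions: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * frame_dim, 512), nn.GELU(), nn.Linear(512, n_actions)
        )

    def forward(self, frame_t: torch.Tensor, frame_t1: torch.Tensor) -> torch.Tensor:
        # frame_t, frame_t1: (batch, frame_dim) embeddings of consecutive frames.
        logits = self.net(torch.cat([frame_t, frame_t1], dim=-1))
        # Argmax gives a discrete action id; real training would use a quantization
        # bottleneck (straight-through) so gradients still reach the encoder.
        return logits.argmax(dim=-1)
```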
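The dynamics model can then be viewed as a causal transformer over the discrete video tokens, conditioned on the inferred action tokens and trained to predict the next frame's tokens. The sketch below uses stock PyTorch transformer layers with one action id per token position for simplicity; the actual architecture (the STT above) and token layout will differ.

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Sketch: next-token prediction over video tokens, conditioned on action tokens."""
    def __init__(self, vocab_size: int = 1024, n_actions: int = 8, d_model: int = 256):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.action_emb = nn.Embedding(n_actions, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, video_tokens: torch.Tensor, action_tokens: torch.Tensor) -> torch.Tensor:
        # video_tokens, action_tokens: (batch, seq) discrete ids from the tokenizers.
        x = self.token_emb(video_tokens) + self.action_emb(action_tokens)
        seq_len = x.shape[1]
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.backbone(x, mask=causal)
        return self.head(h)  # logits over the next video token at each position
```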
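Since the data lives in .h5 files, loading a training clip is a plain h5py read. The file path and the "frames" dataset key below are hypothetical; the repository's actual schema may use different names.

```python
import h5py
import numpy as np

# Path and dataset key are assumptions for illustration, not the repo's schema.
with h5py.File("data/pong.h5", "r") as f:
    frames = f["frames"]              # e.g. (num_frames, height, width, channels), uint8
    clip = np.asarray(frames[:16])    # read a 16-frame clip into memory
    print(clip.shape, clip.dtype)
```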
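The listed acceleration features map onto standard PyTorch calls. The snippet below shows typical usage of TF32, torch.compile, AMP, and (commented out) DDP with a placeholder model and loss; it is not the repository's training loop and assumes a CUDA GPU.

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# TF32: faster fp32 matmuls on Ampere-and-newer GPUs.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

model = nn.Linear(256, 256).cuda()   # placeholder for the world model
model = torch.compile(model)         # torch.compile for graph capture / kernel fusion
# model = DDP(model)                 # DDP requires torch.distributed.init_process_group first

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler() # gradient scaling for fp16 AMP
x = torch.randn(8, 256, device="cuda")

with torch.autocast("cuda", dtype=torch.float16):  # AMP: mixed-precision forward pass
    loss = model(x).pow(2).mean()
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```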