TinyWorlds: A minimal implementation of DeepMind's Genie world model
- #deep-learning
- #autoregressive
- #world-models
- TinyWorlds is a minimal autoregressive world model based on Google DeepMind's Genie architecture.
- It aims to make scalable world models easier to understand by using an autoregressive, unsupervised approach.
- Installation involves cloning the repository, installing dependencies, and setting up environment variables.
- Training requires downloading datasets and running a training script with a configuration file.
- Inference involves pulling pretrained checkpoints and running an inference script.
- World models map the current state of the environment to the next state, compressing the environment's behavior into learned laws.
- TinyWorlds operates on discrete tokens to make dynamics prediction easier, and consists of three modules: a Video Tokenizer, an Action Tokenizer, and a Dynamics Model.
- A Space-Time Transformer (STT) handles video processing with spatial and temporal attention layers (a minimal sketch follows this list).
- Variational autoencoders (VAEs) are used for quantization and tokenization (see the quantizer sketch below).
- The Action Tokenizer infers the actions taken between frames without any ground-truth action data (sketched below).
- The Dynamics Model predicts future frame tokens from past frame tokens and action tokens (sketched below).
- Data is preprocessed into .h5 files; available datasets include PicoDoom, Pong, Zelda, and more (a loading example follows).
- Supports PyTorch features such as torch.compile, DDP, AMP, and TF32 for accelerated training and inference (an example appears at the end of this post).
- Future improvements include Mixture of Experts, new optimizers, and scaling to more GPUs.
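To make the space-time attention idea concrete, here is a minimal PyTorch sketch of one factorized attention block: patch tokens attend within their frame, then each spatial position attends causally across frames. The class name, dimensions, and head counts are illustrative assumptions, not the repository's actual code.

```python
import torch
import torch.nn as nn

class SpaceTimeBlock(nn.Module):
    """Sketch of one factorized space-time attention block (assumed names/sizes)."""
    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_s = nn.LayerNorm(d_model)
        self.norm_t = nn.LayerNorm(d_model)
        self.norm_m = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, space, d_model) -- T frames, S patch tokens per frame
        B, T, S, D = x.shape
        # Spatial attention: each patch attends to the other patches in its frame.
        h = x.reshape(B * T, S, D)
        q = self.norm_s(h)
        h = h + self.spatial_attn(q, q, q, need_weights=False)[0]
        h = h.reshape(B, T, S, D)
        # Temporal attention: each spatial position attends causally across frames.
        h = h.permute(0, 2, 1, 3).reshape(B * S, T, D)
        q = self.norm_t(h)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = h + self.temporal_attn(q, q, q, attn_mask=causal, need_weights=False)[0]
        h = h.reshape(B, S, T, D).permute(0, 2, 1, 3)
        # Standard transformer MLP.
        return h + self.mlp(self.norm_m(h))
```

Stacking several such blocks, with positional embeddings for both space and time, gives an STT-style backbone.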
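The quantization step can be pictured as a nearest-neighbour lookup into a learned codebook with a straight-through gradient, in the style of a VQ-VAE. The repository's actual quantizer may use a different scheme, so treat this as a sketch; the class name and sizes are assumptions.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """VQ-VAE-style codebook quantizer sketch (assumed sizes, not the repo's code)."""
    def __init__(self, codebook_size: int = 1024, dim: int = 256):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, z: torch.Tensor):
        # z: (..., dim) continuous encoder outputs
        flat = z.reshape(-1, z.shape[-1])
        # Distance to every codebook vector; the nearest one becomes the token.
        dists = torch.cdist(flat, self.codebook.weight)   # (N, codebook_size)
        indices = dists.argmin(dim=-1)                    # discrete token ids
        quantized = self.codebook(indices).reshape(z.shape)
        # Straight-through estimator: copy gradients past the discretization.
        quantized = z + (quantized - z).detach()
        return quantized, indices.reshape(z.shape[:-1])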
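For the action tokenizer, the core idea is to infer a small discrete action code from a pair of consecutive frames, with no action labels at all. The sketch below only shows the inference direction; in training, a quantization bottleneck plus a decoder that must reconstruct the next frame from the current frame and the action would give the encoder its learning signal. Names, shapes, and the action vocabulary size are assumptions.

```python
import torch
import torch.nn as nn

class LatentActionEncoder(nn.Module):
    """Sketch: infer a discrete latent action from two consecutive frame embeddings."""
    def __init__(self, frame_dim: int = 256, n_actions: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * frame_dim, 512), nn.GELU(), nn.Linear(512, n_actions)
        )

    def forward(self, frame_t: torch.Tensor, frame_t1: torch.Tensor) -> torch.Tensor:
        # frame_t, frame_t1: (batch, frame_dim) embeddings of consecutive frames.
        logits = self.net(torch.cat([frame_t, frame_t1], dim=-1))
        # Argmax gives a discrete action id; real training would use a quantization
        # bottleneck (straight-through) so gradients still reach the encoder.
        return logits.argmax(dim=-1)
```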
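The dynamics model can then be viewed as a causal transformer over the discrete video tokens, conditioned on the inferred action tokens and trained to predict the next frame's tokens. The sketch below uses stock PyTorch transformer layers with one action id per token position for simplicity; the actual architecture (the STT above) and token layout will differ.

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Sketch: next-token prediction over video tokens, conditioned on action tokens."""
    def __init__(self, vocab_size: int = 1024, n_actions: int = 8, d_model: int = 256):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.action_emb = nn.Embedding(n_actions, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, video_tokens: torch.Tensor, action_tokens: torch.Tensor) -> torch.Tensor:
        # video_tokens, action_tokens: (batch, seq) discrete ids from the tokenizers.
        x = self.token_emb(video_tokens) + self.action_emb(action_tokens)
        seq_len = x.shape[1]
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.backbone(x, mask=causal)
        return self.head(h)  # logits over the next video token at each position
```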
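Since the data lives in .h5 files, loading a training clip is a plain h5py read. The file path and the "frames" dataset key below are hypothetical; the repository's actual schema may use different names.

```python
import h5py
import numpy as np

# Path and dataset key are assumptions for illustration, not the repo's schema.
with h5py.File("data/pong.h5", "r") as f:
    frames = f["frames"]              # e.g. (num_frames, height, width, channels), uint8
    clip = np.asarray(frames[:16])    # read a 16-frame clip into memory
    print(clip.shape, clip.dtype)
```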
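The listed acceleration features map onto standard PyTorch calls. The snippet below shows typical usage of TF32, torch.compile, AMP, and (commented out) DDP with a placeholder model and loss; it is not the repository's training loop and assumes a CUDA GPU.

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# TF32: faster fp32 matmuls on Ampere-and-newer GPUs.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

model = nn.Linear(256, 256).cuda()   # placeholder for the world model
model = torch.compile(model)         # torch.compile for graph capture / kernel fusion
# model = DDP(model)                 # DDP requires torch.distributed.init_process_group first

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler() # gradient scaling for fp16 AMP
x = torch.randn(8, 256, device="cuda")

with torch.autocast("cuda", dtype=torch.float16):  # AMP: mixed-precision forward pass
    loss = model(x).pow(2).mean()
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```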