The Role of Feature Normalization in I-JEPA
- #ViT-Small
- #dependency management
- #machine learning
- Uses `uv` for dependency management.
- Requires downloading the datasets and NYU-Depth tar files, which together need ~100 GB of storage.
- The default training configuration trains a ~300M-parameter ViT-Small, consuming ~22 GB of VRAM over 116 hours.
- Supports resuming training runs, evaluating IN1k validation performance, visualizing features, and plotting losses.
- `token_ids` in the code are LongTensors with four integers per token: register id, sample id, height id, and width id (see the sketch after this list).
- The model processes batches that pack patches from images of varied resolutions, unlike standard ViT models.
- PyTorch's eval mode changes the model's forward pass (e.g., disabling dropout), so `model.eval()` must be called before evaluation (see the snippet after this list).
- The LiDAR score is computed on a random subset of the training data, so it can change when a run is resumed.
- Supports single-GPU training only, with optional Pillow-SIMD for faster data loading.
- Hidden features include ToMe (token merging), absolute factorized learnable position embeddings (sketched below), and various predictor training options.
- Adding register tokens to the encoder and predictor was found to decrease performance significantly.
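
A minimal sketch of how such four-integer token ids could be assembled when packing patches from differently sized images into one batch. The helper `build_token_ids` and the register-id convention are illustrative assumptions, not the repo's actual API:

```python
import torch

def build_token_ids(sample_id: int, h_patches: int, w_patches: int,
                    n_registers: int = 0) -> torch.LongTensor:
    """Hypothetical helper: one [register_id, sample_id, height_id, width_id]
    row per token, matching the four-integer layout described above."""
    ids = []
    # Register tokens (if any) get a nonzero register id and no grid position.
    for r in range(n_registers):
        ids.append([r + 1, sample_id, 0, 0])
    # Patch tokens carry their grid coordinates.
    for h in range(h_patches):
        for w in range(w_patches):
            ids.append([0, sample_id, h, w])
    return torch.tensor(ids, dtype=torch.long)

# Patches from a 14x14 grid and an 8x12 grid packed into one batch:
token_ids = torch.cat([
    build_token_ids(sample_id=0, h_patches=14, w_patches=14),
    build_token_ids(sample_id=1, h_patches=8, w_patches=12),
])
print(token_ids.shape)  # torch.Size([292, 4])
```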
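
The eval-mode caveat is standard PyTorch behavior: modules like dropout and batch norm act differently in train and eval modes, so skipping `model.eval()` makes evaluation stochastic. A toy demonstration:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 16), nn.Dropout(p=0.5))
x = torch.randn(1, 16)

model.eval()           # disable dropout; batch norm would use running stats
with torch.no_grad():  # also skip autograd bookkeeping during evaluation
    out_a = model(x)
    out_b = model(x)
assert torch.equal(out_a, out_b)  # deterministic in eval mode

model.train()          # re-enable dropout before resuming training
```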
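
One plausible reading of "absolute factorized learnable position embeddings" is a learnable table per spatial axis, summed per token, so the parameter count grows with H + W rather than H x W. The class below is a sketch under that assumption, reusing the four-column token-id layout; it is not code from the repo:

```python
import torch
import torch.nn as nn

class FactorizedPosEmbed(nn.Module):
    """Illustrative factorized absolute position embedding: a learnable row
    table plus a learnable column table, indexed by each token's grid ids."""
    def __init__(self, dim: int, max_h: int = 64, max_w: int = 64):
        super().__init__()
        self.row = nn.Parameter(torch.zeros(max_h, dim))
        self.col = nn.Parameter(torch.zeros(max_w, dim))
        nn.init.trunc_normal_(self.row, std=0.02)
        nn.init.trunc_normal_(self.col, std=0.02)

    def forward(self, token_ids: torch.LongTensor) -> torch.Tensor:
        # token_ids columns: [register_id, sample_id, height_id, width_id]
        h, w = token_ids[:, 2], token_ids[:, 3]
        return self.row[h] + self.col[w]  # (num_tokens, dim)
```

With the packed `token_ids` from the earlier sketch, `FactorizedPosEmbed(dim=384)(token_ids)` would return one embedding per token regardless of each image's resolution, which is what makes the factorized scheme convenient for variable-resolution batches.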