Hasty Briefs

The Role of Feature Normalization in I-JEPA

  • #ViT-Small
  • #dependency management
  • #machine learning
  • Uses uv for dependency management.
  • Requires downloading the datasets and NYU-Depth tar files, which need ~100 GB of storage.
  • The default training configuration uses a ~300M-parameter ViT-Small, consuming ~22 GB of VRAM over 116 hours.
  • Supports resuming training runs, evaluating IN1k validation performance, visualizing features, and plotting losses.
  • `token_ids` in the code are LongTensors with four integers per token: register id, sample id, height id, and width id (see the first sketch after this list).
  • The model processes batches built from patches of varied-resolution images, unlike standard ViT models, which assume a fixed input size.
  • PyTorch's eval mode changes the model's forward pass, so model.eval() must be called before evaluation (see the second sketch below).
  • The LiDAR score is computed from a random subset of the training data, so it can change when a run is resumed.
  • Supports single-GPU training only, with optional Pillow-SIMD for faster dataloading.
  • Hidden features include ToMe (token merging), absolute factorized learnable position embeddings (sketched last below), and various predictor training options.
  • Adding register tokens to the encoder and predictor was found to decrease performance significantly.
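
The token layout described above can be pictured concretely. Below is a minimal sketch, not the post's actual code, of how per-token ids might be packed when patches from images of different resolutions share one batch; the `build_token_ids` helper and its exact shapes are illustrative assumptions.

```python
import torch

def build_token_ids(image_grid_sizes: list[tuple[int, int]]) -> torch.LongTensor:
    """Build a (num_tokens, 4) LongTensor of [register_id, sample_id, h, w].

    image_grid_sizes: per-sample patch-grid shapes, e.g. (14, 14) for a
    224x224 image with 16x16 patches. register_id is 0 for ordinary patch
    tokens here; a real implementation would reserve other values for
    register tokens.
    """
    rows = []
    for sample_id, (gh, gw) in enumerate(image_grid_sizes):
        h = torch.arange(gh).repeat_interleave(gw)   # 0,0,...,1,1,...
        w = torch.arange(gw).repeat(gh)              # 0,1,...,0,1,...
        reg = torch.zeros(gh * gw, dtype=torch.long)
        sid = torch.full((gh * gw,), sample_id, dtype=torch.long)
        rows.append(torch.stack([reg, sid, h, w], dim=1))
    return torch.cat(rows, dim=0)

# Two images with different patch grids packed into one batch of tokens.
token_ids = build_token_ids([(14, 14), (8, 12)])
print(token_ids.shape)  # torch.Size([292, 4]), i.e. 14*14 + 8*12 tokens
```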
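The eval-mode point is the usual PyTorch footgun: layers such as dropout behave differently between train and eval modes, so forgetting model.eval() skews validation numbers. A hedged sketch of an evaluation loop, with illustrative names (`evaluate`, `loader`) that are not from the repo:

```python
import torch

@torch.no_grad()  # skip autograd bookkeeping during evaluation
def evaluate(model: torch.nn.Module, loader) -> float:
    model.eval()   # switch dropout etc. to their deterministic eval behavior
    correct, total = 0, 0
    for images, labels in loader:
        logits = model(images)
        correct += (logits.argmax(dim=-1) == labels).sum().item()
        total += labels.numel()
    model.train()  # restore training mode before resuming the run
    return correct / total
```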
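Finally, "absolute factorized learnable position embeddings" can be read as one learnable table per spatial axis, indexed by the height and width ids carried in `token_ids` and summed. The module below is a sketch under that assumption, not the repo's implementation; 384 is the usual ViT-Small width.

```python
import torch
import torch.nn as nn

class FactorizedPosEmbed(nn.Module):
    """Separate learnable tables for the height and width axes, summed per
    token. Factorizing keeps the parameter count at (max_h + max_w) * dim
    rather than max_h * max_w * dim."""

    def __init__(self, max_h: int, max_w: int, dim: int):
        super().__init__()
        self.h_embed = nn.Embedding(max_h, dim)
        self.w_embed = nn.Embedding(max_w, dim)

    def forward(self, token_ids: torch.LongTensor) -> torch.Tensor:
        # token_ids: (num_tokens, 4) rows of [register_id, sample_id, h, w]
        return self.h_embed(token_ids[:, 2]) + self.w_embed(token_ids[:, 3])

# A 14x14 patch grid (one 224x224 image with 16x16 patches), ids built inline.
token_ids = torch.stack([
    torch.zeros(196, dtype=torch.long),       # register id (all patch tokens)
    torch.zeros(196, dtype=torch.long),       # sample id (single image)
    torch.arange(14).repeat_interleave(14),   # height id
    torch.arange(14).repeat(14),              # width id
], dim=1)
pos = FactorizedPosEmbed(max_h=64, max_w=64, dim=384)
print(pos(token_ids).shape)  # torch.Size([196, 384])
```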