Image Diffusion Models Exhibit Emergent Temporal Propagation in Videos
- #diffusion-models
- #computer-vision
- #object-tracking
- Image diffusion models capture semantic structures that enable recognition and localization tasks.
- Self-attention maps can be reinterpreted as semantic label-propagation kernels that establish pixel-level correspondences.
- Applied across frames, this propagation kernel enables zero-shot object tracking via segmentation in videos (see the propagation sketch after this list).
- Test-time optimization strategies (DDIM inversion, textual inversion, adaptive head weighting) enhance diffusion features for label propagation (an illustrative head-weighting sketch follows below).
- The DRIFT framework combines pretrained image diffusion models with SAM-guided mask refinement to reach state-of-the-art zero-shot tracking performance (see the SAM refinement sketch below).
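
A minimal sketch of the label-propagation idea, assuming per-pixel features drawn from diffusion self-attention layers: a softmax affinity between two frames acts as the propagation kernel that carries a segmentation mask forward. The random tensors and the `propagate_labels` helper are illustrative stand-ins, not the paper's implementation.

```python
# Sketch: propagate a segmentation mask from frame t to frame t+1 by treating a
# cross-frame attention map as a label-propagation kernel. Features here are
# random stand-ins for diffusion self-attention queries/keys.
import torch
import torch.nn.functional as F

def propagate_labels(feat_prev, feat_next, labels_prev, temperature=0.07, topk=5):
    """feat_prev:   (HW, D) features of the previous frame
    feat_next:   (HW, D) features of the next frame
    labels_prev: (HW, C) one-hot (or soft) labels of the previous frame"""
    feat_prev = F.normalize(feat_prev, dim=-1)
    feat_next = F.normalize(feat_next, dim=-1)
    # Affinity of every next-frame pixel to every previous-frame pixel.
    affinity = feat_next @ feat_prev.T / temperature              # (HW, HW)
    # Keep only the top-k most similar source pixels per target pixel.
    vals, idx = affinity.topk(topk, dim=-1)
    kernel = torch.zeros_like(affinity).scatter_(-1, idx, F.softmax(vals, dim=-1))
    return kernel @ labels_prev                                   # (HW, C) soft labels

# Toy usage with random features standing in for diffusion attention features.
H = W = 32; D = 256; C = 2
feat_t  = torch.randn(H * W, D)
feat_t1 = torch.randn(H * W, D)
mask_t  = F.one_hot(torch.randint(0, C, (H * W,)), C).float()
mask_t1 = propagate_labels(feat_t, feat_t1, mask_t).argmax(-1).reshape(H, W)
```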
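One plausible reading of "adaptive head weighting", sketched below: score each attention head by how well its propagation kernel reconstructs the annotated first-frame mask, then blend heads with softmax weights. The `adaptive_head_weights` helper and the reconstruction objective are assumptions for illustration, not the paper's exact procedure.

```python
# Sketch: weight attention heads by how well each head's kernel reconstructs the
# reference-frame labels, then blend the per-head kernels.
import torch
import torch.nn.functional as F

def adaptive_head_weights(head_kernels, labels_ref, temperature=0.1):
    """head_kernels: (n_heads, HW, HW) propagation kernels, one per attention head.
    labels_ref:   (HW, C) first-frame labels used as the reconstruction target.
    Returns per-head weights (n_heads,) and the blended kernel (HW, HW)."""
    recon = head_kernels @ labels_ref                   # (n_heads, HW, C)
    # Higher score = this head's kernel reconstructs the reference labels better.
    err = F.mse_loss(recon, labels_ref.expand_as(recon), reduction="none")
    scores = -err.mean(dim=(1, 2))
    weights = F.softmax(scores / temperature, dim=0)
    blended = (weights[:, None, None] * head_kernels).sum(dim=0)
    return weights, blended

# Toy usage with random kernels.
n_heads, HW, C = 8, 1024, 2
kernels = F.softmax(torch.randn(n_heads, HW, HW), dim=-1)
labels = F.one_hot(torch.randint(0, C, (HW,)), C).float()
w, K = adaptive_head_weights(kernels, labels)
```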
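For SAM-guided refinement, one simple scheme is to turn the coarse propagated mask into a box prompt and let SAM re-segment the object. The sketch assumes the `segment_anything` package and a downloaded checkpoint (the path below is a placeholder); deriving a box prompt from the mask is an assumption, since the summary does not specify DRIFT's prompting strategy.

```python
# Sketch: refine a coarse propagated mask with SAM using a box prompt derived
# from the mask's bounding box.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def refine_with_sam(predictor, image_rgb, coarse_mask):
    """image_rgb: (H, W, 3) uint8 frame; coarse_mask: (H, W) bool propagated mask."""
    ys, xs = np.nonzero(coarse_mask)
    if len(xs) == 0:
        return coarse_mask  # nothing to refine
    box = np.array([xs.min(), ys.min(), xs.max(), ys.max()], dtype=np.float32)
    predictor.set_image(image_rgb)
    masks, scores, _ = predictor.predict(box=box, multimask_output=False)
    return masks[0]

# Usage (checkpoint path is a placeholder):
# sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
# predictor = SamPredictor(sam)
# refined_mask = refine_with_sam(predictor, frame, coarse_mask)
```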