Hasty Briefs (beta)

Image Diffusion Models Exhibit Emergent Temporal Propagation in Videos

5 hours ago
  • #diffusion-models
  • #computer-vision
  • #object-tracking
  • Image diffusion models capture semantic structures that enable recognition and localization tasks.
  • Their self-attention maps can be reinterpreted as semantic label propagation kernels that provide pixel-level correspondences.
  • Applied across video frames, a temporal propagation kernel enables zero-shot object tracking via segmentation.
  • Test-time optimization strategies (DDIM inversion, textual inversion, and adaptive head weighting) enhance diffusion features for label propagation.
  • The DRIFT framework combines pretrained image diffusion models with SAM-guided mask refinement to achieve state-of-the-art zero-shot tracking performance.
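
To make the label propagation idea concrete, here is a minimal sketch of how an attention map can act as a propagation kernel between a reference frame and a target frame. It is not the DRIFT implementation: the function names (`propagation_kernel`, `propagate_labels`), the `temperature` parameter, and the tiny hand-written feature vectors are all assumptions made for illustration, and real diffusion features would come from a pretrained model rather than from literals.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def propagation_kernel(queries, keys, temperature=1.0):
    """Scaled dot-product attention used as a propagation kernel.

    queries: features of target-frame pixels (hypothetical toy vectors).
    keys:    features of reference-frame pixels.
    Returns one row per target pixel; each row sums to 1 over reference pixels.
    """
    d = len(keys[0])
    scale = 1.0 / (math.sqrt(d) * temperature)
    return [
        softmax([scale * sum(q_i * k_i for q_i, k_i in zip(q, k)) for k in keys])
        for q in queries
    ]


def propagate_labels(kernel, ref_labels, num_classes):
    """Carry reference labels to target pixels by attention-weighted voting."""
    propagated = []
    for row in kernel:
        scores = [0.0] * num_classes
        for weight, label in zip(row, ref_labels):
            scores[label] += weight
        propagated.append(max(range(num_classes), key=scores.__getitem__))
    return propagated


# Toy data (assumed): four reference pixels, two per class, and three target pixels.
ref_feats = [[4.0, 0.0], [4.0, 0.5], [0.0, 4.0], [0.5, 4.0]]
ref_labels = [0, 0, 1, 1]  # 0 = background, 1 = tracked object
tgt_feats = [[0.0, 3.5], [3.5, 0.0], [3.0, 0.5]]

kernel = propagation_kernel(tgt_feats, ref_feats)
tgt_labels = propagate_labels(kernel, ref_labels, num_classes=2)
print(tgt_labels)  # [1, 0, 0]
```

Each target pixel takes the label that receives the most attention mass from the reference frame, so repeating this step frame by frame would carry a mask through a video. The sketch omits the test-time optimization and SAM-guided refinement mentioned above.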