Paper2Video: Automatic Video Generation from Scientific Papers

9 hours ago
  • #Computer Vision
  • #Video Generation
  • #Academic Presentation
  • Paper2Video is introduced as the first benchmark dataset of 101 research papers paired with author-created presentation videos, slides, and speaker metadata.
  • The challenges of academic presentation video generation include dense multi-modal information (text, figures, tables) and coordinating multiple aligned channels (slides, subtitles, speech, human talker).
  • Four tailored evaluation metrics (Meta Similarity, PresentArena, PresentQuiz, and IP Memory) are designed to measure how effectively a video conveys the paper's information.
  • PaperTalker is proposed as a multi-agent framework for academic presentation video generation, integrating slide generation, layout refinement, cursor grounding, subtitling, speech synthesis, and talking-head rendering.
  • Experiments show that PaperTalker produces more faithful and informative presentation videos than existing baselines.
  • The Paper2Video dataset, agent, and code are publicly available.
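
To make the PaperTalker stages listed above more concrete, here is a minimal sketch of how such a staged pipeline could be wired together. It is not the authors' code: the stage names come from the summary, while `PresentationState`, `make_stage`, `run_pipeline`, the placeholder stage bodies, and the strictly sequential ordering are all assumptions made for illustration. The real framework may order, parallelize, or couple these stages differently.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PresentationState:
    """Hypothetical container passed from stage to stage."""
    paper_text: str
    completed_stages: List[str] = field(default_factory=list)


Stage = Callable[[PresentationState], PresentationState]

# Stage names taken from the summary; the order is an assumption.
STAGE_NAMES = [
    "slide_generation",
    "layout_refinement",
    "cursor_grounding",
    "subtitling",
    "speech_synthesis",
    "talking_head_rendering",
]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran.

    A real stage would call a model or tool here; this stub does not.
    """
    def stage(state: PresentationState) -> PresentationState:
        state.completed_stages.append(name)
        return state
    return stage


def run_pipeline(paper_text: str) -> PresentationState:
    """Run every placeholder stage in sequence over one paper."""
    state = PresentationState(paper_text=paper_text)
    for name in STAGE_NAMES:
        state = make_stage(name)(state)
    return state


result = run_pipeline("example paper text")
print(result.completed_stages)
```

Passing a single state object through each stage keeps the hand-off between stages explicit, which is one simple way to model the coordination of slides, subtitles, speech, and the talking head that the summary describes.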