Paper2Video: Automatic Video Generation from Scientific Papers
- #Computer Vision
- #Video Generation
- #Academic Presentation
- Paper2Video is introduced as the first benchmark dataset of 101 research papers, each paired with its author-created presentation video, slides, and speaker metadata.
- The challenges of academic presentation video generation include dense multi-modal information (text, figures, tables) and coordinating multiple aligned channels (slides, subtitles, speech, human talker).
- Four tailored evaluation metrics are designed to measure how effectively a video conveys the paper's information: Meta Similarity, PresentArena, PresentQuiz, and IP Memory.
- PaperTalker is proposed as a multi-agent framework for academic presentation video generation, integrating slide generation, layout refinement, cursor grounding, subtitling, speech synthesis, and talking-head rendering.
- Experiments show that PaperTalker produces more faithful and informative presentation videos than existing baselines.
- The dataset, agent, and code for Paper2Video are made available for public use.
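The multi-agent pipeline described above can be sketched as a chain of specialized stages. This is a minimal illustrative sketch, not the authors' actual implementation: every function, class name, and data format below is a hypothetical stand-in for what each PaperTalker agent might produce.

```python
from dataclasses import dataclass

@dataclass
class PresentationAssets:
    """Hypothetical container for the aligned output channels."""
    slides: list
    subtitles: list
    speech_files: list
    cursor_tracks: list

def generate_slides(sections):
    # Stage 1 (assumed): one slide per paper section; a real agent
    # would prompt an LLM with the section's text, figures, and tables.
    return [f"slide:{title}" for title in sections]

def refine_layout(slides):
    # Stage 2 (assumed): fix overflowing text boxes and figure placement.
    return [s + "/refined" for s in slides]

def write_subtitles(slides):
    # Stage 3 (assumed): draft narration aligned one-to-one with slides.
    return [f"narration for {s}" for s in slides]

def synthesize_speech(subtitles):
    # Stage 4 (assumed): TTS per subtitle line, stubbed as file names.
    return [f"speech_{i}.wav" for i in range(len(subtitles))]

def ground_cursor(slides, subtitles):
    # Stage 5 (assumed): locate the slide region each narration line
    # refers to, so a cursor can point at it during playback.
    return [(i, "bbox") for i in range(len(slides))]

def paper_talker(sections):
    """Run the stages in order and return aligned channels."""
    slides = refine_layout(generate_slides(sections))
    subtitles = write_subtitles(slides)
    return PresentationAssets(
        slides=slides,
        subtitles=subtitles,
        speech_files=synthesize_speech(subtitles),
        cursor_tracks=ground_cursor(slides, subtitles),
    )
```

The point of the sketch is the coordination constraint the post mentions: every channel (slides, subtitles, speech, cursor) must stay index-aligned, which is why each stage consumes the previous stage's output rather than the raw paper. The final talking-head rendering stage is omitted here.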