Paper2Video: Automatic Video Generation from Scientific Papers
- #Computer Vision
- #Video Generation
- #Academic Presentation
- Paper2Video is introduced as the first benchmark dataset of 101 research papers, each paired with its author-created presentation video, slides, and speaker metadata.
- The challenges of academic presentation video generation include dense multi-modal information (text, figures, tables) and coordinating multiple aligned channels (slides, subtitles, speech, human talker).
- Four tailored evaluation metrics are designed to measure how effectively a video conveys the paper's information: Meta Similarity, PresentArena, PresentQuiz, and IP Memory.
- PaperTalker is proposed as a multi-agent framework for academic presentation video generation, integrating slide generation, layout refinement, cursor grounding, subtitling, speech synthesis, and talking-head rendering.
- Experiments show that PaperTalker produces more faithful and informative presentation videos than existing baselines.
- The dataset, agent, and code for Paper2Video are made available for public use.
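The multi-agent pipeline described above can be sketched as a chain of specialized stages. This is a minimal illustrative sketch, not the authors' actual implementation: every function, class name, and data format below is a hypothetical stand-in for what each PaperTalker agent might produce.

```python
from dataclasses import dataclass

@dataclass
class PresentationAssets:
    """Hypothetical container for the aligned output channels."""
    slides: list
    subtitles: list
    speech_files: list
    cursor_tracks: list

def generate_slides(sections):
    # Stage 1 (assumed): one slide per paper section; a real agent
    # would prompt an LLM with the section's text, figures, and tables.
    return [f"slide:{title}" for title in sections]

def refine_layout(slides):
    # Stage 2 (assumed): fix overflowing text boxes and figure placement.
    return [s + "/refined" for s in slides]

def write_subtitles(slides):
    # Stage 3 (assumed): draft narration aligned one-to-one with slides.
    return [f"narration for {s}" for s in slides]

def synthesize_speech(subtitles):
    # Stage 4 (assumed): TTS per subtitle line, stubbed as file names.
    return [f"speech_{i}.wav" for i in range(len(subtitles))]

def ground_cursor(slides, subtitles):
    # Stage 5 (assumed): locate the slide region each narration line
    # refers to, so a cursor can point at it during playback.
    return [(i, "bbox") for i in range(len(slides))]

def paper_talker(sections):
    """Run the stages in order and return aligned channels."""
    slides = refine_layout(generate_slides(sections))
    subtitles = write_subtitles(slides)
    return PresentationAssets(
        slides=slides,
        subtitles=subtitles,
        speech_files=synthesize_speech(subtitles),
        cursor_tracks=ground_cursor(slides, subtitles),
    )
```

The point of the sketch is the coordination constraint the post mentions: every channel (slides, subtitles, speech, cursor) must stay index-aligned, which is why each stage consumes the previous stage's output rather than the raw paper. The final talking-head rendering stage is omitted here.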