MegaFold: An Open-Sourced AlphaFold-3 Training System

2 days ago


MegaFold is an open-source training system for AlphaFold-3 (AF3) that addresses inefficiencies in current AF3 training pipelines.
AlphaFold-3 (AF3) predicts protein 3D structures with atomic-level fidelity; the AlphaFold line of work earned its creators a share of the 2024 Nobel Prize in Chemistry.
The blog notes that AF3 training is significantly slower and more memory-intensive than training similarly sized transformer models such as BLOOM-560M.
Key issues with AF3 training include complex data pipelines and frequent launches of compute-heavy operators, leading to memory explosions and slow training times.
MegaFold proposes optimizations including fused EvoAttention and Transition layers to reduce memory usage and increase training speed.
The system also introduces ahead-of-time caching for data loading, significantly reducing GPU idle time caused by CPU-bound retrieval steps.
Benchmark results show MegaFold enables training on longer sequence lengths (up to 768 tokens) and reduces per-iteration training time by up to 1.69x on NVIDIA hardware.
The improvements are demonstrated on different hardware platforms, including NVIDIA H200 and AMD MI250 GPUs, which shows that MegaFold's performance is portable.
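The ahead-of-time caching idea mentioned above can be illustrated with a minimal sketch. It is not MegaFold's implementation: the function names (`expensive_featurize`, `build_cache`, `load_features`), the toy features, and the pickle-on-disk layout are all assumptions made for illustration. The point is only that deterministic, CPU-bound preprocessing can be run once before training, so the training-time loader does a cheap file read instead.

```python
import hashlib
import pickle
from pathlib import Path


def expensive_featurize(sequence: str) -> dict:
    """Hypothetical stand-in for a deterministic, CPU-bound preprocessing step."""
    # Toy features: per-residue counts. A real pipeline would do far more work here.
    counts: dict[str, int] = {}
    for residue in sequence:
        counts[residue] = counts.get(residue, 0) + 1
    return {"length": len(sequence), "counts": counts}


def cache_path(cache_dir: Path, sequence: str) -> Path:
    """Derive a stable file name from the input so repeated samples share one entry."""
    key = hashlib.sha256(sequence.encode("utf-8")).hexdigest()
    return cache_dir / f"{key}.pkl"


def build_cache(cache_dir: Path, sequences: list[str]) -> int:
    """Offline pass, run before training. Returns the number of entries newly written."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    for seq in sequences:
        path = cache_path(cache_dir, seq)
        if not path.exists():
            with path.open("wb") as f:
                pickle.dump(expensive_featurize(seq), f)
            written += 1
    return written


def load_features(cache_dir: Path, sequence: str) -> dict:
    """Training-time lookup: read cached features, recomputing only on a cache miss."""
    path = cache_path(cache_dir, sequence)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    return expensive_featurize(sequence)
```

In this sketch, `build_cache` would be called once over the dataset before training, and a data loader would call `load_features` per sample. Keying the cache on a hash of the input is one possible design choice; it only works when the preprocessing is deterministic for a given input.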
