FlashPack: Fast Model Loading for PyTorch
- #Performance Optimization
- #PyTorch
- #Machine Learning
- FlashPack is a high-throughput file format and loading mechanism for PyTorch designed to speed up model checkpoint I/O.
- It loads model checkpoints 3–6× faster than existing approaches such as `accelerate` or the standard `load_state_dict()` followed by `.to()`.
- FlashPack treats model weights as one contiguous data stream rather than a collection of individually deserialized tensors, cutting per-tensor overhead during loading.
- Key features include flattening the state_dict into a contiguous byte stream, memory-mapped reads, and overlapping disk, CPU, and GPU operations with CUDA streams.
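The first two bullets can be sketched in plain Python. This is only an illustration of the pack-and-mmap idea, not FlashPack's actual on-disk format (which is not specified here): buffers are written back-to-back behind a small JSON offset index, and loading returns zero-copy views into a memory-mapped file.

```python
import json
import mmap
import struct

def pack(state_dict, path):
    """Write all named buffers back-to-back, preceded by an offset index."""
    index, offset = {}, 0
    payload = bytearray()
    for name, buf in state_dict.items():
        data = bytes(buf)
        index[name] = (offset, len(data))  # where this buffer lives in the stream
        payload += data
        offset += len(data)
    header = json.dumps(index).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header)))  # 8-byte little-endian header length
        f.write(header)
        f.write(payload)

def load(path):
    """Memory-map the file and return zero-copy views into the weight stream."""
    f = open(path, "rb")
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    (header_len,) = struct.unpack("<Q", mm[:8])
    index = json.loads(mm[8:8 + header_len].decode())
    base = 8 + header_len
    view = memoryview(mm)  # slicing a memoryview copies nothing
    return {name: view[base + off: base + off + size]
            for name, (off, size) in index.items()}

# Round-trip demo, with raw bytes standing in for tensor storage.
weights = {"layer1.weight": b"\x01\x02\x03\x04", "layer1.bias": b"\x05\x06"}
pack(weights, "model.fpk")
loaded = load("model.fpk")
```

In a real implementation the mapped bytes would be wrapped as tensors (e.g. via `torch.frombuffer`) and copied to the GPU on a separate CUDA stream so that disk reads, host-side work, and device transfers overlap.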
- Benchmarks show checkpoint loading 3–6× faster than existing methods.
- Limitations: all weights in a packed file must share a single data type, and FlashPack does not support device maps or on-the-fly state-dict transformations.
- FlashPack can be installed via PyPI or GitHub and integrates easily with existing workflows, including Hugging Face models.
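Installation might look like the following; the PyPI package name and the repository URL are assumptions here, so check the project page before running:

```shell
# Install from PyPI (assuming the package is published as "flashpack")
pip install flashpack

# Or install the latest version straight from GitHub
# (repository path assumed; verify against the project's README)
pip install git+https://github.com/fal-ai/flashpack.git
```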