FlashPack: Fast Model Loading for PyTorch
- #Performance Optimization
- #PyTorch
- #Machine Learning
- FlashPack is a high-throughput file format and loading mechanism for PyTorch designed to speed up model checkpoint I/O.
- It loads model checkpoints 3–6× faster than existing approaches such as `accelerate` or the standard `load_state_dict()` followed by `.to()`.
- FlashPack treats model weights as one contiguous data stream rather than a collection of individually deserialized tensors, cutting per-tensor overhead during loading.
- Key features include flattening the state_dict into a contiguous byte stream, memory-mapped reads, and overlapping disk, CPU, and GPU operations with CUDA streams.
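The first two bullets can be sketched in plain Python. This is only an illustration of the pack-and-mmap idea, not FlashPack's actual on-disk format (which is not specified here): buffers are written back-to-back behind a small JSON offset index, and loading returns zero-copy views into a memory-mapped file.

```python
import json
import mmap
import struct

def pack(state_dict, path):
    """Write all named buffers back-to-back, preceded by an offset index."""
    index, offset = {}, 0
    payload = bytearray()
    for name, buf in state_dict.items():
        data = bytes(buf)
        index[name] = (offset, len(data))  # where this buffer lives in the stream
        payload += data
        offset += len(data)
    header = json.dumps(index).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header)))  # 8-byte little-endian header length
        f.write(header)
        f.write(payload)

def load(path):
    """Memory-map the file and return zero-copy views into the weight stream."""
    f = open(path, "rb")
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    (header_len,) = struct.unpack("<Q", mm[:8])
    index = json.loads(mm[8:8 + header_len].decode())
    base = 8 + header_len
    view = memoryview(mm)  # slicing a memoryview copies nothing
    return {name: view[base + off: base + off + size]
            for name, (off, size) in index.items()}

# Round-trip demo, with raw bytes standing in for tensor storage.
weights = {"layer1.weight": b"\x01\x02\x03\x04", "layer1.bias": b"\x05\x06"}
pack(weights, "model.fpk")
loaded = load("model.fpk")
```

In a real implementation the mapped bytes would be wrapped as tensors (e.g. via `torch.frombuffer`) and copied to the GPU on a separate CUDA stream so that disk reads, host-side work, and device transfers overlap.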
- Benchmarks show checkpoint loading 3–6× faster than existing methods.
- Limitations: all weights in a packed file must share a single data type, and FlashPack does not support device maps or on-the-fly state-dict transformations.
- FlashPack can be installed via PyPI or GitHub and integrates easily with existing workflows, including Hugging Face models.
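Installation might look like the following; the PyPI package name and the repository URL are assumptions here, so check the project page before running:

```shell
# Install from PyPI (assuming the package is published as "flashpack")
pip install flashpack

# Or install the latest version straight from GitHub
# (repository path assumed; verify against the project's README)
pip install git+https://github.com/fal-ai/flashpack.git
```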