Decompression up to 30% faster in CPython 3.15
12 days ago
- #Zstandard
- #Performance Optimization
- #CPython
- CPython's compression.zstd module was found to be slower than third-party Zstandard Python bindings like pyzstd, zstandard, and zstd.
- Benchmarking showed the standard library was 10-25% slower than the third-party bindings, with decompression accounting for most of the gap.
- Initial theories for the performance gap included older system-installed libzstd versions, an inefficient decompress() implementation, and slow output buffer management.
- Profiling identified that over 50% of decompression time was spent in _BlocksOutputBuffer_Finish, indicating a bottleneck in output buffer management.
- Adopting the new PyBytesWriter C API from PEP 782 significantly improved performance, making compression.zstd faster than the zstandard module in benchmarks.
- The optimization also benefited other compression modules like zlib, showing 10-15% faster decompression for data sizes ≥1 MiB.
- The improvements simplified the output buffer code, removing 60 lines while enhancing performance.
- Future work includes further profiling of compression code and exploring optimizations related to output data size information.
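A benchmark in the spirit of the one described above can be sketched with the stdlib `timeit` module. This is an illustrative harness, not the article's actual setup: it uses `zlib` rather than `compression.zstd` so it runs on any CPython version, and the payload size and iteration count are arbitrary choices.

```python
import timeit
import zlib

# Payload comfortably above the 1 MiB threshold where the article
# reports the largest decompression gains (15 bytes * 100_000 ≈ 1.4 MiB).
payload = b"sample payload " * 100_000
blob = zlib.compress(payload)

# Time repeated decompression of the same compressed blob.
n = 20
seconds = timeit.timeit(lambda: zlib.decompress(blob), number=n)
throughput = len(payload) * n / seconds / 1e6  # MB of output per second
print(f"{seconds / n * 1e3:.2f} ms/op, {throughput:.0f} MB/s")
```

Running the same loop against `compression.zstd.decompress` and a third-party binding on the same payload would reproduce the kind of comparison the article made.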
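The hotspot named above, _BlocksOutputBuffer_Finish, lives in C, so finding it requires a native profiler such as perf. A Python-level analogue of the same workflow, using the stdlib cProfile, is sketched below; it only attributes time to Python-visible calls, but the approach is the same: profile a decompression loop, then sort by cumulative time to see where it goes.

```python
import cProfile
import io
import pstats
import zlib

# 4 MiB of highly compressible data to decompress repeatedly.
blob = zlib.compress(b"x" * (4 << 20))

profiler = cProfile.Profile()
profiler.enable()
for _ in range(50):
    zlib.decompress(blob)
profiler.disable()

# Print the five most expensive calls by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

In the article's case, the C-level profile attributed over half of the time to output buffer management rather than to the actual zstd decoding, which pointed directly at the fix.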