Hasty Briefs (beta)

Decompression up to 30% faster in CPython 3.15

12 days ago
  • #Zstandard
  • #Performance Optimization
  • #CPython
  • CPython's compression.zstd module was found to be slower than third-party Zstandard Python bindings like pyzstd, zstandard, and zstd.
  • Benchmarking revealed that the standard library module was 10-25% slower than the third-party bindings, particularly in decompression tasks.
  • Initial theories for the performance gap included an older system-installed libzstd version, an inefficient decompress() implementation, and slow output buffer management.
  • Profiling identified that over 50% of decompression time was spent in _BlocksOutputBuffer_Finish, indicating a bottleneck in output buffer management.
  • Adopting the new PyBytesWriter API from PEP 782 significantly improved performance, making compression.zstd faster than the zstandard module in benchmarks.
  • The optimization also benefited other compression modules like zlib, showing 10-15% faster decompression for data sizes ≥1 MiB.
  • The improvements simplified the output buffer code, removing 60 lines while enhancing performance.
  • Future work includes further profiling of compression code and exploring optimizations related to output data size information.
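
The decompression figures above can be checked locally with a small timing script. The following is a minimal sketch, not the benchmark used in the original article: the helper names (make_payload, best_time, bench), the payload shape, and the chosen sizes are all assumptions made for illustration. It times zlib.decompress and, where the interpreter provides it (Python 3.14 and later), compression.zstd.decompress. Running the same script under two CPython versions gives a rough before/after comparison.

```python
import os
import time
import zlib

try:
    # Available in the standard library from Python 3.14 onward.
    from compression import zstd
except ImportError:
    zstd = None


def make_payload(size: int) -> bytes:
    """Build a highly compressible payload (assumed shape, for illustration).

    A repeated 64-byte random chunk compresses to a tiny blob, so the
    decompressor spends most of its time producing output.
    """
    chunk = os.urandom(64)
    return (chunk * (size // 64 + 1))[:size]


def best_time(func, arg, repeats: int = 5) -> float:
    """Return the fastest wall-clock time of `repeats` calls to func(arg)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(arg)
        best = min(best, time.perf_counter() - start)
    return best


def bench(size: int) -> dict:
    """Time one-shot decompression of a `size`-byte payload per codec."""
    payload = make_payload(size)
    codecs = {"zlib": (zlib.compress, zlib.decompress)}
    if zstd is not None:
        codecs["compression.zstd"] = (zstd.compress, zstd.decompress)

    results = {}
    for name, (compress, decompress) in codecs.items():
        blob = compress(payload)
        # Sanity check: the round trip must reproduce the input.
        assert decompress(blob) == payload
        results[name] = best_time(decompress, blob)
    return results


if __name__ == "__main__":
    for size in (64 * 1024, 1024 * 1024, 16 * 1024 * 1024):
        mib = size / 2**20
        for name, seconds in bench(size).items():
            rate = mib / max(seconds, 1e-9)  # guard against a zero reading
            print(f"{name:>17} {mib:8.2f} MiB {rate:12.1f} MiB/s")
```

Taking the best of several runs reduces noise from other processes. The payload is deliberately easy to compress so that output handling dominates, which is the area the profiling above pointed to. Real-world data will show different absolute numbers.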