Have your cake and decompress it too
3 days ago
- #data-storage
- #performance
- #compression
- Vortex uses BtrBlocks-style codec selection to achieve better compression and speed than Parquet+ZSTD.
- Vortex files are 38% smaller and decompress 10–25x faster than Parquet with ZSTD on TPC-H at scale factor 10.
- Parquet uses a two-layer compression approach with lightweight encoding followed by general-purpose compression like ZSTD.
- BtrBlocks and Vortex recursively cascade lightweight encodings: the output of one fast, random-access-preserving codec becomes the input of the next, so several can be chained.
- Vortex employs sampling to efficiently determine the best compression scheme without processing the entire dataset.
- Vortex offers type-specific compressors for integers, floats, strings, and temporal data, each with tailored encoding options.
- Vortex provides two built-in compression strategies: default (lightweight encodings) and compact (adds PCodec and ZSTD for maximum compression).
- Vortex allows per-column compression configuration, enabling users to optimize for speed or size on a column-by-column basis.
- Vortex diverges from BtrBlocks in areas like lazy statistics computation, adaptive sampling, and additional encoding schemes.
- Future developments may include domain-specific encodings, PCodec integration, and cross-column compression techniques.
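
To make the cascading and sampling points above concrete, here is a minimal toy sketch in Python. It is not the Vortex or BtrBlocks API: every name (`compress`, `decompress`, `sample`, `size_bits`), the three schemes, the 64-bit plain cost, and the sample shape are assumptions chosen for illustration. The "bitpack" node only records a bit width for cost estimation and does not actually pack bits.

```python
import random

PLAIN_BITS = 64  # assumed cost of one uncompressed integer


def sample(values, rng, runs=8, run_len=32):
    """Pick a few contiguous runs so run-based codecs still see local structure."""
    if len(values) <= runs * run_len:
        return values
    out = []
    for _ in range(runs):
        start = rng.randrange(len(values) - run_len + 1)
        out.extend(values[start:start + run_len])
    return out


def rle_encode(values):
    """Run-length encode into parallel lists of run values and run lengths."""
    run_values, run_lengths = [], []
    for v in values:
        if run_values and run_values[-1] == v:
            run_lengths[-1] += 1
        else:
            run_values.append(v)
            run_lengths.append(1)
    return run_values, run_lengths


def size_bits(node):
    """Estimated encoded size of a tree of encodings, in bits."""
    kind = node[0]
    if kind == "plain":
        return PLAIN_BITS * len(node[1])
    if kind == "bitpack":
        return 8 + node[1] * len(node[2])  # width header + packed payload
    if kind == "for":
        return PLAIN_BITS + size_bits(node[2])  # base value + encoded deltas
    return size_bits(node[1]) + size_bits(node[2])  # rle: values + lengths


def encode_with(scheme, values, depth, rng):
    """Apply one scheme, then recursively compress whatever it outputs."""
    if scheme == "plain":
        return ("plain", list(values))
    if scheme == "bitpack":  # cost model only: values are stored unpacked here
        return ("bitpack", max(values).bit_length(), list(values))
    if scheme == "for":  # frame of reference: subtract the minimum
        base = min(values)
        return ("for", base, compress([v - base for v in values], depth - 1, rng))
    run_values, run_lengths = rle_encode(values)
    return ("rle", compress(run_values, depth - 1, rng),
            compress(run_lengths, depth - 1, rng))


def compress(values, depth=3, rng=None):
    """Choose the scheme that is smallest on a sample, then encode all values."""
    if rng is None:
        rng = random.Random(0)
    if not values:
        return ("plain", [])
    schemes = ["plain"]
    if min(values) >= 0:
        schemes.append("bitpack")
    if depth > 0:  # cascading schemes are only allowed while depth remains
        schemes += ["for", "rle"]
    probe = sample(values, rng)
    best = min(schemes, key=lambda s: size_bits(encode_with(s, probe, depth, rng)))
    return encode_with(best, values, depth, rng)


def decompress(node):
    """Invert the encoding tree back into the original list."""
    kind = node[0]
    if kind == "plain":
        return list(node[1])
    if kind == "bitpack":
        return list(node[2])
    if kind == "for":
        return [node[1] + d for d in decompress(node[2])]
    out = []
    for v, n in zip(decompress(node[1]), decompress(node[2])):
        out.extend([v] * n)
    return out


if __name__ == "__main__":
    column = [1_700_000_000 + i // 50 for i in range(5000)]  # timestamp-like runs
    tree = compress(column)
    print(tree[0], size_bits(tree), "bits vs", PLAIN_BITS * len(column), "plain")
```

The sample is built from contiguous runs rather than individually random values so that run-length encoding is judged on realistic local structure. The depth limit bounds how many encodings can be stacked, which keeps both the search and the decode path short.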