Hasty Briefs (beta)

Have your cake and decompress it too

3 days ago
  • #data-storage
  • #performance
  • #compression
  • Vortex uses BtrBlocks-style codec selection to achieve better compression and speed than Parquet+ZSTD.
  • Vortex files are 38% smaller and decompress 10–25x faster than Parquet with ZSTD on TPC-H at scale factor 10.
  • Parquet uses a two-layer compression approach with lightweight encoding followed by general-purpose compression like ZSTD.
  • BtrBlocks and Vortex use recursive cascading of lightweight encodings, allowing multiple fast, random-access-preserving codecs to be chained.
  • Vortex employs sampling to efficiently determine the best compression scheme without processing the entire dataset.
  • Vortex offers type-specific compressors for integers, floats, strings, and temporal data, each with tailored encoding options.
  • Vortex provides two built-in compression strategies: default (lightweight encodings) and compact (adds PCodec and ZSTD for maximum compression).
  • Vortex allows per-column compression configuration, enabling users to optimize for speed or size on a column-by-column basis.
  • Vortex diverges from BtrBlocks in areas like lazy statistics computation, adaptive sampling, and additional encoding schemes.
  • Future developments may include domain-specific encodings, PCodec integration, and cross-column compression techniques.
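
The two central ideas in the summary, recursive cascading of lightweight encodings and choosing a scheme from a sample rather than the whole column, can be illustrated with a small toy model. The sketch below is not Vortex or BtrBlocks code: every function name, the 64-bit cost assumed for a frame-of-reference base, the chunked sampling parameters, and the restriction to non-negative integers are assumptions made for illustration. It estimates the bit cost of a few cascades (bit-packing, frame-of-reference, run-length, dictionary) and picks the cheapest.

```python
import random

def rle_encode(values):
    """Run-length encode into (run_values, run_lengths)."""
    run_values, run_lengths = [], []
    for v in values:
        if run_values and run_values[-1] == v:
            run_lengths[-1] += 1
        else:
            run_values.append(v)
            run_lengths.append(1)
    return run_values, run_lengths

def dict_encode(values):
    """Dictionary encode into (sorted unique values, per-row codes)."""
    dictionary = sorted(set(values))
    index = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [index[v] for v in values]

def for_encode(values):
    """Frame-of-reference: subtract the minimum from every value."""
    base = min(values)
    return base, [v - base for v in values]

def bits_needed(values):
    """Estimated size when bit-packing non-negative integers."""
    if not values:
        return 0
    return max(1, max(values).bit_length()) * len(values)

def best_plan(values, depth=2):
    """Return (estimated_bits, plan) for the cheapest cascade found.

    `depth` bounds how many times RLE or dictionary output may itself
    be re-encoded, which is the "recursive cascading" idea in miniature.
    """
    if not values:
        return 0, "empty"
    candidates = [(bits_needed(values), "bitpack")]
    _, residuals = for_encode(values)
    # Assumption: the base is stored as one 64-bit integer.
    candidates.append((64 + bits_needed(residuals), "for+bitpack"))
    if depth > 0:
        run_values, run_lengths = rle_encode(values)
        if len(run_values) < len(values):
            vb, vp = best_plan(run_values, depth - 1)
            lb, lp = best_plan(run_lengths, depth - 1)
            candidates.append((vb + lb, f"rle(values={vp}, lengths={lp})"))
        dictionary, codes = dict_encode(values)
        if len(dictionary) < len(values):
            db, dp = best_plan(dictionary, depth - 1)
            cb, cp = best_plan(codes, depth - 1)
            candidates.append((db + cb, f"dict(values={dp}, codes={cp})"))
    return min(candidates, key=lambda c: c[0])

def sample_chunks(values, chunk_len=64, n_chunks=8, seed=0):
    """Concatenate a few contiguous chunks so that runs survive sampling."""
    if len(values) <= chunk_len * n_chunks:
        return list(values)
    rng = random.Random(seed)
    starts = sorted(rng.sample(range(len(values) - chunk_len + 1), n_chunks))
    sample = []
    for s in starts:
        sample.extend(values[s:s + chunk_len])
    return sample

if __name__ == "__main__":
    # Hypothetical column: 100 distinct values, each repeated 1000 times.
    column = [1_000_000 + (i // 1000) for i in range(100_000)]
    _, plan_from_sample = best_plan(sample_chunks(column))
    print("plan chosen from a 512-value sample:", plan_from_sample)
    print("estimate on the full column:", best_plan(column))
```

Sampling contiguous chunks rather than individual rows is a deliberate choice in this sketch: picking isolated rows would break up runs and make run-length encoding look worse than it is. On the example column, the plan chosen from the sample and the plan chosen from the full data both start with run-length encoding, although the sample inspects only 512 of the 100,000 values.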