Hasty Briefs (beta)

Vortex – An extensible, state-of-the-art columnar file format

10 days ago
  • #data-processing
  • #open-source
  • #columnar-format
  • Vortex is a next-generation columnar file format designed for high-performance data processing.
  • The project claims 100x faster random-access reads, 10-20x faster scans, and 5x faster writes than Apache Parquet.
  • Its architecture is extensible, with pluggable encodings, types, compression, and layout strategies.
  • Vortex is open source under the Apache-2.0 license and governed by the Linux Foundation (LF AI & Data).
  • Integrations include Arrow, DataFusion, DuckDB, Spark, Pandas, and Polars, with Apache Iceberg support upcoming.
  • The file format is stable as of version 0.36.0: files written with that version or later will remain readable by future releases (backward compatibility).
  • Logical types and physical encodings are strictly separated, and encodings can be either built in or supplied as extensions.
  • Other features include zero-copy integration with Arrow, cascading compression, and rich statistics.
  • It can be installed with Cargo for Rust or uv for Python, and a CLI tool, 'vx', is available for browsing files.
  • For optimal performance, the project recommends using the MiMalloc allocator.
  • Security vulnerabilities can be reported to the project's security contact by email.
  • Vortex credits prior work from the academic and open-source communities that inspired it, including BtrBlocks, FastLanes, FSST, and Apache projects.
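The bullets above describe a strict split between logical types and physical encodings, with encodings that can be built in or plugged in as extensions. The minimal Python sketch below illustrates that idea in general terms only; it is not Vortex's API. The names `DType`, `EncodedArray`, `Encoding`, `register`, the `"plain"` and `"ext.for"` encoding ids, and the frame-of-reference extension are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical sketch, NOT Vortex's actual API. A logical dtype says what the
# values mean; an encoding id says how they are physically stored.

@dataclass(frozen=True)
class DType:
    name: str  # logical type, e.g. "int64" or "utf8"

@dataclass
class EncodedArray:
    dtype: DType        # logical layer: unchanged by the choice of encoding
    encoding_id: str    # physical layer: which encoding produced the payload
    payload: object     # encoding-specific buffers

@dataclass
class Encoding:
    encode: Callable[[List[int]], object]
    decode: Callable[[object], List[int]]

# Registry keyed by encoding id, so new encodings can be plugged in
# without changing the code that reads arrays.
REGISTRY: Dict[str, Encoding] = {}

def register(encoding_id: str, encoding: Encoding) -> None:
    REGISTRY[encoding_id] = encoding

def encode(values: List[int], dtype: DType, encoding_id: str) -> EncodedArray:
    return EncodedArray(dtype, encoding_id, REGISTRY[encoding_id].encode(values))

def decode(arr: EncodedArray) -> List[int]:
    # Readers dispatch on the encoding id; the logical dtype is not consulted.
    return REGISTRY[arr.encoding_id].decode(arr.payload)

# A "built-in" encoding: store the values as-is.
register("plain", Encoding(encode=list, decode=list))

# An "extension" encoding (hypothetical): frame-of-reference, which stores the
# minimum once plus small non-negative offsets from it.
def _for_encode(values: List[int]) -> object:
    base = min(values)
    return (base, [v - base for v in values])

def _for_decode(payload: object) -> List[int]:
    base, offsets = payload
    return [base + o for o in offsets]

register("ext.for", Encoding(encode=_for_encode, decode=_for_decode))
```

For example, `encode([1000, 1003, 1001], DType("int64"), "ext.for")` yields the payload `(1000, [0, 3, 1])` while its logical dtype stays `int64`, and `decode` restores the original values. Keeping the dtype out of the encoding is what lets the physical representation change without affecting query semantics.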
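"Cascading compression," listed among the features above, generally means feeding the output of one lightweight encoding into another. The Python sketch below chains dictionary encoding, then run-length encoding of the resulting codes, then computes the bit width those codes would need if bit-packed. It illustrates the general technique under these assumptions and is not Vortex's compressor; all function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch of cascading lightweight encodings (not Vortex's code).

def dict_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value with an integer code into a list of unique values."""
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes

def rle_encode(ints: List[int]) -> Tuple[List[int], List[int]]:
    """Run-length encode integers into (run values, run lengths)."""
    run_values: List[int] = []
    run_lengths: List[int] = []
    for x in ints:
        if run_values and run_values[-1] == x:
            run_lengths[-1] += 1
        else:
            run_values.append(x)
            run_lengths.append(1)
    return run_values, run_lengths

def bit_width(ints: List[int]) -> int:
    """Bits per value needed to bit-pack non-negative ints (at least 1)."""
    return max(1, max(ints, default=0).bit_length())

def cascade(values: List[str]):
    dictionary, codes = dict_encode(values)      # stage 1: strings -> small ints
    run_values, run_lengths = rle_encode(codes)  # stage 2: ints -> runs
    return dictionary, run_values, run_lengths, bit_width(run_values)

def uncascade(dictionary: List[str], run_values: List[int],
              run_lengths: List[int]) -> List[str]:
    # Undo the stages in reverse order: expand runs, then look up codes.
    codes = [v for v, n in zip(run_values, run_lengths) for _ in range(n)]
    return [dictionary[c] for c in codes]
```

With `["us", "us", "us", "eu", "eu", "us"]`, the dictionary is `["us", "eu"]`, the runs are values `[0, 1, 0]` with lengths `[3, 2, 1]`, and each run value fits in 1 bit. A practical compressor would usually pick which encodings to cascade by sampling the data (as BtrBlocks does) rather than applying one fixed chain.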
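The "rich statistics" feature above is commonly used for pruning: if each chunk records summary statistics such as its minimum and maximum, a scan can skip chunks that cannot contain matches. The Python sketch below shows that general min/max ("zone map") technique; it assumes per-chunk min/max only, and the `Chunk`, `make_chunks`, and `scan_greater_than` names are hypothetical, not part of Vortex.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical sketch of statistics-based pruning (not Vortex's API).

@dataclass
class Chunk:
    values: List[int]
    min_value: int  # statistic recorded at write time
    max_value: int  # statistic recorded at write time

def make_chunks(values: List[int], chunk_size: int) -> List[Chunk]:
    """Split values into fixed-size chunks and record min/max for each."""
    chunks = []
    for i in range(0, len(values), chunk_size):
        part = values[i:i + chunk_size]
        chunks.append(Chunk(part, min(part), max(part)))
    return chunks

def scan_greater_than(chunks: List[Chunk], threshold: int) -> Tuple[List[int], int]:
    """Return values > threshold and the number of chunks actually read."""
    matches: List[int] = []
    chunks_read = 0
    for c in chunks:
        if c.max_value <= threshold:
            continue  # no value in this chunk can exceed threshold: skip it
        chunks_read += 1
        matches.extend(v for v in c.values if v > threshold)
    return matches, chunks_read
```

Scanning `list(range(100))` in chunks of 10 for values greater than 85 reads only the last two chunks and skips the other eight using their stored maxima.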
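The random-access claim above is stated without a mechanism. One general reason lightweight encodings can help (an assumption about the technique, not a description of Vortex's internals) is that a fixed-width bit-packed array puts element `i` at bit offset `i * width`, so a point lookup reads only those bits. By contrast, a page compressed with a general-purpose codec must be decompressed before any value in it can be read. The hypothetical `bitpack` and `get` functions below sketch this in Python.

```python
from typing import List

# Hypothetical sketch of random access into a bit-packed array (not Vortex's code).

def bitpack(values: List[int], width: int) -> bytearray:
    """Pack non-negative ints into `width` bits each, least significant bit first."""
    buf = bytearray((len(values) * width + 7) // 8)
    for i, v in enumerate(values):
        if not 0 <= v < (1 << width):
            raise ValueError(f"value {v} does not fit in {width} bits")
        for b in range(width):
            if (v >> b) & 1:
                pos = i * width + b
                buf[pos // 8] |= 1 << (pos % 8)
    return buf

def get(buf: bytearray, width: int, i: int) -> int:
    """Read element i directly: only `width` bits are touched, no full decode."""
    v = 0
    for b in range(width):
        pos = i * width + b
        if (buf[pos // 8] >> (pos % 8)) & 1:
            v |= 1 << b
    return v
```

For instance, packing `[5, 0, 7, 3, 6]` at 3 bits each takes 2 bytes, and `get(buf, 3, 2)` returns `7` without decoding the other four values.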