Hasty Briefs (beta)

Hardwood: A New Parser for Apache Parquet

2 days ago
  • #Java
  • #Apache Parquet
  • #Data Processing
  • Hardwood is a new open-source Apache Parquet parser optimized for minimal dependencies and high performance.
  • Requires Java 21 or later and is available on Maven Central; compression codecs (Snappy, zstd, etc.) are optional dependencies.
  • Provides two APIs: RowReader for nested schemas and ColumnReader for high-performance columnar access.
  • Features multi-threaded decoding, adaptive page prefetching, and cross-file prefetching for speed.
  • Performance benchmarks show significant improvements over traditional single-threaded parsers.
  • Built with AI assistance (Claude Code), though it still required deep understanding and manual optimization.
  • Future plans include predicate push-down, parquet-java compatibility, and write support.
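The brief does not show Hardwood's API or its adaptive prefetching heuristics. The following is only a generic sketch of the underlying idea: decode several pages in parallel on a thread pool while keeping a bounded window of decode tasks ahead of the consumer, and hand pages back in file order. Everything here is an assumption for illustration. The class and method names (`PrefetchingPageDecoder`, `decodePlainInt32`, `encodePlainInt32`) are hypothetical, and pages are assumed to be already decompressed, PLAIN-encoded INT32 data (consecutive 4-byte little-endian values). The window size is fixed rather than adaptive.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: parallel page decoding with a fixed-size prefetch window.
// Not Hardwood's API; pages are assumed decompressed and PLAIN-encoded INT32.
public class PrefetchingPageDecoder implements AutoCloseable {
    private final List<byte[]> pages;
    private final ExecutorService pool;
    private final int prefetchDepth;
    private final Deque<Future<int[]>> inFlight = new ArrayDeque<>();
    private int nextToSubmit = 0;

    public PrefetchingPageDecoder(List<byte[]> pages, int threads, int prefetchDepth) {
        this.pages = pages;
        this.pool = Executors.newFixedThreadPool(threads);
        this.prefetchDepth = prefetchDepth;
        fill(); // start decoding the first pages right away
    }

    // Decode one PLAIN INT32 page: consecutive 4-byte little-endian values.
    static int[] decodePlainInt32(byte[] page) {
        ByteBuffer buf = ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN);
        int[] values = new int[page.length / 4];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getInt();
        }
        return values;
    }

    // Keep up to prefetchDepth decode tasks queued ahead of the consumer.
    private void fill() {
        while (inFlight.size() < prefetchDepth && nextToSubmit < pages.size()) {
            byte[] page = pages.get(nextToSubmit++);
            inFlight.addLast(pool.submit(() -> decodePlainInt32(page)));
        }
    }

    public boolean hasNext() {
        return !inFlight.isEmpty();
    }

    // Return the next page in file order; blocks only if its decode is unfinished.
    public int[] next() throws InterruptedException, ExecutionException {
        int[] values = inFlight.removeFirst().get();
        fill(); // top the window back up
        return values;
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }

    // Demo helper: encode ints as a PLAIN INT32 page.
    static byte[] encodePlainInt32(int... values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * 4).order(ByteOrder.LITTLE_ENDIAN);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    public static void main(String[] args) throws Exception {
        // Eight pages holding the values 0..23, three values per page.
        List<byte[]> pages = new ArrayList<>();
        for (int p = 0; p < 8; p++) {
            pages.add(encodePlainInt32(p * 3, p * 3 + 1, p * 3 + 2));
        }
        long sum = 0;
        try (PrefetchingPageDecoder reader = new PrefetchingPageDecoder(pages, 4, 3)) {
            while (reader.hasNext()) {
                for (int v : reader.next()) {
                    sum += v;
                }
            }
        }
        System.out.println(sum); // prints 276
    }
}
```

The bounded window caps memory use: at most `prefetchDepth` decoded pages wait in memory. Because futures are consumed from the head of the queue, page order is preserved even when worker threads finish out of order. An adaptive scheme, as the brief describes for Hardwood, would presumably tune the window size at runtime. This sketch does not attempt that.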