Hardwood: A New Parser for Apache Parquet
2 days ago
- #Java
- #Apache Parquet
- #Data Processing
- Hardwood is a new open-source Apache Parquet parser optimized for minimal dependencies and high performance.
- Runs on Java 21 and later and is available on Maven Central; compression libraries (Snappy, Zstandard, etc.) are optional dependencies.
- Provides two APIs: RowReader for nested schemas and ColumnReader for high-performance columnar access.
- Features multi-threaded decoding, adaptive page prefetching, and cross-file prefetching for speed.
- Performance benchmarks show significant improvements over traditional single-threaded parsers.
- Built with AI assistance (Claude Code), though building it still required deep understanding and manual optimization.
- Future plans include predicate push-down, parquet-java compatibility, and write support.
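
To make the multi-threaded decoding and page prefetching points above more concrete, here is a minimal sketch of the general idea: keep a bounded number of pages decoding on background threads while the caller consumes the page that is already finished. It does not use Hardwood's API. The class `PrefetchSketch`, the methods `decodePage` and `sumColumn`, the `prefetchDepth` parameter, and the toy delta encoding are all hypothetical names chosen for illustration, and the prefetch depth here is fixed rather than adaptive.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PrefetchSketch {

    // Hypothetical stand-in for page decoding: each page stores deltas,
    // and decoding turns them into running totals (reset per page).
    static long[] decodePage(int[] deltas) {
        long[] out = new long[deltas.length];
        long acc = 0;
        for (int i = 0; i < deltas.length; i++) {
            acc += deltas[i];
            out[i] = acc;
        }
        return out;
    }

    // Sums all decoded values while keeping up to `prefetchDepth` pages
    // decoding in the background ahead of the consumer.
    static long sumColumn(List<int[]> pages, int prefetchDepth) throws Exception {
        int depth = Math.max(1, prefetchDepth); // at least one page in flight
        ExecutorService pool = Executors.newFixedThreadPool(depth);
        Queue<Future<long[]>> inFlight = new ArrayDeque<>();
        try {
            int next = 0;
            long total = 0;
            while (next < pages.size() || !inFlight.isEmpty()) {
                // Top up the prefetch window before blocking on the oldest page.
                while (next < pages.size() && inFlight.size() < depth) {
                    int[] page = pages.get(next++);
                    inFlight.add(pool.submit(() -> decodePage(page)));
                }
                // Pages are consumed in submission order, so results stay ordered.
                long[] values = inFlight.remove().get();
                for (long v : values) {
                    total += v;
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<int[]> pages = List.of(new int[]{1, 1, 1}, new int[]{5, -2}, new int[]{10});
        System.out.println(sumColumn(pages, 2)); // prints 24
    }
}
```

The FIFO queue of futures keeps results in page order even though decoding overlaps with consumption. An adaptive scheme would adjust the window size at run time, for example by growing it when the consumer has to wait on a page.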