Engineering a fixed-width bit-packed integer vector in Rust
a day ago
- #Rust
- #Memory Optimization
- #Data Structures
- Memory usage is a bottleneck when working with large datasets, especially when storing integers with smaller dynamic ranges than their type's capacity.
- Bit packing is introduced as a solution to reduce memory waste by storing integers back-to-back in a contiguous bitvector, allowing for more efficient storage.
- The article explores implementing a vector-like data structure in Rust called FixedVec, which compresses integers into a bit-packed format while maintaining O(1) random access performance.
- FixedVec mimics Rust's standard Vec<T> ergonomics, supporting mutable access and zero-copy slicing.
- The structure uses a backing buffer of u64 words and precomputes a mask for efficient bit extraction.
- Accessing elements involves calculating the bit position, word index, and bit offset, with optimizations for both in-word and cross-word boundary cases.
- Unaligned memory access is leveraged to improve read performance by reducing the number of memory operations required for cross-word elements.
- Benchmarks show that FixedVec outperforms standard Vec<T> for smaller bit widths due to improved cache locality.
- Iteration over FixedVec is optimized by maintaining a stateful iterator that amortizes memory access costs over multiple next() calls.
- Mutable access is provided through a proxy object pattern, enabling safe and ergonomic modifications to bit-packed data.
- Zero-copy views and slices are supported, allowing for efficient sub-region operations without data duplication.
- The article discusses future directions, including concurrent access and variable-length encoding for skewed data distributions.