Hasty Briefsbeta

Engineering a fixed-width bit-packed integer vector in Rust

20 hours ago
  • #Rust
  • #Memory Optimization
  • #Data Structures
  • Memory usage is a bottleneck when working with large datasets, especially when storing integers with smaller dynamic ranges than their type's capacity.
  • Bit packing is introduced as a solution to reduce memory waste by storing integers back-to-back in a contiguous bitvector, allowing for more efficient storage.
  • The article explores implementing a vector-like data structure in Rust called FixedVec, which compresses integers into a bit-packed format while maintaining O(1) random access performance.
  • FixedVec mimics Rust's standard Vec<T> ergonomics, supporting mutable access and zero-copy slicing.
  • The structure uses a backing buffer of u64 words and precomputes a mask for efficient bit extraction.
  • Accessing elements involves calculating the bit position, word index, and bit offset, with optimizations for both in-word and cross-word boundary cases.
  • Unaligned memory access is leveraged to improve read performance by reducing the number of memory operations required for cross-word elements.
  • Benchmarks show that FixedVec outperforms standard Vec<T> for smaller bit widths due to improved cache locality.
  • Iteration over FixedVec is optimized by maintaining a stateful iterator that amortizes memory access costs over multiple next() calls.
  • Mutable access is provided through a proxy object pattern, enabling safe and ergonomic modifications to bit-packed data.
  • Zero-copy views and slices are supported, allowing for efficient sub-region operations without data duplication.
  • The article discusses future directions, including concurrent access and variable-length encoding for skewed data distributions.