PivCo-Huffman "Merge" Operations
10 hours ago
- #Huffman Coding
- #SIMD Optimization
- #Parallel Decoding
- PivCo-Huffman is a method that transforms Huffman encoding into batch processing with parallel list merges.
- Traditional Huffman decoding is serial, limiting scalability; PivCo addresses this via parallel list merging.
- Parallel alternatives include multiple separate bitstreams (limited by overhead), interleaved streams (requires magic number), and brute-force speculation (wasteful).
- PivCo processes the entire input at once, pushing symbols down the Huffman tree and outputting bits in a different order.
- Decoding in PivCo uses prefix sums and list merges, which are parallel-friendly and scalable across hardware.
- Key optimizations include using fixed-length alphabets for parallel decoding and storing extra bits to avoid scanning.
- The merge operation, central to decoding, can be implemented efficiently using SIMD instructions like AVX512_VBMI2's VPEXPANDB.
- For SSE/NEON targets, 8-byte or 16-byte merging is done via shuffle instructions (PSHUFB/TBL) with precomputed lookup tables.
- On NEON, a 16-byte merge uses vqtbl2q_u8 and vabdq_s8 for optimal performance, leveraging 2-register TBL.
- AVX2 implementations largely mirror 128-bit approaches but with minor speed gains due to wider vectors.
- PivCo offers a flexible, hardware-agnostic parallel decoding method, avoiding the pitfalls of other parallel Huffman techniques.