PivCo-Huffman "Merge" Operations

9 hours ago

PivCo-Huffman is a method that transforms Huffman encoding into batch processing with parallel list merges.
Traditional Huffman decoding is serial, limiting scalability; PivCo addresses this via parallel list merging.
Parallel alternatives include multiple separate bitstreams (limited by overhead), interleaved streams (requires magic number), and brute-force speculation (wasteful).
PivCo processes the entire input at once, pushing symbols down the Huffman tree and outputting bits in a different order.
Decoding in PivCo uses prefix sums and list merges, which are parallel-friendly and scalable across hardware.
Key optimizations include using fixed-length alphabets for parallel decoding and storing extra bits to avoid scanning.
The merge operation, central to decoding, can be implemented efficiently using SIMD instructions like AVX512_VBMI2's VPEXPANDB.
For SSE/NEON targets, 8-byte or 16-byte merging is done via shuffle instructions (PSHUFB/TBL) with precomputed lookup tables.
On NEON, a 16-byte merge uses vqtbl2q_u8 and vabdq_s8 for optimal performance, leveraging 2-register TBL.
AVX2 implementations largely mirror 128-bit approaches but with minor speed gains due to wider vectors.
PivCo offers a flexible, hardware-agnostic parallel decoding method, avoiding the pitfalls of other parallel Huffman techniques.

Hasty Briefsbeta