Taking a Look at Compression Algorithms – Moncef Abboud
4 days ago
- #Compression Algorithms
- #Data Optimization
- #Lossless Encoding
- The author explores various compression algorithms (GZIP, Snappy, LZ4, ZSTD) after working on a Kafka broker project, diving into how they work rather than just using APIs.
- Compression reduces data size for storage and transmission savings, with two main types: lossless (exact reconstruction) and lossy (approximation, e.g., JPEG).
- Key compression techniques include Run-Length Encoding (RLE), Lempel-Ziv (LZ) with back-references, and Huffman coding for variable-length encoding based on symbol frequency.
- GZIP uses the DEFLATE algorithm, combining LZ77 (a sliding window with back-references) and Huffman encoding; DEFLATE streams are split into blocks of three types: stored (uncompressed), fixed Huffman, and dynamic Huffman.
- DEFLATE encodes literals and back-references with Huffman codes, using extra bits for less common values, and employs a chained hash table for efficient match searching.
- Snappy is an LZ-based algorithm focused on speed, compressing at roughly 250 MB/s and decompressing at roughly 500 MB/s, with lower compression ratios than DEFLATE and a simple format built from literal and copy chunks.
- LZ4 is similar to Snappy but faster, with compression ratios around 2.1x, using a token-based block format with literal and match lengths encoded in 4-bit fields and extra bytes for longer lengths.
- ZSTD (Zstandard) offers high compression ratios (e.g., ~2.9x) with speeds comparable to LZ4, combining LZ, Huffman, and Finite State Entropy (FSE), plus features like trainable dictionaries for specific data types.
- Arithmetic coding improves on Huffman by representing symbols proportionally to their probability, but it's computationally intensive; FSE builds on these ideas for better efficiency in ZSTD.
- Compression optimization is a trade-off between compression ratio, compression speed, and decompression speed; implementations tune hash-table sizes and match-search effort to balance CPU cost against how much the data shrinks.
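The run-length encoding and Huffman coding techniques summarized above can be sketched in a few lines of Python (an illustrative sketch, not any library's actual implementation):

```python
import heapq

def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Run-length encoding: collapse each run into a (byte, count) pair."""
    runs: list[tuple[int, int]] = []
    for b in data:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs

def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    return bytes(b for b, n in runs for _ in range(n))

def huffman_codes(freqs: dict[str, int]) -> dict[str, str]:
    """Build prefix-free variable-length codes: rarer symbols get longer codes.
    Heap entries carry a tiebreak int so dicts are never compared."""
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least-frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]
```

Running `huffman_codes({"a": 5, "b": 2, "c": 1})` assigns the frequent symbol `a` a shorter code than the rare symbol `c`, which is the whole point of frequency-based variable-length encoding.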
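DEFLATE itself is available from Python's standard library; a quick roundtrip shows GZIP and raw zlib producing the same underlying DEFLATE stream with different wrappers:

```python
import gzip
import zlib

data = b"to be or not to be, that is the question " * 40

gz = gzip.compress(data)    # DEFLATE plus a gzip header and CRC32 trailer
raw = zlib.compress(data)   # the same DEFLATE stream with a smaller zlib wrapper

assert gzip.decompress(gz) == zlib.decompress(raw) == data
print(f"{len(data)} bytes -> gzip {len(gz)}, zlib {len(raw)}")
```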
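The chained-hash-table match search mentioned for DEFLATE can be sketched as a greedy LZ77-style parser: a table maps each 3-byte prefix to the positions where it occurred, and the encoder probes a few recent chain entries for the longest match (a toy sketch; real DEFLATE bounds the window and tunes chain depth):

```python
def find_matches(data: bytes, min_len: int = 3):
    """Greedy LZ77-style parse: emit literals (ints) and (distance, length)
    back-references, finding matches via a hash table of 3-byte prefixes."""
    table: dict[bytes, list[int]] = {}
    out: list = []
    i = 0
    while i < len(data):
        key = data[i:i + min_len]
        best_len, best_dist = 0, 0
        for j in reversed(table.get(key, [])[-8:]):  # probe recent chain entries
            l = 0
            while i + l < len(data) and data[j + l] == data[i + l]:
                l += 1
            if l > best_len:
                best_len, best_dist = l, i - j
        if best_len >= min_len:
            out.append((best_dist, best_len))
            step = best_len
        else:
            out.append(data[i])
            step = 1
        for k in range(i, i + step):          # index every position we consumed
            if k + min_len <= len(data):
                table.setdefault(data[k:k + min_len], []).append(k)
        i += step
    return out

def lz_decode(tokens) -> bytes:
    buf = bytearray()
    for t in tokens:
        if isinstance(t, tuple):
            dist, length = t
            for _ in range(length):           # byte-wise copy handles overlap
                buf.append(buf[-dist])
        else:
            buf.append(t)
    return bytes(buf)
```

On `b"abcabcabcabc"` the parser emits three literals followed by a single overlapping back-reference `(3, 9)`, the classic LZ trick for repeating patterns.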
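Snappy's literal-and-copy format can be illustrated with a simplified decoder. The sketch below handles only short literals and 2-byte-offset copies, and skips the varint length preamble a real Snappy stream starts with, so it is an approximation of the format rather than a compliant decoder:

```python
def snappy_like_decode(block: bytes) -> bytes:
    """Decode Snappy-style elements: each starts with a tag byte whose low
    2 bits pick the type (00 = literal, 10 = copy with 2-byte offset)."""
    out = bytearray()
    i = 0
    while i < len(block):
        tag = block[i]; i += 1
        kind = tag & 0b11
        if kind == 0b00:                       # literal: length-1 in upper 6 bits
            length = (tag >> 2) + 1
            out += block[i:i + length]
            i += length
        elif kind == 0b10:                     # copy: length-1 in upper 6 bits,
            length = (tag >> 2) + 1            # offset in next two LE bytes
            offset = block[i] | (block[i + 1] << 8)
            i += 2
            for _ in range(length):            # byte-wise copy allows overlap
                out.append(out[-offset])
        else:
            raise ValueError("element type not handled in this sketch")
    return bytes(out)
```

Feeding it a hand-built block (a 2-byte literal `"ab"` followed by a 4-byte copy at offset 2) reconstructs `"ababab"`, showing how copies can overlap their own output.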
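LZ4's token-based block format can likewise be sketched as a small decoder: each sequence is a token byte with 4-bit literal and match lengths (the value 15 signals extension bytes), the literals, a 2-byte little-endian offset, and a match whose real length is the 4-bit field plus the 4-byte minimum match. A simplified sketch, ignoring framing and end-of-block rules:

```python
def lz4_block_decode(block: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(block):
        token = block[i]; i += 1
        lit_len = token >> 4
        if lit_len == 15:                      # extension bytes: keep adding
            while True:                        # until one is not 255
                b = block[i]; i += 1
                lit_len += b
                if b != 255:
                    break
        out += block[i:i + lit_len]
        i += lit_len
        if i >= len(block):                    # last sequence: literals only
            break
        offset = block[i] | (block[i + 1] << 8)
        i += 2
        match_len = (token & 0x0F) + 4         # minimum match is 4 bytes
        if (token & 0x0F) == 15:
            while True:
                b = block[i]; i += 1
                match_len += b
                if b != 255:
                    break
        for _ in range(match_len):             # byte-wise copy handles overlap
            out.append(out[-offset])
    return bytes(out)
```

A hand-built sequence with token `0x20` (2 literals, match field 0) followed by `"ab"` and offset 2 expands to `"ababab"`: two literal bytes plus a 4-byte overlapping match.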
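ZSTD's trainable dictionaries are not in the standard library, but the underlying idea (priming the compressor with bytes that small payloads are expected to share) can be demonstrated with zlib's preset-dictionary support; the sample "dictionary" here is hand-written rather than trained:

```python
import zlib

# Bytes that typical messages share; ZSTD trains such dictionaries from
# sample payloads, here one is simply written by hand.
shared = b'{"user_id": , "event": "click", "ts": }'
msg = b'{"user_id": 42, "event": "click", "ts": 1700000000}'

comp = zlib.compressobj(zdict=shared)
with_dict = comp.compress(msg) + comp.flush()

plain = zlib.compress(msg)

decomp = zlib.decompressobj(zdict=shared)   # decoder needs the same dictionary
assert decomp.decompress(with_dict) == msg
print(f"plain {len(plain)} bytes, with dictionary {len(with_dict)} bytes")
```

The message compresses better with the dictionary because most of it becomes back-references into the preset bytes, which is exactly why dictionaries help with many small, similar records.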
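The core move of arithmetic coding, narrowing an interval in proportion to each symbol's probability, fits in a few lines; real coders emit the interval's bits incrementally instead of keeping exact floats, so this is only the conceptual half:

```python
def arithmetic_interval(message: str, probs: dict[str, float]) -> tuple[float, float]:
    """Narrow [0, 1) symbol by symbol; any number in the final interval
    identifies the whole message."""
    starts, acc = {}, 0.0            # cumulative start of each symbol's slot
    for s, p in probs.items():
        starts[s] = acc
        acc += p
    low, high = 0.0, 1.0
    for s in message:
        width = high - low
        high = low + width * (starts[s] + probs[s])
        low = low + width * starts[s]
    return low, high

lo, hi = arithmetic_interval("aab", {"a": 0.7, "b": 0.3})
```

The final interval's width equals the product of the symbol probabilities (0.7 × 0.7 × 0.3 here), so probable messages get wide intervals that need few bits to pin down, which is how arithmetic coding beats Huffman's whole-bit-per-symbol granularity.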
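The ratio-versus-speed trade-off is directly visible in zlib's compression levels, which mostly control how hard the match search works:

```python
import zlib

data = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n" * 200

fast = zlib.compress(data, level=1)    # shallow match search: favors speed
small = zlib.compress(data, level=9)   # deep match search: favors ratio

assert zlib.decompress(fast) == zlib.decompress(small) == data
print(f"{len(data)} bytes -> level 1: {len(fast)}, level 9: {len(small)}")
```

Both outputs decompress to identical bytes; only the encoder's effort differs, which is why decompression speed is usually much less sensitive to the chosen level than compression speed is.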