An Ode to Bzip
2 days ago
- #bzip
- #Lua
- #compression
- ComputerCraft is a Minecraft mod that adds programmable computers, which are scripted in Lua.
- Compression is needed because disk space is limited, and for text-like data such as Lua code, bzip is the most efficient choice.
- On code, bzip achieves better compression ratios than zopfli, zstd, xz, brotli, and lzip.
- Unlike LZ77-based algorithms, bzip uses the Burrows-Wheeler Transform (BWT), which reorders the input so that characters appearing in similar contexts end up next to each other. This makes repetitive text much easier to compress.
- BWT has downsides: it copes poorly with inputs that mix different dialects or formats. It works well on consistent data such as code.
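- bzip2 and bzip3 differ in how they compress the BWT output. bzip2 applies move-to-front, run-length encoding, and Huffman coding, while bzip3 models the output more intelligently with a context-mixing entropy coder.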
- The BWT involves no heuristic choices, because its output is fully determined by the input. LZ77-based methods, by contrast, depend on tuned heuristics for match finding and parsing.
- The bzip decoder is small: a version optimized for self-extracting archives fits in about 1.5 KB.
- bzip is slow to compress, but its decoding speed is acceptable, even in a high-level language like Lua.
- Alternatives such as custom algorithms or preprocessing the code before compression do not improve significantly on bzip's ratios.
- bzip is ideal for compressing text and code. It offers simplicity, efficiency, and fewer heuristics than LZ77-based methods.
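The grouping effect of the BWT can be seen in a deliberately naive sketch. It is written in Python for readability, and it sorts all n rotations explicitly, whereas real implementations such as bzip2 use much faster suffix sorting. The function name `bwt` is a hypothetical choice. The convention of returning the last column together with the row where the original string ends up is modeled on how bzip2 records that position, but this is not bzip2's code.

```python
def bwt(s: str) -> tuple[str, int]:
    """Naive Burrows-Wheeler Transform (illustration only, O(n^2 log n)).

    Returns the last column of the sorted rotation matrix plus the row
    index of the original string, which a decoder needs to invert it.
    """
    if not s:
        return "", 0
    n = len(s)
    # Rotation i is s[i:] + s[:i]; sort the starting offsets by that rotation.
    rows = sorted(range(n), key=lambda i: s[i:] + s[:i])
    # The last character of rotation i is the one just before offset i, so
    # output characters are grouped by the text that follows them.
    last_column = "".join(s[(i - 1) % n] for i in rows)
    primary = rows.index(0)
    return last_column, primary


print(bwt("banana"))  # ('nnbaaa', 3)
```

On longer, repetitive input such as source code, the same reordering tends to produce long runs of identical characters, and the later compression stages exploit those runs.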
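The following sketch shows why BWT output is easy to squeeze. It is a simplified version of the stage bzip2 runs after the transform. Move-to-front turns recently seen bytes into small numbers, so a repeated byte becomes 0, and the runs of zeros are then collapsed. This is an illustrative sketch, not bzip2's exact format. bzip2 builds its table only from the byte values actually used, encodes zero runs with dedicated RUNA/RUNB symbols, and finishes with Huffman coding. bzip3's context-mixing coder is not shown at all. The function names are hypothetical.

```python
def move_to_front(data: bytes) -> list[int]:
    """Replace each byte with its index in a recency list, then move it to the front.

    A byte equal to the previous one becomes 0, and recently seen bytes
    become small numbers, which is what makes this step suit BWT output.
    """
    table = list(range(256))
    out = []
    for b in data:
        idx = table.index(b)
        out.append(idx)
        table.pop(idx)
        table.insert(0, b)
    return out


def collapse_zero_runs(symbols: list[int]) -> list[tuple[int, int]]:
    """Group MTF output into (symbol, count) pairs, merging only runs of 0.

    Simplified stand-in for bzip2's RUNA/RUNB run encoding.
    """
    out: list[tuple[int, int]] = []
    for s in symbols:
        if s == 0 and out and out[-1][0] == 0:
            out[-1] = (0, out[-1][1] + 1)
        else:
            out.append((s, 1))
    return out


mtf = move_to_front(b"aaaabbbb")
print(mtf)                      # [97, 0, 0, 0, 98, 0, 0, 0]
print(collapse_zero_runs(mtf))  # [(97, 1), (0, 3), (98, 1), (0, 3)]
```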
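A compact decoder is plausible partly because the inverse BWT needs so little code. It takes one counting pass to locate each symbol in the sorted first column, then a walk along the resulting links, starting from the row of the original string (bzip2 stores this index in each block header). The sketch below is written in Python for readability, while the decoder discussed targets Lua. It covers only this stage, not the Huffman, move-to-front, or run-length decoding a full bzip2 decoder also needs. The `inverse_bwt(last, primary)` signature is a hypothetical choice.

```python
def inverse_bwt(last: str, primary: int) -> str:
    """Recover the original string from the BWT last column and primary row index."""
    n = len(last)
    # Count each symbol, then compute where its block starts in the first
    # column (which is simply the last column, sorted).
    counts: dict[str, int] = {}
    for ch in last:
        counts[ch] = counts.get(ch, 0) + 1
    start: dict[str, int] = {}
    total = 0
    for ch in sorted(counts):
        start[ch] = total
        total += counts[ch]
    # Equal symbols keep their relative order in both columns, so order[j] is
    # the position in `last` of the symbol at first-column row j. Following
    # order[] from a row leads to the rotation that starts one character later.
    order = [0] * n
    for i, ch in enumerate(last):
        order[start[ch]] = i
        start[ch] += 1
    # Walk the rows in text order; last[order[row]] is the first character of `row`.
    out = []
    row = primary
    for _ in range(n):
        row = order[row]
        out.append(last[row])
    return "".join(out)


print(inverse_bwt("nnbaaa", 3))  # banana
```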