An Ode to Bzip

2 days ago
  • #bzip
  • #Lua
  • #compression
  • ComputerCraft is a mod that adds programming to Minecraft using Lua code.
  • Compression is needed because disk space is limited, and bzip gives the best compression ratio for text-like data such as Lua code.
  • bzip outperforms other compression algorithms (zopfli, zstd, xz, brotli, lzip) in compressing code, achieving better ratios.
  • Unlike LZ77-based algorithms, bzip uses BWT (Burrows-Wheeler Transform), which groups characters by context, making it more efficient for repetitive text.
  • BWT has downsides: it copes poorly with input that mixes different dialects or formats, but it works well for consistent data like code.
  • bzip2 and bzip3 differ in how they compress the BWT output: bzip2 uses run-length encoding (RLE), while bzip3 takes a more sophisticated approach.
  • BWT-based methods are deterministic and free of heuristics, unlike LZ77-based methods that require tuning.
  • The bzip decoder is small: when optimized for self-extracting archives, it fits in about 1.5 KB.
  • bzip is slower to compress, but its decoding speed is acceptable, especially in high-level languages like Lua.
  • Alternatives like custom algorithms or pre-processing code before compression don't significantly improve ratios over bzip.
  • bzip is ideal for text and code compression, offering simplicity, efficiency, and fewer heuristics than LZ77-based methods.
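
To make the BWT point above concrete, here is a minimal sketch of the transform and its inverse in Python. It is a naive textbook version that sorts all rotations of the input, not the implementation used by bzip2 or bzip3, which use far faster suffix-sorting methods. The function names `bwt` and `inverse_bwt` and the sentinel parameter are assumptions made for illustration.

```python
def bwt(text: str, sentinel: str = "\0") -> str:
    """Naive Burrows-Wheeler Transform (illustrative, roughly O(n^2 log n)).

    The sentinel marks the end of the input so the transform can be
    inverted; it must not occur in the input itself.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Sort every rotation of the string, then keep the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "\0") -> str:
    """Invert the transform by rebuilding the sorted rotation table."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        # Prepend the last column to every row and re-sort; after n
        # rounds the table holds all rotations in sorted order.
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The original string is the rotation that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    transformed = bwt("banana", sentinel="$")
    print(transformed)                             # annb$aa
    print(inverse_bwt(transformed, sentinel="$"))  # banana
```

In the output `annb$aa`, the two `n` characters and two of the `a` characters end up adjacent because each group is followed by the same context in the input. These runs are what the later stage (RLE in bzip2) exploits, and the transform itself involves no tunable parameters.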