Benchmarking Language Modeling for Lossless Compression of Full-Fidelity Audio
- #language models
- #lossless compression
- #audio processing
- Autoregressive language models (LMs) trained on raw waveforms can be repurposed for lossless audio compression.
- Prior work was limited to 8-bit audio, leaving practical 16- and 24-bit settings largely unexplored.
- The study benchmarks LM-based compression on full-fidelity audio across diverse domains, sampling rates, and bit depths.
- Standard sample-level tokenization becomes intractable at higher bit depths because the vocabulary grows as 2^b with bit depth b (65,536 tokens at 16-bit, ~16.8M at 24-bit).
- Proposed Trilobyte, a byte-level tokenization scheme that reduces vocabulary scaling from O(2^b) to O(1), making 24-bit LM-based lossless compression tractable.
- LMs consistently outperform FLAC and achieve state-of-the-art compression at 8-bit and 16-bit.
- Compression gains become more modest as bit depth increases beyond 8-bit.
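The link between language modeling and lossless compression in the first bullet comes from the standard coding argument: an entropy coder (e.g. arithmetic coding) driven by the model's next-token probabilities compresses a sequence to roughly its negative log-likelihood in bits. A minimal sketch of that codelength calculation (the toy probabilities below are illustrative, not from the paper):

```python
import math

def codelength_bits(probs):
    """Ideal compressed size in bits for a sequence, given the model's
    probability of each actual next symbol at every step. Arithmetic
    coding achieves this up to a small constant overhead."""
    return -sum(math.log2(p) for p in probs)

# Toy example: a model assigning probability 0.9 to each of 1000
# correct 8-bit samples would compress them to ~152 bits, versus
# 8000 bits raw. A uniform model (p = 1/256) gives exactly 8000.
print(round(codelength_bits([0.9] * 1000)))      # ~152
print(round(codelength_bits([1 / 256] * 1000)))  # 8000
```

This is why a better-calibrated autoregressive model directly translates into a smaller compressed file: every bit of reduced cross-entropy is a bit saved.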
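To make the byte-level idea concrete: the summary does not spell out Trilobyte's exact scheme, but the general technique is to split each b-bit sample into b/8 byte tokens, so the vocabulary stays at 256 regardless of bit depth while the round trip remains exact. A hypothetical sketch (function names and big-endian byte order are my assumptions, not the paper's):

```python
def to_byte_tokens(samples, bit_depth=24):
    """Split each b-bit sample into b//8 byte tokens (big-endian here,
    as an illustrative choice). Vocabulary is fixed at 256 instead of
    2**b for sample-level tokens (~16.8M at 24-bit)."""
    n = bit_depth // 8
    tokens = []
    for s in samples:
        tokens.extend((s >> (8 * (n - 1 - i))) & 0xFF for i in range(n))
    return tokens

def from_byte_tokens(tokens, bit_depth=24):
    """Inverse mapping: reassemble byte tokens into samples, so the
    tokenization itself is lossless."""
    n = bit_depth // 8
    return [int.from_bytes(bytes(tokens[i:i + n]), "big")
            for i in range(0, len(tokens), n)]

samples = [0x123456, 0xABCDEF]
tokens = to_byte_tokens(samples)          # 6 byte tokens for 2 samples
assert from_byte_tokens(tokens) == samples  # exact round trip
```

The trade-off is sequence length: a 24-bit sample costs three LM tokens instead of one, which the model must offset with sharper per-byte predictions.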