Benchmarking Language Modeling for Lossless Compression of Full-Fidelity Audio
- #language models
- #lossless compression
- #audio processing
- Autoregressive language models (LMs) trained on raw waveforms can be repurposed for lossless audio compression.
- Prior work was limited to 8-bit audio, leaving practical 16- and 24-bit settings largely unexplored.
- The study benchmarks LM-based compression on full-fidelity audio across diverse domains, sampling rates, and bit depths.
- Standard sample-level tokenization becomes intractable at higher bit depths because the vocabulary grows as 2^b with bit depth b (65,536 tokens at 16-bit, ~16.8M at 24-bit).
- Proposed Trilobyte, a byte-level tokenization scheme that reduces vocabulary scaling from O(2^b) to O(1), making 24-bit LM-based lossless compression tractable.
- LMs consistently outperform FLAC and achieve state-of-the-art compression at 8-bit and 16-bit.
- Compression gains become more modest as bit depth increases beyond 8-bit.
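The link between language modeling and lossless compression in the first bullet comes from the standard coding argument: an entropy coder (e.g. arithmetic coding) driven by the model's next-token probabilities compresses a sequence to roughly its negative log-likelihood in bits. A minimal sketch of that codelength calculation (the toy probabilities below are illustrative, not from the paper):

```python
import math

def codelength_bits(probs):
    """Ideal compressed size in bits for a sequence, given the model's
    probability of each actual next symbol at every step. Arithmetic
    coding achieves this up to a small constant overhead."""
    return -sum(math.log2(p) for p in probs)

# Toy example: a model assigning probability 0.9 to each of 1000
# correct 8-bit samples would compress them to ~152 bits, versus
# 8000 bits raw. A uniform model (p = 1/256) gives exactly 8000.
print(round(codelength_bits([0.9] * 1000)))      # ~152
print(round(codelength_bits([1 / 256] * 1000)))  # 8000
```

This is why a better-calibrated autoregressive model directly translates into a smaller compressed file: every bit of reduced cross-entropy is a bit saved.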
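To make the byte-level idea concrete: the summary does not spell out Trilobyte's exact scheme, but the general technique is to split each b-bit sample into b/8 byte tokens, so the vocabulary stays at 256 regardless of bit depth while the round trip remains exact. A hypothetical sketch (function names and big-endian byte order are my assumptions, not the paper's):

```python
def to_byte_tokens(samples, bit_depth=24):
    """Split each b-bit sample into b//8 byte tokens (big-endian here,
    as an illustrative choice). Vocabulary is fixed at 256 instead of
    2**b for sample-level tokens (~16.8M at 24-bit)."""
    n = bit_depth // 8
    tokens = []
    for s in samples:
        tokens.extend((s >> (8 * (n - 1 - i))) & 0xFF for i in range(n))
    return tokens

def from_byte_tokens(tokens, bit_depth=24):
    """Inverse mapping: reassemble byte tokens into samples, so the
    tokenization itself is lossless."""
    n = bit_depth // 8
    return [int.from_bytes(bytes(tokens[i:i + n]), "big")
            for i in range(0, len(tokens), n)]

samples = [0x123456, 0xABCDEF]
tokens = to_byte_tokens(samples)          # 6 byte tokens for 2 samples
assert from_byte_tokens(tokens) == samples  # exact round trip
```

The trade-off is sequence length: a 24-bit sample costs three LM tokens instead of one, which the model must offset with sharper per-byte predictions.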