Show HN: OnPair – String compression with fast random access (Rust, C++)
5 days ago
- #rust
- #random-access
- #compression
- OnPair is a compression algorithm for efficient random access on sequences of short strings.
- It has two phases: Training (identifies frequent adjacent token pairs) and Parsing (compresses strings into token IDs).
- OnPair16 is a variant that caps dictionary entries at 16 bytes, a limit that enables further optimizations.
- The Rust implementation supports up to 65,536 tokens, each with a 2-byte ID.
- To use it, add the crate to Cargo.toml and compress or decompress with either OnPair or OnPair16.
- An example is provided that compresses and decompresses a list of strings.
- The project is MIT-licensed and was developed by Francesco Gargiulo and Rossano Venturini at the University of Pisa.
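
To make the two phases concrete, here is a minimal toy sketch in Rust. It is not the OnPair crate's API and not the authors' implementation: `ToyDict`, `Compressed`, `train`, `parse`, `compress`, and `get` are hypothetical names, and the merge rule (add a pair as a new token once it has been seen `threshold` times) and the greedy longest-match parser are simplifying assumptions. The sketch only borrows the ideas stated above: tokens built from frequent adjacent pairs, 2-byte IDs with at most 65,536 tokens, an optional cap on entry length, and per-string random access.

```rust
use std::collections::HashMap;

/// Toy dictionary (hypothetical; not the OnPair crate's API).
pub struct ToyDict {
    tokens: Vec<Vec<u8>>,          // token ID -> bytes
    lookup: HashMap<Vec<u8>, u16>, // bytes -> token ID
    max_len: usize,                // cap on entry length (e.g. 16)
}

/// Token IDs for all strings, plus one offset per string boundary.
pub struct Compressed {
    pub ids: Vec<u16>,
    pub offsets: Vec<usize>, // string i is ids[offsets[i]..offsets[i + 1]]
}

impl ToyDict {
    fn push(&mut self, bytes: Vec<u8>) {
        let id = self.tokens.len() as u16;
        self.lookup.insert(bytes.clone(), id);
        self.tokens.push(bytes);
    }

    /// Training (assumed rule): parse each string with the current
    /// dictionary and merge an adjacent token pair into a new token
    /// the moment its count reaches `threshold`.
    pub fn train(strings: &[&str], threshold: usize, max_len: usize) -> Self {
        assert!(max_len >= 1);
        let mut dict = ToyDict { tokens: Vec::new(), lookup: HashMap::new(), max_len };
        for b in 0..=255u8 {
            dict.push(vec![b]); // every single byte is a token
        }
        let mut counts: HashMap<(u16, u16), usize> = HashMap::new();
        for s in strings {
            let ids = dict.parse(s.as_bytes());
            for w in ids.windows(2) {
                let c = counts.entry((w[0], w[1])).or_insert(0);
                *c += 1;
                if *c == threshold {
                    let mut merged = dict.tokens[w[0] as usize].clone();
                    merged.extend_from_slice(&dict.tokens[w[1] as usize]);
                    // Respect the length cap and the 2-byte ID space.
                    if merged.len() <= dict.max_len
                        && dict.tokens.len() < 65_536
                        && !dict.lookup.contains_key(&merged)
                    {
                        dict.push(merged);
                    }
                }
            }
        }
        dict
    }

    /// Parsing: greedy longest match against the dictionary.
    pub fn parse(&self, bytes: &[u8]) -> Vec<u16> {
        let mut ids = Vec::new();
        let mut pos = 0;
        while pos < bytes.len() {
            let longest = self.max_len.min(bytes.len() - pos);
            let (id, len) = (1..=longest)
                .rev()
                .find_map(|len| self.lookup.get(&bytes[pos..pos + len]).map(|&id| (id, len)))
                .expect("every single byte is a token");
            ids.push(id);
            pos += len;
        }
        ids
    }

    pub fn compress(&self, strings: &[&str]) -> Compressed {
        let mut ids = Vec::new();
        let mut offsets = vec![0];
        for s in strings {
            ids.extend(self.parse(s.as_bytes()));
            offsets.push(ids.len());
        }
        Compressed { ids, offsets }
    }

    /// Random access: decode only string `i`, without touching the rest.
    pub fn get(&self, c: &Compressed, i: usize) -> String {
        let mut out = Vec::new();
        for &id in &c.ids[c.offsets[i]..c.offsets[i + 1]] {
            out.extend_from_slice(&self.tokens[id as usize]);
        }
        String::from_utf8(out).expect("tokens concatenate back to the original UTF-8")
    }
}
```

With this sketch, `ToyDict::train(&data, 2, 16)` followed by `dict.compress(&data)` and `dict.get(&c, i)` recovers the `i`-th string by reading only its own slice of token IDs. The offsets array is what makes per-string access cheap in this toy; the real library may lay out its data differently.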