Hasty Briefs (beta)

Show HN: OnPair – String compression with fast random access (Rust, C++)

5 days ago
  • #rust
  • #random-access
  • #compression
  • OnPair is a compression algorithm for sequences of short strings that allows efficient random access to individual strings.
  • It works in two phases: Training, which identifies frequent adjacent token pairs to build a dictionary, and Parsing, which compresses each string into a sequence of token IDs.
  • OnPair16 is a variant that caps dictionary entries at 16 bytes, which enables further optimizations.
  • The Rust implementation supports up to 65,536 (2^16) tokens, each identified by a 2-byte ID.
  • To use it, add the crate to Cargo.toml and use OnPair or OnPair16 to compress and decompress strings.
  • An example shows how to compress and decompress a list of strings.
  • The project is MIT-licensed and was developed by Francesco Gargiulo and Rossano Venturini at the University of Pisa.
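The Training/Parsing split can be made concrete with a toy sketch. This is not the OnPair crate's code or API: the pair-count `threshold`, seeding the dictionary with all 256 single bytes, and the names `Dictionary`, `train`, `parse` and `decode` are all assumptions for illustration. Here, training greedily parses each string by longest prefix match and promotes an adjacent token pair to a new dictionary entry once it has been seen `threshold` times; parsing then maps each string to `u16` token IDs. Passing `max_token_len = 16` mimics OnPair16's 16-byte cap on entries.

```rust
use std::collections::HashMap;

// Token IDs are u16, so the dictionary holds at most 2^16 = 65,536 entries.
const MAX_TOKENS: usize = 65_536;

struct Dictionary {
    tokens: Vec<Vec<u8>>,          // token ID (index) -> bytes
    lookup: HashMap<Vec<u8>, u16>, // bytes -> token ID
    max_token_len: usize,          // e.g. 16 for an OnPair16-style cap
}

impl Dictionary {
    /// Seeds the dictionary with all 256 single bytes so any input can be parsed.
    fn new(max_token_len: usize) -> Self {
        let mut d = Dictionary { tokens: Vec::new(), lookup: HashMap::new(), max_token_len };
        for b in 0..=255u8 {
            d.insert(vec![b]);
        }
        d
    }

    /// Adds a token unless it already exists or the 2-byte ID space is exhausted.
    fn insert(&mut self, token: Vec<u8>) {
        if self.tokens.len() < MAX_TOKENS && !self.lookup.contains_key(&token) {
            self.lookup.insert(token.clone(), self.tokens.len() as u16);
            self.tokens.push(token);
        }
    }

    /// Longest dictionary entry that is a prefix of the non-empty slice `s`.
    /// Naive: probes every candidate length from longest to shortest.
    fn longest_prefix(&self, s: &[u8]) -> (u16, usize) {
        let max = s.len().min(self.max_token_len);
        for len in (1..=max).rev() {
            if let Some(&id) = self.lookup.get(&s[..len]) {
                return (id, len);
            }
        }
        unreachable!("single bytes are always in the dictionary")
    }
}

/// Toy training: count adjacent token pairs while parsing; merge a pair
/// into a new token once its count reaches `threshold` (an assumed rule).
fn train(strings: &[&str], threshold: usize, max_token_len: usize) -> Dictionary {
    let mut dict = Dictionary::new(max_token_len);
    let mut pair_counts: HashMap<(u16, u16), usize> = HashMap::new();
    for s in strings {
        let bytes = s.as_bytes();
        let (mut pos, mut prev): (usize, Option<u16>) = (0, None);
        while pos < bytes.len() {
            let (id, len) = dict.longest_prefix(&bytes[pos..]);
            pos += len;
            if let Some(p) = prev {
                let merged_len = dict.tokens[p as usize].len() + dict.tokens[id as usize].len();
                if merged_len <= max_token_len {
                    let count = pair_counts.entry((p, id)).or_insert(0);
                    *count += 1;
                    if *count >= threshold {
                        let mut merged = dict.tokens[p as usize].clone();
                        merged.extend_from_slice(&dict.tokens[id as usize]);
                        dict.insert(merged);
                        pair_counts.remove(&(p, id));
                    }
                }
            }
            prev = Some(id);
        }
    }
    dict
}

/// Parsing: greedy longest-prefix matching into 2-byte token IDs.
fn parse(dict: &Dictionary, s: &str) -> Vec<u16> {
    let bytes = s.as_bytes();
    let (mut out, mut pos) = (Vec::new(), 0);
    while pos < bytes.len() {
        let (id, len) = dict.longest_prefix(&bytes[pos..]);
        out.push(id);
        pos += len;
    }
    out
}

/// Decompression: concatenate the bytes of each token ID.
fn decode(dict: &Dictionary, ids: &[u16]) -> Vec<u8> {
    ids.iter().flat_map(|&id| dict.tokens[id as usize].iter().cloned()).collect()
}

fn main() {
    let strings = ["user_000001", "user_000002", "user_000003", "admin_001"];
    // threshold = 2 is illustrative; max_token_len = 16 mirrors the OnPair16 cap.
    let dict = train(&strings, 2, 16);
    for s in &strings {
        let ids = parse(&dict, s);
        assert_eq!(decode(&dict, &ids), s.as_bytes());
        println!("{:>12} -> {} token IDs ({} bytes)", s, ids.len(), ids.len() * 2);
    }
}
```

A real implementation would likely replace the naive prefix lookup (up to `max_token_len` hash probes per position) with a faster structure; the sketch only aims to make the two phases concrete.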
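One way to get random access is to compress each string independently against a shared dictionary, so any single string can be decoded on its own. The sketch below shows one possible layout under stated assumptions: the dictionary is hand-picked rather than trained, its bytes are stored concatenated with end offsets, all strings' `u16` IDs share one buffer, and a per-string end-offset array locates string `i`. `CompressedStrings`, `push` and `get` are hypothetical names, not the crate's API.

```rust
/// Hypothetical layout for random access over independently compressed strings.
struct CompressedStrings {
    dict_bytes: Vec<u8>,     // all dictionary tokens, concatenated
    dict_ends: Vec<usize>,   // token t = dict_bytes[end(t-1)..end(t)], start 0 for t = 0
    ids: Vec<u16>,           // token IDs of every string, concatenated
    string_ends: Vec<usize>, // string i = ids[end(i-1)..end(i)], start 0 for i = 0
}

impl CompressedStrings {
    fn new(dictionary: &[&[u8]]) -> Self {
        // 2-byte IDs can address at most 65,536 tokens.
        assert!(dictionary.len() <= 65_536, "token IDs must fit in a u16");
        let mut dict_bytes = Vec::new();
        let mut dict_ends = Vec::new();
        for token in dictionary {
            dict_bytes.extend_from_slice(token);
            dict_ends.push(dict_bytes.len());
        }
        CompressedStrings { dict_bytes, dict_ends, ids: Vec::new(), string_ends: Vec::new() }
    }

    fn token(&self, id: u16) -> &[u8] {
        let id = id as usize;
        let start = if id == 0 { 0 } else { self.dict_ends[id - 1] };
        &self.dict_bytes[start..self.dict_ends[id]]
    }

    /// Compresses one string by greedy longest-prefix matching (linear scan; toy only).
    fn push(&mut self, s: &str) {
        let mut rest = s.as_bytes();
        while !rest.is_empty() {
            let mut best: Option<(u16, usize)> = None;
            for t in 0..self.dict_ends.len() {
                let tok = self.token(t as u16);
                let longer = best.map_or(true, |(_, len)| tok.len() > len);
                if !tok.is_empty() && rest.starts_with(tok) && longer {
                    best = Some((t as u16, tok.len()));
                }
            }
            let (id, len) = best.expect("dictionary must cover every byte of the input");
            self.ids.push(id);
            rest = &rest[len..];
        }
        self.string_ends.push(self.ids.len());
    }

    /// Random access: decodes string `i` without touching any other string.
    fn get(&self, i: usize, out: &mut Vec<u8>) {
        let start = if i == 0 { 0 } else { self.string_ends[i - 1] };
        out.clear();
        for &id in &self.ids[start..self.string_ends[i]] {
            out.extend_from_slice(self.token(id));
        }
    }
}

fn main() {
    // Hand-picked toy dictionary; a trained one would be learned from the data.
    let words = ["user_", "admin_", "00000", "00", "0", "1", "2", "3"];
    let dictionary: Vec<&[u8]> = words.iter().map(|w| w.as_bytes()).collect();
    let mut store = CompressedStrings::new(&dictionary);
    for s in &["user_000001", "admin_001", "user_000002", "user_000003"] {
        store.push(s);
    }
    let mut buf = Vec::new();
    store.get(1, &mut buf); // decodes only the second string
    println!("{} ({} token IDs in total)", String::from_utf8_lossy(&buf), store.ids.len());
}
```

Because `get` only walks the token IDs of the requested string, the cost of a lookup in this sketch is proportional to that string's length, not to the size of the collection.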