An overengineered solution to `sort | uniq -c` with 25x throughput (hist)

6 months ago
  • #performance
  • #unique-lines
  • #CLI
  • A CLI tool named 'hist-rs' for counting unique lines with high throughput.
  • Installation command: 'cargo install hist-rs'.
  • Basic usage: 'hist <file>' to count unique lines in a file.
  • Can read from stdin: '/bin/cat <file> | hist'.
  • Options include: '-u' for unique lines, '-e' to exclude patterns, '-i' to include patterns.
  • Threshold options: '-m' for minimum abundance, '-M' for maximum abundance.
  • Sorting options: '-n' to sort by key, '-d' for descending order.
  • Benchmarks against other tools report 'hist' as the fastest, with the title citing 25x the throughput of `sort | uniq -c`.
  • Tools compared: hist, cuniq, huniq, sortuniq, naive.
  • Performance metrics include mean, min, max, and relative speed.
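
For readers who want to see what counting unique lines involves, here is a minimal Rust sketch of the general hash-map approach that can stand in for `sort | uniq -c`. It is not taken from hist-rs and makes no claim about how that tool is implemented. The function names `count_lines` and `filter_and_sort` are hypothetical, and the `min`/`max` parameters only loosely mirror the idea of minimum and maximum abundance thresholds.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead};

/// Count how many times each distinct line occurs in `reader`.
/// Hypothetical helper for illustration; not part of hist-rs.
fn count_lines<R: BufRead>(reader: R) -> io::Result<HashMap<String, u64>> {
    let mut counts: HashMap<String, u64> = HashMap::new();
    for line in reader.lines() {
        // Each line is hashed once; no global sort of the input is needed.
        *counts.entry(line?).or_insert(0) += 1;
    }
    Ok(counts)
}

/// Keep entries whose count lies in `min..=max`, then order them by
/// count (descending), breaking ties by line content (ascending).
fn filter_and_sort(counts: HashMap<String, u64>, min: u64, max: u64) -> Vec<(String, u64)> {
    let mut rows: Vec<(String, u64)> = counts
        .into_iter()
        .filter(|(_, n)| *n >= min && *n <= max)
        .collect();
    rows.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    rows
}

fn main() -> io::Result<()> {
    // Read lines from stdin and print "count<TAB>line" for every distinct line.
    let counts = count_lines(io::stdin().lock())?;
    for (line, n) in filter_and_sort(counts, 1, u64::MAX) {
        println!("{n}\t{line}");
    }
    Ok(())
}
```

The design point is that a hash map only has to sort the distinct lines at the end, whereas `sort | uniq -c` sorts every input line first. That difference is one plausible source of the kind of speedup described above, though the actual figures depend on the input and on the implementation.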