An overengineered solution to `sort | uniq -c` with 25x throughput (hist)
6 months ago
- #performance
- #unique-lines
- #CLI
- A CLI tool named 'hist-rs' for counting unique lines with high throughput.
- Installation command: 'cargo install hist-rs'.
- Basic usage: 'hist <file>' to count unique lines in a file.
- Can read from stdin: '/bin/cat <file> | hist'.
- Options include: '-u' for unique lines, '-e' to exclude patterns, '-i' to include patterns.
- Threshold options: '-m' for minimum abundance, '-M' for maximum abundance.
- Sorting options: '-n' to sort by key, '-d' for descending order.
- Performance comparison with other tools shows 'hist' is the fastest of those benchmarked; the title puts it at 25x the throughput of `sort | uniq -c`.
- Tools compared: hist, cuniq, huniq, sortuniq, naive.
- Performance metrics include mean, min, max, and relative speed.
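
For readers unfamiliar with what such a tool computes, the following is a minimal Rust sketch of the general hash-map approach to counting unique lines, which avoids the full sort that `sort | uniq -c` requires. It is not taken from hist-rs and says nothing about how that tool is implemented. The function names `count_lines` and `report` are hypothetical, and the `min`/`max` parameters only loosely mirror the idea behind the minimum and maximum abundance thresholds listed above.

```rust
use std::collections::HashMap;
use std::io::BufRead;

/// Count how many times each distinct line occurs in `reader`.
/// Hypothetical helper for illustration; not hist-rs's API.
fn count_lines<R: BufRead>(reader: R) -> std::io::Result<HashMap<String, u64>> {
    let mut counts: HashMap<String, u64> = HashMap::new();
    for line in reader.lines() {
        // `line?` propagates I/O or UTF-8 errors to the caller.
        *counts.entry(line?).or_insert(0) += 1;
    }
    Ok(counts)
}

/// Keep lines whose count lies in `min..=max`, then order them by
/// descending count, breaking ties alphabetically by line.
fn report(counts: &HashMap<String, u64>, min: u64, max: u64) -> Vec<(u64, String)> {
    let mut rows: Vec<(u64, String)> = Vec::new();
    for (line, n) in counts {
        if *n >= min && *n <= max {
            rows.push((*n, line.clone()));
        }
    }
    rows.sort_by(|a, b| b.0.cmp(&a.0).then_with(|| a.1.cmp(&b.1)));
    rows
}
```

Passing `std::io::stdin().lock()` to `count_lines` would read from standard input, and printing each `(count, line)` pair from `report` gives output comparable in spirit to `sort | uniq -c | sort -rn`. Only the distinct lines are sorted at the end, which is typically far fewer than the total number of input lines.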