StringWa.rs on GPUs: Databases and Bioinformatics
9 hours ago
- #CUDA
- #bioinformatics
- #string-processing
- StringZilla v4 is now CUDA-capable, making it fast on both CPUs and GPUs.
- The release includes GPU-accelerated string similarity kernels for Levenshtein distances and bioinformatics applications.
- New features include non-cryptographic hash functions, string PRNGs, and sorting algorithms for large string collections.
- Performance benchmarks show significant speed improvements over existing libraries such as NLTK and cuDF.
- The library supports dynamic dispatch for different architectures and is available for multiple programming languages.
- MinHash signatures are now computed using 52-bit integers, optimized for both CPU and GPU performance.
- StringZilla v4 also introduces high-throughput random string generation using AES instructions.
- The release includes optimizations for batch operations like sorting, with significant speedups over standard libraries.
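
For readers unfamiliar with the Levenshtein distance mentioned above, here is a minimal pure-Python sketch of the textbook Wagner-Fischer dynamic program. It is a CPU reference for what such kernels compute, not StringZilla's implementation or API. The names `levenshtein` and `levenshtein_batch` are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs, using two rolling rows of the DP matrix."""
    # Keep the shorter string along the row so memory is O(min(len(a), len(b))).
    if len(a) < len(b):
        a, b = b, a
    # Row 0: turning the empty prefix of `a` into b[:j] takes j insertions.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        # Column 0: turning a[:i] into the empty string takes i deletions.
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete char_a
                    current[j - 1] + 1,                   # insert char_b
                    previous[j - 1] + substitution_cost,  # match or substitute
                )
            )
        previous = current
    return previous[-1]


def levenshtein_batch(pairs: Iterable[Tuple[str, str]]) -> List[int]:
    """Hypothetical batch helper: each pair is scored independently."""
    return [levenshtein(a, b) for a, b in pairs]
```

Each pair in a batch is independent of the others, which is the property that makes large collections of pairs a natural fit for parallel hardware. How the GPU kernels actually distribute that work is not described in this summary.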