SIMD-friendly algorithms for substring searching
18 hours ago
- #performance-optimization
- #substring-search
- #SIMD
- Popular programming languages provide methods or functions to locate a substring in a given string, such as strstr in C, find in C++'s std::string, and pos/index in Python.
- Substring search algorithms can be categorized into two major groups: those based on Deterministic Finite Automaton (e.g., Knuth-Morris-Pratt, Boyer Moore) and those based on simple comparison (e.g., Karp-Rabin algorithm).
- Standard algorithms assume that comparing characters is cheap, but modern CPUs do not meet this assumption, leading to the exploration of SIMD-based approaches for substring search.
- Two SIMD-friendly approaches are discussed: a modification of the Rabin-Karp algorithm and an SSE4 string search method, both aiming to improve performance by leveraging parallel processing.
- The Karp-Rabin algorithm uses hash comparisons to identify potential substring matches, with SIMD solutions replacing hash predicates with vector predicates for faster parallel processing.
- A SWAR (SIMD Within A Register) approach is described, utilizing bitwise operations to locate substring matches without dedicated SIMD instructions.
- AVX2 and SSE implementations are compared, showing AVX2's superior performance due to its wider registers and efficient instruction set.
- ARM Neon and AArch64 implementations are discussed, highlighting the challenges and optimizations specific to ARM architectures.
- SSE4.1 and AVX2's MPSADBW instruction is explored for substring search, though it faces limitations such as quadratic complexity in certain scenarios.
- SSE4.2's String and Text New Instructions (STNI) are introduced, but their performance is limited by high latency and lack of support in newer processors.
- Performance measurements across various architectures (Westmere, Bulldozer, Haswell, Skylake, KNL, ARMv7, ARMv8) show significant speed-ups for SIMD implementations over traditional methods like strstr.
- The article concludes with acknowledgments and a reference to the GitHub repository containing all implementations and test programs.