Branchless Quicksort faster than std::sort and pdqsort with C and C++ API
12 hours ago
- #sorting algorithms
- #performance optimization
- #branchless programming
- Benchmarks compare sorting times for 50 million doubles across different implementations on Apple M1 and AMD Ryzen hardware.
- In these benchmarks, blqsort outperforms both std::sort and pdqsort; the speedup comes from branchless techniques and custom sorting networks.
- Four implementations are available: single-threaded and multi-threaded versions in C and C++, with easy integration.
- For trivially copyable types, branchless partitioning with an auxiliary buffer reduces branch mispredictions and so improves performance.
- For more complex data types, a BlockQuicksort variant is used instead to minimize swap operations.
- Custom structs can be sorted flexibly, with blqsort showing significant speed improvements over standard sorts.