Hasty Briefs (beta)

Branchless Quicksort faster than std::sort and pdqsort with C and C++ API

11 hours ago
  • #sorting algorithms
  • #performance optimization
  • #branchless programming
  • Benchmarks compare sorting times for 50 million doubles across different implementations on Apple M1 and AMD Ryzen hardware.
  • In these benchmarks, blqsort outperforms both std::sort and pdqsort; branchless techniques and custom sorting networks account for the speedup.
  • Four implementations are available: single-threaded and multi-threaded versions in C and C++, with easy integration.
  • Branchless partitioning with an auxiliary buffer reduces branch mispredictions, which improves performance for trivially copyable types.
  • For complex data types, a BlockQuicksort variant is used to minimize swap operations, maintaining efficiency.
  • Custom structs can be sorted flexibly, with blqsort showing significant speed improvements over standard sorts.
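
To make the auxiliary-buffer idea concrete, here is a minimal C++ sketch of one way a branchless partition can be written for trivially copyable types. It is an illustration under stated assumptions, not blqsort's actual code: the names `partition_branchless` and `partition_with_scratch` are hypothetical, and the scratch buffer is assumed to hold as many elements as the input. Every element is written to both candidate destinations, and the comparison result only advances one of two counters, so the loop body has no data-dependent branch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper for illustration; not part of blqsort's API.
// Rearranges a[0..n) so that all elements < pivot come first and returns
// how many there are. aux is scratch space with room for n elements.
template <typename T>
std::size_t partition_branchless(T* a, std::size_t n, T pivot, T* aux) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "this sketch copies elements with memcpy");
    if (n == 0) return 0;
    assert(a != nullptr && aux != nullptr);

    std::size_t lo = 0;  // next slot for "less than pivot", inside a
    std::size_t hi = 0;  // next slot for "not less", inside aux
    for (std::size_t i = 0; i < n; ++i) {
        const T x = a[i];
        const bool less = x < pivot;
        // Write x to both places unconditionally. lo <= i always holds,
        // so a[lo] only overwrites a slot that has already been read.
        a[lo] = x;
        aux[hi] = x;
        // Exactly one counter advances; the other write is overwritten later.
        lo += static_cast<std::size_t>(less);
        hi += static_cast<std::size_t>(!less);
    }
    // Append the "not less" elements after the "less" ones (lo + hi == n).
    std::memcpy(a + lo, aux, hi * sizeof(T));
    return lo;
}

// Convenience wrapper (also hypothetical) that allocates the scratch buffer.
inline std::size_t partition_with_scratch(std::vector<double>& v, double pivot) {
    std::vector<double> aux(v.size());
    return partition_branchless(v.data(), v.size(), pivot, aux.data());
}
```

Calling `partition_with_scratch(v, 5.0)` on a vector of doubles moves every value below 5.0 to the front and returns the split index. Whether a compiler emits truly branch-free machine code for the loop depends on the target and optimization settings, so the benefit should be measured rather than assumed.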