Show HN: blqsort – Fast Branchless Quicksort with C++ Interface
blqsort is a fast branchless quicksort implementation for C++ that outperforms std::sort and pdqsort on random data. It avoids branch mispredictions, which are a major cost on modern CPUs. In the author's benchmark of sorting 50 million doubles, blqsort took about 24% less time than both std::sort and pdqsort on an Apple M1 (1.01s vs. 1.33s) and about 27% less time than pdqsort on an AMD Ryzen 3 (2.06s vs. 2.81s).
- blqsort outperforms std::sort and pdqsort when sorting 50 million doubles on both Apple M1 and AMD Ryzen systems.
- The algorithm uses an auxiliary buffer for branchless partitioning, inspired by fluxsort, to enhance performance.
- For types with higher copy costs, a BlockQuicksort variant is used to maintain efficiency.
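
To make the buffer-based partitioning idea concrete, here is a minimal sketch in C++. It is not blqsort's actual code: the names `partition_branchless`, `sketch_sort_range`, and `sketch_sort` are hypothetical, the median-of-three pivot and the insertion-sort cutoff of 16 are assumptions chosen for brevity, and whether the loop really compiles to branch-free machine code depends on the compiler and flags. The key step is that every element is written to both destinations unconditionally, and only the write indices depend on the comparison result.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not blqsort's API): moves every element for which
// goes_left(v) is true to the front of a[0..n), the rest behind them.
// buf must have room for n elements. Returns the size of the left part.
template <typename T, typename Pred>
std::size_t partition_branchless(T* a, std::size_t n, T* buf, Pred goes_left) {
    std::size_t lo = 0, hi = 0;
    for (std::size_t i = 0; i < n; ++i) {
        T v = a[i];
        bool left = goes_left(v);
        a[lo] = v;     // lo <= i, so no unread element is overwritten
        buf[hi] = v;   // written unconditionally; kept only if hi advances
        lo += left;    // index arithmetic replaces the 'if'
        hi += !left;
    }
    std::copy(buf, buf + hi, a + lo);  // append the right part after the left
    return lo;
}

// Quicksort built on the helper above (assumed pivot rule: median of three).
template <typename T>
void sketch_sort_range(T* a, std::size_t n, T* buf) {
    while (n > 16) {
        T x = a[0], y = a[n / 2], z = a[n - 1];
        T pivot = std::max(std::min(x, y), std::min(std::max(x, y), z));
        std::size_t lo = partition_branchless(
            a, n, buf, [&](const T& v) { return v < pivot; });
        if (lo == 0) {
            // The pivot is the minimum: move all elements equal to it to the
            // front, where they are already in final position.
            lo = partition_branchless(
                a, n, buf, [&](const T& v) { return !(pivot < v); });
        } else {
            sketch_sort_range(a, lo, buf);  // sort the left part recursively
        }
        a += lo;  // continue with the right part in the loop
        n -= lo;
    }
    // Insertion sort for small ranges.
    for (std::size_t i = 1; i < n; ++i) {
        T v = a[i];
        std::size_t j = i;
        for (; j > 0 && v < a[j - 1]; --j) a[j] = a[j - 1];
        a[j] = v;
    }
}

// Convenience wrapper that allocates the auxiliary buffer once.
inline void sketch_sort(std::vector<double>& v) {
    std::vector<double> buf(v.size());
    sketch_sort_range(v.data(), v.size(), buf.data());
}
```

Calling `sketch_sort(values)` on a `std::vector<double>` sorts it in ascending order. The sketch copies each element twice per partitioning pass, which is cheap for doubles but less attractive for types that are expensive to copy. It also has no protection against quadratic worst-case behavior or deep recursion, which a production sort would need.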
Opening excerpt (first ~120 words)
blqsort

blqsort is a fast branchless quicksort implementation for C++ that outperforms std::sort and pdqsort on random datasets. On modern CPUs, avoiding branch misprediction is a key technique to speed up programs: When 'if' slows you down, avoid it.

Performance results naturally depend on the underlying hardware. The following benchmarks show the execution times for sorting 50 million doubles using different sorting implementations. The measurements were taken on an Apple M1 system using Clang and on an AMD Ryzen 3 system using GCC, both compiled with the -O3 option.

| Implementation | Apple M1 | AMD Ryzen |
|----------------|----------|-----------|
| std::sort      | 1.33s    | 5.56s     |
| pdqsort        | 1.33s    | 2.81s     |
| blqsort        | 1.01s    | 2.06s     |

This paper by Edelkamp and A.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at GitHub.