"5 cents" to talk about Sorts

In continuation of the topic, I want to share my code, which outperforms std::sort() from the current versions of the GNU C++ Library and (approximately, since there is no exact data) matches the result of the "Alexandrescu sort" from CppCon 2019.

Problem statement



In plain C (not C++), it was necessary to obtain, with reasonable effort, a sort no worse than std::sort() while getting rid of the overhead of using the library qsort(). This is one of the reasons macros are used instead of templates.

In turn, if you are sorting "mice" rather than "elephants", the overhead of qsort() is quite noticeable: extra address arithmetic and an indirect call to the comparator function.
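
As a rough illustration of that overhead (this is not the actual libmdbx code): with qsort() every comparison goes through an indirect call to a user comparator and elements are addressed via byte-size arithmetic, while a macro-generated sort bakes the element type and the comparison in at compile time so everything can be inlined. The SORT_DEF macro and the names cmp_int, LESS_INT, sort_int_inline below are hypothetical, made up for this sketch.

```c
#include <stdlib.h> /* qsort, size_t */

/* qsort() path: each comparison is an indirect call through a function pointer. */
static int cmp_int(const void *a, const void *b) {
  const int x = *(const int *)a, y = *(const int *)b;
  return (x > y) - (x < y);
}
/* usage: qsort(array, n, sizeof(int), cmp_int); */

/* Macro path (hypothetical SORT_DEF, not the real libmdbx macro): the element
 * type and the comparison expression are fixed at compile time, so the
 * compiler can inline everything; shown here as a plain insertion sort. */
#define SORT_DEF(name, type, LESS)                \
  static void name(type *arr, size_t n) {         \
    for (size_t i = 1; i < n; ++i) {              \
      type v = arr[i];                            \
      size_t j = i;                               \
      for (; j > 0 && LESS(v, arr[j - 1]); --j)   \
        arr[j] = arr[j - 1];                      \
      arr[j] = v;                                 \
    }                                             \
  }

#define LESS_INT(a, b) ((a) < (b))
SORT_DEF(sort_int_inline, int, LESS_INT)
/* usage: sort_int_inline(array, n); */
```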

Result



According to the information available to me, this combination of algorithms and implementation details is, in practical terms, faster than many other options.

It is quite likely that this variant is slightly faster than some sorts and/or slightly slower than others, but finding that out is literally titanic work that I cannot afford.

It would be interesting if someone compared this variant with the current implementations in Tarantool, PostgreSQL, SQLite and MySQL. I hope kaamos will not be able to pass it by with his SysBench.

What about Alexandrescu?



The very first comment, from RPG18, contained a link to a recent talk by Andrei Alexandrescu, "Speed Is Found In The Minds of People", where he arrives at a fairly similar idea, but switches to heap_sort closer to the end.

The talk seemed a bit drawn out to me (if only olegbunin would give someone 90 minutes at least once...), and there are not enough numbers in it. In particular, I would like to see how the sort behaves as N grows, since increasing the QuickSort cutoff threshold gives a speedup on large sizes and a slowdown on small ones, and so on.

Nevertheless, judging by the figures Alexandrescu cites, the variant described here unexpectedly gives a similar speedup. However, I have not yet found the code Alexandrescu showed in a finished form that one could simply "take and compare", and there is no time to type it in from the video (write to me if you find it or do so).

The conceptual side



The theoretical and conceptual side of the "algorithm" is quite simple:

  1. For non-short sequences, we use QuickSort with all the reasonable optimizations:

    • We do not use recursion; instead, an internal stack of positions (pointers) is used.
    • As the pivot, we use the median of the first, middle, and last elements.
    • We do not sort small partitions; we leave them for ShellSort.
    • After partitioning, we always push the larger part onto the stack; as a result, the stack cannot be deeper than log2(N).
  2. We finish sorting the data with ShellSort:

    • a minimum number of passes;
    • the gap of the first pass is correlated with the maximum size of an unsorted segment;
    • in total, only two passes, with gaps 8 and (inevitably) 1.
  3. Using ShellSort allows us to raise the exit threshold of QuickSort relatively safely. As a result, we get a combination of one of the best QuickSort variants with savings from an earlier exit and a slightly faster finishing sort (see the sketch right after this list).
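
To make the scheme above concrete, here is a minimal C sketch of the same idea: a non-recursive QuickSort with a median-of-three pivot and an explicit stack that always takes the larger part, followed by a two-pass ShellSort with gaps 8 and 1. It assumes plain int elements; the names hybrid_sort(), quick_pass(), shell_pass() and the threshold value 16 are mine for illustration and do not come from the actual (macro-based, generic) libmdbx code.

```c
#include <stddef.h>

#define SORT_THRESHOLD 16 /* hypothetical cutoff chosen for illustration */

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Step 1: non-recursive QuickSort that leaves small partitions unsorted. */
static void quick_pass(int *lo, int *hi) {
  /* Explicit stack of partition bounds instead of recursion; always pushing
   * the larger part keeps the depth within about log2(N) pairs. */
  int *stack[2 * 64], **sp = stack;
  for (;;) {
    while (hi - lo > SORT_THRESHOLD) {
      /* Pivot: median of the first, middle and last elements. */
      int *mid = lo + (hi - lo) / 2;
      if (*mid < *lo) swap_int(mid, lo);
      if (*hi < *mid) {
        swap_int(hi, mid);
        if (*mid < *lo) swap_int(mid, lo);
      }
      const int pivot = *mid;

      /* Partition around the pivot value (classic two-index scheme). */
      int *i = lo, *j = hi;
      do {
        while (*i < pivot) ++i;
        while (pivot < *j) --j;
        if (i <= j) {
          swap_int(i, j);
          ++i;
          --j;
        }
      } while (i <= j);

      /* Push the larger part, continue with the smaller one. */
      if (j - lo > hi - i) {
        *sp++ = lo;
        *sp++ = j;
        lo = i;
      } else {
        *sp++ = i;
        *sp++ = hi;
        hi = j;
      }
    }
    if (sp == stack)
      break;
    hi = *--sp;
    lo = *--sp;
  }
}

/* Steps 2-3: finish with ShellSort, only two passes with gaps 8 and 1.
 * The final gap-1 pass is a plain insertion sort, which guarantees a
 * fully sorted result regardless of what QuickSort left unsorted. */
static void shell_pass(int *arr, size_t n) {
  static const size_t gaps[] = {8, 1};
  for (size_t g = 0; g < sizeof(gaps) / sizeof(gaps[0]); ++g) {
    const size_t gap = gaps[g];
    for (size_t i = gap; i < n; ++i) {
      const int v = arr[i];
      size_t j = i;
      while (j >= gap && arr[j - gap] > v) {
        arr[j] = arr[j - gap];
        j -= gap;
      }
      arr[j] = v;
    }
  }
}

/* Entry point: QuickSort down to the threshold, then ShellSort finishes. */
void hybrid_sort(int *arr, size_t n) {
  if (n > SORT_THRESHOLD)
    quick_pass(arr, arr + n - 1);
  shell_pass(arr, n);
}
```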


It is worth noting that, depending on the processor architecture and usage conditions, you can gain an extra 10-15% on long sequences by choosing SORT_THRESHOLD in the range 128-256. However, this slows down the processing of sequences in reverse order or close to it.

Nevertheless, this is a nice bonus if you know that reverse order is unlikely in your data, or if you can cheaply detect such cases and take a branch with a small SORT_THRESHOLD.
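
Purely as an illustration of that last idea, a standalone sketch of such a branch is shown below; the sampling heuristic, the function name pick_threshold() and the concrete values are my own assumptions, not something from the original code.

```c
#include <stddef.h>

/* Sample a few adjacent pairs; if the data does not look close to reverse
 * order, use a larger cutoff (faster on long, roughly random inputs),
 * otherwise keep a small one to avoid the slowdown mentioned above. */
size_t pick_threshold(const int *arr, size_t n) {
  size_t descending = 0, samples = 0;
  for (size_t i = 1; i < n && samples < 16; i += n / 16 + 1, ++samples)
    if (arr[i] < arr[i - 1])
      ++descending;
  /* 192 is taken from the 128-256 range mentioned in the text;
   * 16 stands in for the "small SORT_THRESHOLD" branch. */
  return (samples && descending * 2 > samples) ? 16 : 192;
}
```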

PS

The sorting variant described here was "worked out" for my libmdbx project (a fast embedded key-value database with ACID), whose README and API description were updated (in fact, rewritten) the other day. So I will be grateful both for corrections of typos and for advice and suggestions; as a rule, one does not notice by oneself what information is missing.


