And more about sorts




I will venture to raise this topic once again. Let me start with a link to an article by Mikhail Opanasenko (oms7), which is very impressive both in the amount of work done and in the number of references cited. I started preparing my material without knowing about that publication, and after becoming acquainted with it I had to rework mine substantially. For those who have already read that article: my material studies more diverse data types, in particular strings and real numbers, uses the boost and BSD libraries, and touches on some other topics missing from it.



There are dozens of different ways to arrange data items in order. Among them, the fast ones stand out: they can sort any data array that fits in a computer's RAM in minutes at most. More concretely, quicksort orders a billion integers on a good modern personal computer in less than a hundred seconds. If you use primitive, non-fast methods such as bubble sort or selection sort on a data set that large, the time spent on such processing can exceed any expectation: this "processing" can actually take days, weeks, or even years. The big difference comes from the fact that the running time of the fast methods grows roughly as N log N, while that of the primitive ones grows as N². As N increases, the gap between these two quantities becomes enormous. It is therefore reasonable to use primitive methods only on small data, say up to a few thousand elements on modern computers. They are also natural for teaching the basics of programming and algorithmic thinking, since they are much simpler than the fast methods.



I would like to survey the sorting methods available in current standard libraries: find out how big the differences between them are in their main characteristic, the speed of work, and what their peculiarities are. Along the way, for comparison and as an exercise for the mind, we will also consider some methods that are not difficult to implement. It is also worth noting that the optimizer of the GCC compiler, and probably those of other good compilers, handles sorting code very well, speeding it up severalfold (sometimes more than 5 times).



Let's start with bubble sort as the simplest and slowest method. With this method you pass over the data array again and again, comparing neighbouring elements and swapping them if they are out of order. After each pass at least one element (the largest or the smallest, depending on the chosen order) ends up in its final place. Besides simplicity, this method has one more advantage: it requires no additional memory. Another feature of the bubble method is worth noting: it processes already ordered data very quickly, which in that case makes it one of the fastest methods. If the data is only partially ordered, the method handles it faster too, but in most cases only very slightly. For the tests I used the following implementation.
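
For illustration, a minimal sketch of the method (not the exact test implementation linked above) could look like this; the early-exit flag is what makes it fast on already ordered input:

```cpp
#include <utility>
#include <vector>

// Minimal bubble sort sketch: repeatedly swap out-of-order neighbours;
// stop early if a whole pass makes no swaps (already ordered input).
template <typename T>
void bubble_sort(std::vector<T>& a) {
    for (size_t n = a.size(); n > 1; --n) {
        bool swapped = false;
        for (size_t i = 1; i < n; ++i)
            if (a[i] < a[i - 1]) {
                std::swap(a[i], a[i - 1]);
                swapped = true;
            }
        if (!swapped) break;   // already ordered, nothing left to do
    }
}
```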



Another slow method is selection sort. Here, on each pass, the largest and the smallest elements in the remaining data are found first, and then these elements are placed into the extreme positions corresponding to the chosen order. On the next pass the data is sorted without these extreme elements. This method is as simple as bubble sort and also needs no additional memory, but it is noticeably faster. Moreover, sorting by this method performs a record-low number of data moves, so when moves are much more expensive than comparisons, ordering by selection may be acceptable if the number of elements is small. Here is my implementation. More often this sort is implemented so that only one element per pass is put into place. Heap sort (also called pyramidal sort), which will be discussed later, is the most advanced development of this idea.
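
For illustration, here is a minimal sketch of this two-ended selection scheme (not the linked implementation): each pass finds both the minimum and the maximum of the unsorted middle part and swaps them to its ends.

```cpp
#include <utility>
#include <vector>

// Double selection sort sketch: each pass finds both the minimum and the
// maximum of the unsorted middle part and moves them to its two ends.
template <typename T>
void selection_sort(std::vector<T>& a) {
    if (a.empty()) return;
    size_t lo = 0, hi = a.size() - 1;
    while (lo < hi) {
        size_t imin = lo, imax = lo;
        for (size_t i = lo + 1; i <= hi; ++i) {
            if (a[i] < a[imin]) imin = i;
            if (a[imax] < a[i]) imax = i;
        }
        std::swap(a[lo], a[imin]);
        if (imax == lo) imax = imin;   // the maximum was just moved to imin
        std::swap(a[hi], a[imax]);
        ++lo; --hi;
    }
}
```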



The code for the last slow method considered, insertion sort, is probably the shortest of all sorting codes, so this method is sometimes used inside complex fast sorts for the cases where the number of items to sort is small (a few dozen). It is somewhat similar to bubble sort in that both compare neighbouring elements in turn, but insertion sort looks for the position of the next element within the already sorted part of the data instead of merely pushing the extreme element to the extreme position. With this approach no additional memory is needed either. Like bubble sort, insertion sort is very fast on ordered data and faster on partially ordered data; in the latter case it is much faster than bubble sort. Insertion sort is usually somewhat faster than selection sort and, unlike the latter, it is stable, as bubble sort is. Worst of all, insertion sort handles data in reverse order, on which it sometimes becomes the slowest of the slow. For the tests the following implementation was used. It can be sped up a little by replacing the linear search with a binary one, for example via std::bsearch. A significant speed-up can be achieved by using a list-like structure into which insertion is very fast. One can also note that this is the most natural sort: it is, for example, what people usually do intuitively when arranging playing cards.
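
A minimal sketch of insertion sort (again, not the exact implementation used in the tests) might look like this:

```cpp
#include <utility>
#include <vector>

// Insertion sort sketch: each new element is slid left into the already
// sorted prefix; shifting (rather than swapping) keeps the inner loop cheap.
template <typename T>
void insertion_sort(std::vector<T>& a) {
    for (size_t i = 1; i < a.size(); ++i) {
        T key = std::move(a[i]);
        size_t j = i;
        while (j > 0 && key < a[j - 1]) {   // stable: only strictly smaller moves past
            a[j] = std::move(a[j - 1]);
            --j;
        }
        a[j] = std::move(key);
    }
}
```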



Shell sort is the simplest of the fast methods and is quite suitable for students just beginning to learn programming. It is only a modification of bubble sort: the difference is that in Shell sort the distance between the compared elements varies from pass to pass, from large on the first pass down to 1 on the last, so that on these last passes the Shell method degenerates into plain bubble sort. Donald Shell published the basic algorithm that bears his name in 1959, which makes it one of the first universal fast sorts: for comparison, the quicksort algorithm was published two years later, and the now popular Timsort and introspective sort became known only in the 1990s. Several interesting unsolved mathematical problems are connected with Shell sort, the main one being how to choose the gaps between the compared elements optimally. Some record sequences have been found, for example A102549. Such sequences are obtained by colossal computations, so they are very short; A102549 has only 8 elements, which is enough only for data up to about 3000 elements. For bigger data the continuation has to be picked almost at random. I tried values close to powers of 2, e, 2.25 and 3. Primes close to powers of 2 showed the worst results, noticeably behind the best; the other three options turned out to be roughly equal in their effect on performance and probably very close to optimal. Moreover, in those three cases the use of prime numbers gave no tangible advantage. Curiously, the gaps suggested on Wikipedia (with base 2.25), backed by references to the corresponding papers, did not show the best results in my tests, although their lag behind the best was very small (no more than 5-10%). Using A102549 as a starting point also gave no noticeable improvement. Mikhail Opanasenko also tried to crack Shell sort and obtained the interesting result that gaps chosen by the formula s(n+1) = 10·s(n)/3 work very well and are perhaps even close to ideal. My results confirm this: in many cases exactly these gaps gave the best result, although not always, and the lag behind the nearest result was quite small (about 5%). My code for the Shell sort implementations uses small tables with the gaps, although if prime numbers are not used, these tables can be computed almost instantly, as is done in the implementation of one of the variants of this sort.
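
As an illustration of the gap rule s(n+1) = 10·s(n)/3 mentioned above, here is a minimal Shell sort sketch that generates the gaps on the fly rather than taking them from a precomputed table, as my test code does:

```cpp
#include <utility>
#include <vector>

// Shell sort sketch with the gap rule s(k+1) = 10*s(k)/3: gaps are generated
// up from 1, then applied from the largest down to 1, where the method
// degenerates into plain insertion/bubble-style passes.
template <typename T>
void shell_sort(std::vector<T>& a) {
    std::vector<size_t> gaps{1};
    while (gaps.back() < a.size() / 3)
        gaps.push_back(gaps.back() * 10 / 3);   // 1, 3, 10, 33, 110, ...
    for (size_t g = gaps.size(); g-- > 0; ) {
        size_t gap = gaps[g];
        for (size_t i = gap; i < a.size(); ++i) {
            T key = std::move(a[i]);
            size_t j = i;
            while (j >= gap && key < a[j - gap]) {
                a[j] = std::move(a[j - gap]);
                j -= gap;
            }
            a[j] = std::move(key);
        }
    }
}
```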



Interestingly, if we take gaps close to powers of three in a slightly different way and use a slightly different algorithm (see the implementation), then on 32-bit numbers we get speeds close to the best, but on longer numbers and on strings we get a significant slowdown, sometimes more than 100%. The results for the best algorithm used by oms7 are also in the table below; although it shows a good order of growth, in absolute values it lags significantly behind the leaders.



Will a way to find the best gaps ever be found? Perhaps, but I dare to suggest it will not happen soon. Shell sort is used in the Linux kernel, and in at least one C library its code serves as the standard qsort() function. It has been proved theoretically that the order of growth of optimal Shell sort is only slightly worse than that of the "genuinely" fast logarithmic methods. Indeed, the average running time of optimal Shell sort as a function of the data size is described by the formula ∼ N·(log N / log log N)², which even for very large N is very close to the formula ∼ N·log N typical of the other fast methods. In practice Shell sort is often even faster than the asymptotically faster methods and starts to fall slightly behind them only on rather large arrays (on the order of 10 million elements). This sort needs absolutely no additional memory and behaves consistently on the most varied kinds of input, comparing favourably with quicksort. The Shell method does not have the stability property.



Quicksort is only slightly more complex than the Shell algorithm and is still one of the fastest ways to order randomly scattered data. However, it has several drawbacks: it needs additional memory, and in rare cases it runs extremely slowly, following a quadratic dependence. The main idea of the method is to divide the data into two parts such that the elements in one part are all greater (or all smaller, depending on the chosen order) than those in the other. There are several ways to do this division. Ideally, with each division both parts should be of roughly equal size, and the worst case is when one of the parts ends up consisting of only one element. Let us consider several implementations of quicksort, in particular the Hoare scheme, in which the pivot element dividing the data into two parts is taken from the middle of the range being sorted.
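
A minimal sketch of the Hoare scheme with the pivot taken from the middle (the core idea only, not the tuned variant used in the tests) could look like this:

```cpp
#include <utility>
#include <vector>

// Hoare-style quicksort sketch: the pivot is taken from the middle of the
// range, two indices run towards each other, and both halves are sorted
// recursively.
template <typename T>
void quick_sort_hoare(std::vector<T>& a, long lo, long hi) {
    if (lo >= hi) return;
    T pivot = a[lo + (hi - lo) / 2];
    long i = lo, j = hi;
    while (i <= j) {
        while (a[i] < pivot) ++i;
        while (pivot < a[j]) --j;
        if (i <= j) std::swap(a[i++], a[j--]);
    }
    quick_sort_hoare(a, lo, j);   // left part:  a[lo..j]
    quick_sort_hoare(a, i, hi);   // right part: a[i..hi]
}
// call: quick_sort_hoare(v, 0, (long)v.size() - 1);
```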



We also consider the extremely compact Lomuto scheme, which is sometimes slightly (about 1%) faster than the Hoare variant just considered. However, on typical special cases, for example ordered, reversed, or low-variation data, the Lomuto method is extremely slow. In addition, among the quicksort variants considered, it turned out to be the greediest for stack space in practice: when sorting relatively small arrays, this was the only sort for which 8 megabytes of stack was not enough, and I had to raise the limit with ulimit. This stack greed leads to large slowdowns when processing big data (tens of millions of strings), and I find its nature hard to explain. I can only state that on such data it is better not to use this sort, nor the similar one from the next paragraph.
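
For comparison, a minimal sketch of the Lomuto scheme (an illustration of the idea, not the tested code):

```cpp
#include <utility>
#include <vector>

// Lomuto partition sketch: the last element is the pivot, one index sweeps
// the range and pushes smaller elements to the front. Compact, but it
// degrades badly on ordered, reversed or low-variation data, as noted above.
template <typename T>
void quick_sort_lomuto(std::vector<T>& a, long lo, long hi) {
    if (lo >= hi) return;
    T& pivot = a[hi];
    long p = lo;
    for (long i = lo; i < hi; ++i)
        if (a[i] < pivot) std::swap(a[i], a[p++]);
    std::swap(a[p], a[hi]);           // place the pivot in its final position
    quick_sort_lomuto(a, lo, p - 1);
    quick_sort_lomuto(a, p + 1, hi);
}
```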



The Lomuto method selects the last element as the pivot, but quicksort can also be implemented with no pivot element at all; more precisely, such an element emerges as a result of the bisection of the data that has already been performed. In speed this sort turned out to be close to the Lomuto method, although it is usually a little faster, and in the extreme cases it is noticeably faster than Lomuto but slower than Hoare.



In 2009 a dual-pivot quicksort algorithm was published, which became the standard for the Java language. This algorithm, whose author is Vladimir Yaroslavskiy, reduces the number of data moves by 20% compared to the best typical variants, while the number of comparisons stays the same. As a rule, it really does work faster than the other quicksorts. I optimized it a bit, using the long-known fact that on the x86 architecture a swap usually works faster than an assignment, and for C++ strings much, much faster. None of the quicksorts considered so far has the stability property.
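
To make the scheme concrete, here is a compact sketch of dual-pivot partitioning (an illustration of the idea only; it omits the swap-based optimization described above and other tuning):

```cpp
#include <utility>
#include <vector>

// Dual-pivot quicksort sketch: two pivots p <= q split the range into three
// parts (< p, between p and q, > q), which are then sorted recursively.
template <typename T>
void quick_sort_dual(std::vector<T>& a, long lo, long hi) {
    if (hi - lo < 1) return;
    if (a[hi] < a[lo]) std::swap(a[lo], a[hi]);
    T p = a[lo], q = a[hi];                 // the two pivots, p <= q
    long lt = lo + 1, gt = hi - 1, i = lo + 1;
    while (i <= gt) {
        if (a[i] < p) std::swap(a[i++], a[lt++]);
        else if (q < a[i]) {
            while (i < gt && q < a[gt]) --gt;
            std::swap(a[i], a[gt--]);
        }
        else ++i;
    }
    std::swap(a[lo], a[--lt]);              // put p between the 1st and 2nd parts
    std::swap(a[hi], a[++gt]);              // put q between the 2nd and 3rd parts
    quick_sort_dual(a, lo, lt - 1);
    quick_sort_dual(a, lt + 1, gt - 1);
    quick_sort_dual(a, gt + 1, hi);
}
// call: quick_sort_dual(v, 0, (long)v.size() - 1);
```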



Additional memory in quicksort is needed to organize the recursive calls. However, the second such call can be replaced by a loop by optimizing the tail recursion, which may give no gain in speed but significantly reduces the amount of extra memory used. I implemented a variant of the Hoare scheme with this kind of optimization. In addition, in system programs one can check the stack pointer and, if it approaches a critical value, simply unwind all the recursive calls and start sorting again; for this case it is obviously necessary to use a quicksort variant that does not slow down on nearly ordered data, for example the Hoare variant proposed above. Fighting the use of additional memory can be considered the main idea of the quicksort in the standard C library shipped with GCC: it abandons recursion altogether and simulates it instead, which allows the stack load to be reduced by a third. The code turned out rather big, about 150 lines. There will be a little more about this sort below.
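
A sketch of how the second recursive call can be replaced by a loop; recursing only into the smaller part keeps the depth near log₂N (this illustrates the idea and is not my quicksort_tco code):

```cpp
#include <utility>
#include <vector>

// Quicksort with tail-recursion elimination: only the smaller part is sorted
// by a recursive call, the larger part is handled by the loop, so the
// recursion depth stays bounded by about log2(N).
template <typename T>
void quick_sort_tco(std::vector<T>& a, long lo, long hi) {
    while (lo < hi) {
        T pivot = a[lo + (hi - lo) / 2];
        long i = lo, j = hi;
        while (i <= j) {
            while (a[i] < pivot) ++i;
            while (pivot < a[j]) --j;
            if (i <= j) std::swap(a[i++], a[j--]);
        }
        if (j - lo < hi - i) {            // recurse into the smaller half
            quick_sort_tco(a, lo, j);
            lo = i;                       // the loop continues with the larger half
        } else {
            quick_sort_tco(a, i, hi);
            hi = j;
        }
    }
}
```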



Hash sorting can be very fast, close to ∼N, but sometimes it degrades to a quadratic dependence: its speed depends heavily on the input. If the hash function spreads the data evenly over the auxiliary array, we get the fastest, linear behaviour; if all the data is grouped around a few widely separated "centres of mass", or there are many identical elements, that is, when many hash collisions occur, we get the worst, N²-type behaviour. As with tree sorting, hash sorting needs quite a lot of additional memory; in my implementation, for example, 12 extra bytes are needed for every sorted integer (int32, x86-64). An interesting property of hash sorting is the absence of comparison operations between data elements, which distinguishes it from all the sorts considered above; more precisely, comparisons are needed only for collisions. When sorting data in which the key coincides with the whole element, one could keep an extra counter of identical elements, but that is a rather dubious optimization. One can also use a binary tree instead of a list for storing colliding elements; this greatly speeds things up in particular cases with many collisions, but in general the binary tree makes many cases slower, and this despite the fact that almost 100 bytes of additional information have to be stored per element. I implemented three variants of hash sorting with a binary tree: one uses a simple unbalanced tree, and the other two use the standard trees from the std and boost libraries. Hash sorting is practically unsuitable for sorting text strings, except very short ones, because a good hash function cannot be built for such data. I could not adapt the standard C++ hash container (unordered_multiset) for sorting: I tried to use monotone hash functions and an ordering relation instead of equality, but it did not work.
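
To make the idea concrete, here is a minimal hash-sort sketch for 32-bit unsigned integers; the monotone "hash", the per-cell lists and the bucket count are illustrative choices, not the implementation measured in the tables:

```cpp
#include <cstdint>
#include <forward_list>
#include <vector>

// Hash-sort sketch for uint32_t: a monotone "hash" maps each value to one of
// nbuckets cells, collisions go into a per-cell list kept sorted on insert,
// and the cells are concatenated in order. Even value distributions give ~N
// time, heavy collisions degrade it towards N^2.
void hash_sort(std::vector<uint32_t>& a) {
    if (a.size() < 2) return;
    const size_t nbuckets = a.size();                      // roughly one cell per element
    std::vector<std::forward_list<uint32_t>> cell(nbuckets);
    for (uint32_t v : a) {
        size_t h = size_t((uint64_t)v * nbuckets >> 32);   // monotone in v, < nbuckets
        auto& lst = cell[h];
        auto it = lst.before_begin();
        for (auto nxt = lst.begin(); nxt != lst.end() && *nxt < v; ++nxt)
            it = nxt;                                      // insertion sort inside the cell
        lst.insert_after(it, v);
    }
    size_t k = 0;
    for (auto& lst : cell)
        for (uint32_t v : lst) a[k++] = v;                 // concatenate the cells in order
}
```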



Array sorting is very similar to the previous one. An auxiliary array is also used, into which values are placed according to a hash function. In the event of a collision, a contiguous run of occupied cells has to be shifted one position to the left or right to free the position indicated by the hash function for the new element. To obtain good speed the auxiliary array must be several times (from 2-3 times) larger than the original one. As the auxiliary array grows, the speed increases only up to a certain limit, which depends on the sorted data and the hash function tied to it, and then (typically from a factor of 4-5) decreases. The speed is roughly the same as that of hash sorting: a little faster on good data and noticeably slower on bad data. This sort also needs a lot of extra memory. If the number of elements in the sorted array is limited to a little over four billion, then a doubled auxiliary array requires roughly the same amount of additional memory as hash sorting, and a tripled one requires 28 bytes per element, which is noticeably less than tree sorting needs and much less than a hash with trees. This sort, too, is almost unsuitable for working with strings. There is no Wikipedia article about such an algorithm, but here is my implementation.



Interestingly, the otherwise good overview article on Wikipedia does not mention such intermediate methods as array sorting and hash sorting, which naturally sit between the methods based on comparing elements and the methods based on the absolute values of elements.



One of the fastest sorts, which never uses comparisons at all, is radix (bitwise) sort, known since the 19th century. Its idea is very simple: you work with groups of bits of the data representation (for the tests I took groups of 8, 11 and 16 bits). For each group a table is built, and the results are then combined in a relatively simple way. There are two main ways to apply radix sort: for numbers it is more convenient to take the digits from right to left (the LSD, Least Significant Digit, variant), and for strings from left to right (the MSD, Most Significant Digit, variant). Radix sort is often significantly faster than any other way of ordering data. Surprisingly, library support for it is still rather poor: it is neither in boost nor in the C++ standard library, and I do not even know of a version of it in any well-known C++ library for working with numbers or strings. This sort, of course, also has disadvantages. It is very sensitive to the type of the data being sorted: you need a separate version of it for data of each size, special variants for unsigned and signed integers, and support for real numbers may require quite some effort. The variant that goes from the least significant byte to the most significant usually requires additional memory slightly larger than the original data (which is still much less than hash or array sorting need, and all the more so than tree sorting); in addition, this variant is of little use for sorting long strings. My code for this sort is here; it is based on the code from the oms7 article mentioned above. The variant with the reverse byte order is more versatile and very well suited for sorting strings. It can be implemented without the use of additional memory (the price being the loss of the stability property), as is done in the radixsort() function of the BSD library. My code for this variant is also based on the oms7 code; it uses extra memory slightly larger than the source data, has the stability property, but is not optimized for strings and therefore shows noticeably worse performance than the similar sradixsort() function from the already mentioned BSD library.

This sort can show surprisingly bad results on small numeric arrays, working several orders of magnitude slower than even bubble sort, although we are talking about very small absolute values of no more than a few milliseconds, so the difference is not easy to notice. The reason is that it uses auxiliary arrays of fixed small size, which when sorting small inputs may be larger than the data itself. To avoid such slowdowns, the "left to right" variant switches to insertion sort in these cases. In conclusion it is worth noting that this is the only relatively popular sort I know of that always reliably works in ∼N time, although the proportionality coefficient depends on the size of the data elements and for strings or long numbers can be quite noticeable.
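
A minimal LSD radix sort sketch for 32-bit unsigned numbers with 8-bit digits (the tested code, based on oms7's, is more general) might look like this:

```cpp
#include <cstdint>
#include <vector>

// LSD (least-significant-digit-first) radix sort sketch for uint32_t with
// 8-bit digits: four stable counting-sort passes using one auxiliary buffer
// of the same size as the input.
void radix_sort_lsd(std::vector<uint32_t>& a) {
    std::vector<uint32_t> buf(a.size());
    for (int shift = 0; shift < 32; shift += 8) {
        size_t count[257] = {};
        for (uint32_t v : a) ++count[((v >> shift) & 0xFF) + 1];
        for (int i = 1; i <= 256; ++i) count[i] += count[i - 1];   // prefix sums -> start positions
        for (uint32_t v : a) buf[count[(v >> shift) & 0xFF]++] = v;
        a.swap(buf);
    }
}
```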



A variant of MSD radix sort is sorting with a trie, a data structure that allows the keys of an associative array to be stored efficiently. My implementation, despite optimizing memory use, still turned out to be very greedy for it. In terms of speed the best results were obtained when sorting long strings.



Next, let us consider some sorts that can be found in standard libraries.



Let's start with the quicksort from the standard C library (qsort, the GCC variant), which I have already written about. I can only add here that this sort, like the other C sorts (for example, the ones from the BSD library below), is unsuitable for object data, in particular C++ strings, because such data is not POD. With the source at hand the problem is easy to fix by replacing the memcpy operations with ordinary assignments. One may also note that in some standard C libraries this sort is not necessarily a quicksort; it may be replaced by something else. In the current GCC version this sort even has the stability property. The C sorts mentioned sometimes sprang surprises while I was collecting data; for example, working with the std::vector type through a function object could create difficulties, so I can only recommend using them with object data with caution. According to the runs, this sort is sometimes relatively slow: it is noticeably inferior in speed to the other quicksort implementations on numbers, but on C strings it does better, with only the dual-pivot quicksort sometimes getting ahead of it, though on long strings the standard qsort almost always overtakes it again. The most interesting thing came to light when I tried to sort a billion integers with it: it turned out that filling of type 7 leads to a time dependence close to a quadratic law, that is, to a possible "processing" lasting up to several years (I did not wait for the end and stopped it after 21 hours of running). On smaller data this sort usually manages to pick pivots with which it works quickly.
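
For reference, the basic usage of the C library qsort() looks like this (a trivial example, not the test harness):

```cpp
#include <cstdio>
#include <cstdlib>

// Plain usage of the C library qsort(): the comparator returns <0, 0 or >0.
// Because qsort moves raw bytes, it is only safe for POD data, which is
// exactly the limitation with C++ strings discussed above.
static int cmp_int(const void* x, const void* y) {
    int a = *static_cast<const int*>(x), b = *static_cast<const int*>(y);
    return (a > b) - (a < b);          // avoids the overflow of "a - b"
}

int main() {
    int v[] = {5, 1, 4, 1, 5, 9, 2, 6};
    std::qsort(v, sizeof v / sizeof v[0], sizeof v[0], cmp_int);
    for (int x : v) std::printf("%d ", x);
    std::printf("\n");
}
```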



Introspective sort is used in the C++ standard library, although the exact method behind std::sort is implementation-dependent; the information given here is for GCC only. According to the runs, it is the second fastest after spreadsort on numbers, and the advantage of spreadsort is small (from almost 0 to 30%), but with string sorting things are much worse: it can be 3-4 times slower than the leaders. It is in fact a quicksort in which two special cases are handled: 1) if the recursion depth becomes too large, it switches to heap sort; 2) if the number of items to sort is small, it switches to insertion sort.



The stable sort from the C++ standard library (std::stable_sort), as its name implies, has the stability property: it preserves the relative order of elements with equal keys. This property is needed comparatively rarely, although I say so rather unfoundedly, only from my own experience. It may use additional memory, which makes it faster. Surprisingly, this sort is often faster than std::sort.



In the super-popular Python language, Timsort is used as the standard sort. For the tests I used its version from a GitHub repository. It shows record-good results on partially ordered data, but on average it is still noticeably slower than the leaders. Usually its speed lies between quicksort and Shell sort, although on strings it is sometimes close to the leaders. It has the stability property. It implements a relatively complicated algorithm, in whose standard implementation a bug was discovered in 2015, although triggering it requires a rather unrealistic situation.



The BSD C library has a radix sort (radixsort) and its stable version (sradixsort). Unfortunately, both can only be used for C strings. As the test data will show, today this is the fastest way to sort strings, and it is therefore surprising that there is no analogous standard facility for C++ strings.



The BSD C library also has merge sort (mergesort). This sort is known as one of the fastest for data with sequential access (files, lists) and is probably what the C++ standard library uses for sorting lists (std::list and std::forward_list). By the way, it has been known since 1948, and one of its developers was von Neumann, the famous mathematician and pioneer of the first computer systems. Among the fast methods this sort does not stand out with the best characteristics, although as a rule it is somewhat faster than the Shell methods. It requires additional memory and is usually implemented as a stable sort.



In addition, there is heap sort (heapsort). The heap is usually used for priority queues, but it can serve for sorting as well. Heap sort requires no additional memory, but it does not have the stability property. In speed on numbers it is significantly (up to 3-6 times) slower than the Shell methods, but on strings that are not too short it shows very good results, overtaking the Shell methods, with the advantage growing as the string length increases.



Heap sort is also available in the C++ standard library. There it is done in two operations: building the heap (std::make_heap) and then the actual sorting (std::sort_heap). Here, unlike in the BSD library, sorting is just one of the operations on a heap. Usually this variant is slightly faster than the previous one (the BSD variant shows better results only on short numbers and long C strings).
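
Usage is straightforward; a tiny example of heap sort built from the two standard operations:

```cpp
#include <algorithm>
#include <vector>

// Heap sort via the two standard operations: build a max-heap in place,
// then repeatedly move the top to the end of the shrinking heap.
void heap_sort(std::vector<int>& v) {
    std::make_heap(v.begin(), v.end());   // O(N)
    std::sort_heap(v.begin(), v.end());   // O(N log N), leaves v sorted ascending
}
```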



Using the C++ standard library, you can sort with a balanced binary tree (std::multiset): just fill the tree and then traverse it. This method can be considered a kind of non-recursive quicksort. One problem is that the standard memory allocator is notably slow, so for the best results you need your own allocator, which gives a speed-up of about 10-30%. It can also be noted that this method requires a lot of additional memory: with g++, for every data element, 32 extra bytes have to be stored besides the element itself (on the x86-64 architecture). It would be interesting to try to store such a tree as a heap, i.e. without the extra bytes. If you use boost::container::multiset, less memory is needed: only 24 extra bytes per element. However, both boost and the standard library sprang one unpleasant surprise: in the process they sometimes demanded more memory than necessary, perhaps because of the rebalancing of the binary trees. The code is here.
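
A tiny sketch of this tree sort with the default allocator (the custom-allocator variant mentioned above is not shown):

```cpp
#include <set>
#include <vector>

// Tree sort via std::multiset: inserting keeps the red-black tree ordered,
// an in-order walk then yields the sorted sequence. Duplicates are preserved
// because multiset allows equal keys.
void tree_sort(std::vector<int>& v) {
    std::multiset<int> t(v.begin(), v.end());
    v.assign(t.begin(), t.end());
}
```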



The boost library has spreadsort, an algorithm invented in the 21st century. This is overall the fastest method available today in well-known libraries. Like radix sort, it uses some bitwise ideas and can be rather capricious about the type of its arguments. Usually this sort shows record results, sometimes significantly better than its nearest competitors. The only exception is the sorting of C strings, where it is significantly inferior to the radix methods from the BSD library; on long C strings it can also fall behind other methods, for example spinsort or dual-pivot quicksort. Spreadsort (boost v1.62) revealed one very nasty problem: when sorting small (up to 1000 elements) arrays of C strings, it works with errors.
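
A minimal usage sketch, assuming the header layout of recent Boost versions:

```cpp
#include <boost/sort/spreadsort/spreadsort.hpp>
#include <string>
#include <vector>

// Usage sketch for Boost spreadsort: the generic entry point dispatches to
// integer_sort, float_sort or string_sort depending on the value type.
void demo(std::vector<int>& nums, std::vector<std::string>& strs) {
    boost::sort::spreadsort::spreadsort(nums.begin(), nums.end());
    boost::sort::spreadsort::spreadsort(strs.begin(), strs.end());
}
```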



There is also the new pdqsort algorithm, which improves, as its author claims, on introspective sort. This new algorithm is not yet described on Wikipedia. Its results, although not bad, are not very impressive: it is slower than std::sort on short integers but faster on strings and long integers, and in both cases the difference is rather small. The best results for this sort were obtained on long C++ strings, where it is second, though by a noticeable margin, only to the leader, spreadsort.



In boost one can also find spinsort. This is also a new algorithm which, unlike the previous one, has the stability property and which is likewise not yet described on Wikipedia. It is usually close to the leader, but with a noticeable lag behind it. It requires additional memory, although not too much.



Let us finish with flat_stable_sort from the same boost library. This is another new stable algorithm not yet described on Wikipedia. It is far from the fastest method, being slightly inferior to most of the other fast library methods, but it uses very little additional memory (it does, however, always need a fixed table of 8 KB) and is often noticeably faster than the Shell method.



Consider the table of running times (in ms) of these algorithms on a computer with 8 GB of RAM and an AMD Phenom™ II X4 955 processor at 3.214 GHz. The computer worked for a total of several months, and the total size of the collected data in the two json files loaded by the tables is almost 400 KB. The timings are averages over a number of runs; for smaller sizes the number of runs was larger. Caching changes the speed of the computations in a rather complicated way, so the results obtained are approximate at best (I assume the timing inaccuracy can reach 20%). I believe that on the best modern desktop processors the results could be 2-3 times faster, but bear in mind that many modern processors switch between different frequencies, which would make the results obtained on them even more approximate.



This table and the following one are interactive. Besides the absolute values of the timings, you can also see them relative to the average, the median, the minimum and the maximum, and change the number of displayed digits. You can also get ratios of the timings for different types of filling and data types; the latter, for example, shows that sorting C strings is noticeably faster than sorting C++ strings. From the sorting methods you can select and assemble arbitrary subsets, and of course sort by any column. Unfortunately, I do not know how to use JavaScript in an article on Habr, so the tables are available only by the links. In case github.io turns out to be overloaded, I also give backup links to the first and second tables.



Time is measured in milliseconds, but in the fitted time-dependence laws the formulas are given in microseconds to avoid too small coefficients. Thus, if you substitute a value of N into a formula, the result must also be divided by 1000 to get a number close to the corresponding one in the table. The law is derived from the obtained timings by comparing two of the results (usually the extreme ones are taken). You can check the quality of the derived law using the option showing the relative deviation of the actual value from the predicted one.



Some general conclusions from the results of this table:














Now it's time to talk about the types of data used with sorting algorithms:



  1. random 32-bit integers (int32_t);
  2. random 64-bit integers (int64_t);
  3. random 128-bit integers (__int128, a type supported, in particular, by GCC);
  4. structures with an int32_t key and additional payload (INT1P4), ordered by the key;
  5. random real numbers of type double (float numbers);
  6. short C++ and C strings, from 1 to 16 characters (strings short, c-strings short);
  7. C++ and C strings from 1 to 256 characters (strings, c-strings);
  8. long strings: C++ strings from 1 to 2^20 characters, with an average around 512, and C strings from 1 to 512 characters (strings long, c-strings long).


The ways of filling the data arrays were as follows:







  1. random data;
  2. already ordered data (ordered);
  3. data in reverse order (reversed);
  4. random values from 0 to 99 (low variation 100);
  5. random values from 0 to 1 (low variation 2);
  6. all elements equal to 0 (low variation 1);
  7. a special sequence that is a worst case for qsort with the Hoare scheme, with values roughly in the range 2N-3N;
  8. ordered data with about 1% of random perturbations (partially ordered);
  9. reverse-ordered data with about 1% of random perturbations (partially reversed).


It should be emphasized that random data is the most typical way an array gets filled; all the other variants are extremely rare, almost impossible, during the normal operation of a real program.



Now let us look at the test results where the sorts work on all possible data sequences. The number of such sequences equals the factorial of their length, so for sequences of length 12 there are 479'001'600 variants, which a good modern PC enumerates in less than a minute; for sequences of length 14 there are already 87'178'291'200 variants, meaning several hours of computer time. Therefore, the following table shows the average time of one sort (in processor cycles obtained through the RDTSC instruction) over all permutations of lengths up to only 12. The previous numeric types and short strings are taken as the data. Of course, one could note that sequences with repeating elements are not considered; however, I dare to suggest that their presence would not qualitatively change the results, although it could significantly slow down obtaining them.



The results for such small data are not very representative, especially for the complex sorting methods, but they still add to the picture of sorting behaviour. Some sorts, as far as I know, replace their main algorithm with another one on small arrays: these are spreadsort, the dual-pivot quicksort and radix_msd (the latter two use insertion sort). And some sorts (flat_stable and radix) use small tables which, on tiny inputs, turn out to be much larger than the data itself, which slows these methods down greatly compared to the others and produces strange results. Strange results are also obtained for the other radix sorts and for the hash and array sorts. Such unusual results are easily explained by the fact that for these methods the data preparation time before sorting small inputs is longer than the sorting itself. Of course, when measuring such small time intervals (nanoseconds), the influence of various errors on the displayed law is much higher than in the previous table, so the laws came out very approximate, often with a "drift" towards exaggerated values. The latter is partly explained by the fact that on small data the sorting time itself becomes comparable with the time of calling the sorting function and of the auxiliary operations needed to measure the time. The program tries to subtract this overhead from the output, but this can only be done rather approximately. With all that, I dare to suppose that, comparing the results for different data types and keeping the above remarks in mind, one can sometimes draw conclusions that are not very far from accurate.



In conclusion, another table shows how much additional memory the different tested methods require for sorting. Obviously, this value depends on the system; in my tests, as I already wrote, it is x86-64, GCC. In the table, the letter T means the size of the element type in bytes (the length of a string is not included in this size: for C strings it is the size of a pointer, for C++ strings the size of the descriptor, 32 bytes for x86-64 GCC); the letter L is the average length of an element in bytes (for numbers this equals T, for strings it is the average string length); the letter A can be 1 or 0 and denotes alignment to a 64-bit boundary; and the letter M is the extra alignment from the standard memory allocator (which is assumed to align to a 32-byte boundary). The * sign means that the data for this sort was obtained only by reading the VmRSS field from /proc/PID/status (this field is the resident memory size of the process).



Additional memory table
Method: additional memory required
array *1: (T + 1/8)·N
array *k, k > 1: (T + 4k)·N
bubble: 0
clib_qsort: from ≈T·N/2 to ≈T·N *
flat_stable: ≈T·N/256
hash: (T + 8 + 4A)·N
hashbt: (T + 12)·N
hashbt_boost: (56 + T + 4A + M)·N
hashbt_std: (80 + T + 4A + M)·N
heapsort: 0
insertion: 0
mergesort_bsd: from ≈T·log₂N to T·N *
pdq: T·log N
quicksort: from ≈16·log₂N to 16·N
quicksort_tco: from 0 to N
radix: ≈T·N
radix8_trie: from ≈T·N + 24L to ≈(T + 24L + 12)·N
radix_bsd: 0
radix_msd: ≈T·N
selection: 0
shell: 0
spin: T·N/2
spread: ≈0
sradix_bsd: ≈T·N *
stlsort: from 0 to ≈T·log₂N *
stlstable: from 0 to ≈T·N/2 *
timsort: from 0 to ≈T·N *
tree_boost: (T + 24)·N
tree_stl: (T + 32)·N




There are, of course, other sorting methods, both primitive and fast. The boost library has parallel algorithms that let you take advantage of additional processor cores in the system. You can also use the vector-based sorted container boost::container::flat_multiset instead of std::multiset, but it works very slowly.



Taking the opportunity, a few remarks about the boost library in general. I recommend not passing it by. Even the features that the standard library already has are, as a rule, better implemented in boost, and sometimes (regular expressions, for example) much better. As for containers, boost has noticeably more of them, and those that coincide with the standard ones are sometimes somewhat faster and often have small but pleasant improvements. Boost checks types more thoroughly, which can sometimes help catch almost elusive bugs that usually do not show themselves but can fire unexpectedly under some circumstances. Among the drawbacks of boost are the utterly unreadable and huge compilation error messages produced by many constructions from this library; this applies, though to a lesser extent, to the standard library as well. It is time for the C++ developers to do something about it.



All files with the tests and some other related materials can be taken from my repository. If anyone is interested in the raw source data, it can be obtained here (1.4 MB). I will be glad of any comments, criticism and additions.


