Sort by distribution





In distribution sorts, the elements are distributed and redistributed into classes until the array is sorted.



In the most general case, this happens in approximately the same way. Elements are scattered across classes for some reason. If this does not lead to the ordering of the array, then the attributes of the class are refined and the elements are scattered again to the refined classes. And this happens until the array becomes ordered.



In sortings by distribution, there are almost always no comparisons of elements among themselves and their exchanges. The main thing is whether an element belongs to a certain class or not, its comparison with other elements rarely plays a role.



Typically, these sorts have linear time complexity (rather than logarithmic, as with efficient sorts of exchanges, merges, choices, or inserts). Also, the algorithms of this class almost always require a lot of additional memory, since the elements grouped by classes have to be stored somewhere.



Distribution sorts are good for ordering integers and strings. Sorting real numbers with them is usually inconvenient. Also, distribution sorts perfectly sort arrays consisting of repeating numbers - the more repetitions, the less different classes are required.

EDISON Software - web-development
This article was written with the support of EDISON Software.

Customer Opinion: 10 pluses of programmers from EDISON

This is interesting and useful to know: Breakfast programmer
Consider an algorithm that most convexly demonstrates the above properties.



Bucket sort :: Bucket sort



Other names are basket sorting, block sorting, pocket sorting.



We scatter numbers in baskets, then in each basket we scatter in smaller baskets, and so on, until at some level in the basket there are only the same elements. Then from such baskets of the lowest level it is easy to restore the array in an ordered state.



Let us explain with a specific example. Let's say we have an unordered array. It is known that in this array numbers from 1 to 8 are contained.







Bucket sort animation with time indicator








We spread these numbers into 2 groups: numbers from 1 to 4 fall into one group, from 5 to 8 into the second. Then we distribute the numbers in the first basket into two baskets: in one number 1 and 2, and in the other 3 and 4. We also distribute these baskets into bast baskets, which already contain numbers of the same size. To that large basket containing numbers from 5 to 8, we apply a similar recursion.



Then, from small baskets, each of which contains the same numbers, we return the elements to the main array in order of precedence.



Nuclear sorting in this form is not particularly applicable in practice, but it standardly demonstrates how all sortings by distribution work in general.



Thanos sort :: Thanos sort



Sometimes author sorts are sent to me and this is just such a case. The author Andrei Danilin called it โ€œRussian sorting in halves,โ€ however I dubbed it the sorting of Thanos. Or, if we formally proceed from the methods used, we can call its arithmetic mean sorting.







Thanos sort animation (with arithmetic mean distribution) with a time indicator








The arithmetic mean of the elements is calculated in the array and then all the elements are distributed into 2 groups. Elements smaller than (or equal to) the arithmetic mean go to one group, and larger than arithmetic mean to the second group. Then the same actions are applied recursively to both groups - and so on to the bitter end.



And what about crazy titanium? If this is a random array, then by and large the element, when compared with the arithmetic mean, has a 50/50 chance that it will go to one of two groups.



By the way, on the Internet I came across another comic algorithm with the same name. If the array is not sorted, then click the Infinity Glove and send randomly selected half of the array elements to nonexistence. If the survivors form an ordered array, then on this their great mission can be considered fulfilled. If not, you can make a few more clicks.







Thanos sort animation with time indicator








But back to our distribution sortings. All of them can be divided into only two groups - counting sorts and bitwise sorting . Well, if you want, you can also highlight counting-bit sortings , i.e. those that can be attributed to both.



There are also hybrid algorithms (these are those that use methods of different classes, for example, Tim's sorting is a mix of merge sorting and insertion sorting, introspective sorting is a quick sort that goes into a bunch sorting, etc.), including distribution sortings However, hybrids are a separate section. About them later.



Thousands sorting and arithmetic mean sorting are related to counting sorts.



Counting Sorts



The main idea is that we count how many numbers are in each class.



Counting sort :: Counting sort



We count how many times one or another number occurs in the array. Knowing these quantities, we quickly form an already ordered array.







Animation of sorting with a time indicator








For this sorting you need to know the minimum and maximum in the array. Then the keys are generated for the auxiliary array, in which we fix what and how many times it met.



Python Code:



def CountingSort(array, mn, mx): count = defaultdict(int) for i in array: count[i] += 1 result = [] for j in range(mn,mx+1): result += [j]* count[j] return result
      
      







Pigeon Sort :: Pigeonhole sort



We go through the array, if a new number is found, then we start the counter (as the key of the auxiliary list) of this number. If the number is encountered not for the first time, then the increment for this counter is simply triggered.







Pigeon Sort Animation with Time Indicator








The difference from the previous method is that in sorting by counting, we immediately start counters for all possible numbers that may occur in the array (we can afford it if the maximum and minimum in the array are known). Some numbers never occur and their counters show zero. In pigeon sorting, we start counters only for those numbers that really occur in the array. An array is used in counting sorting for counters, and a doubly linked list is used in pigeon sorting, which allows adding new counters on the go.



This method is sometimes alternatively called Dirichlet sorting , because the algorithm itself is an illustration of various consequences of the Dirichlet principle.

If N objects are distributed across M containers, and N> M, then at least one container contains more than one element.


Python Code:



 def PigeOnHoleSort(a): mi = min(a) size = max(a) - mi + 1 holes = [0] * size for x in a: holes[x - mi] += 1 i = 0 for count in range(size): while holes[count] > 0: holes[count] -= 1 a[i] = count + mi i += 1
      
      







Bit Sorting



We distribute the numbers depending on which digit is in a particular category of the number. If we do this several times for different digits, then we suddenly get a sorted array.



Low-order bitwise sorting :: LSD radix sort




Low-order sorting animation with a time indicator






We move from the lower digits to the older ones and at each iteration we distribute the elements of the array depending on what digit is in the digit.



After the next distribution, we return the elements to the main array in the order in which the elements fell into the classes during the next redistribution.



For bitwise sortings, it is important that the elements are treated as having the same number of digits. If the actual number of digits is different, then the problem is solved by adding additional zeros as higher digits.



High-order bitwise sorting :: MSD radix sort




High-order sorting animation with time indicator






First, we distribute among the senior ranks, from which we move to the younger ones.



This option is more difficult to implement, since the transition to the lower digits is recursively performed inside the classes, and not among all elements of the array.



But this complexity is rewarded by the fact that MSD is faster than LSD. When passing from the lower digits to the upper digits, you have to process all digits of all numbers in order to correctly sort. If you move from senior to younger, then in fact you do not have to process all the digits of all numbers, the sorted state, as a rule, comes earlier.



Most bitwise sortings are a variation of the more efficient MSD. This is especially useful for sorting strings; for this, a suffix tree is usually used. We will analyze in one of the following articles.



Counting bitwise sortings



Sometimes distribution sorting is simultaneously countable and bitwise.



Bead Sort :: Bead sort




Bead Sort Animation with Time Indicator






Other names of the algorithm: abacus sorting, gravity sorting.



I already wrote about this sorting a couple of times ( 1 , 2 ), so Iโ€™ll be brief, only the essence.



Suppose each number in an array is a set of balls, the number of balls is the value of a number. If we arrange the numbers on each other as horizontal rows of these balls and then push them all the way vertically, we get an ordered array.



The trick here is that we represent each number using balls in a unary number system. And in fact, we just count how many times all numbers have each digit.



BeadSort in Python in one line:



 #!/bin/python3 from itertools import zip_longest def beadsort(l): return list(map(sum, zip_longest(*[[1] * e for e in l], fillvalue=0)))
      
      







After a while, we will analyze more complex counting-bitwise sortings, among which the American Flag sorting occupies a prominent place.



References



Bucket / Buckets , Counting / Pigeonhole / Dirichlet , Radix / Discharges , Bead



Series Articles:



AlgoLab Excel application has been significantly updated. Some algorithms from today's article appeared there for the first time. Update who is using.



All Articles