How the strange popcount instruction is used in modern processors

This is the pseudo decoding of my presentation at !! Con 2019 .



Most processor architectures in use today have instructions called popcount



, short for 'population count'. She does the following: counts the number of bits set in a machine word. For example (take 8-bit words for simplicity), popcount(00100110)



is 3, and popcount(01100000)



is 2.



It can surprise you a lot, like me, but that's all she does! Seems not very helpful, right?



I thought this was some recent addition for some hyperspecialized use cases, but it has actually been present in processor architectures since at least 1961:





So what is going on?

NSA instruction



popcount



also known as the “NSA instruction,” and a very interesting thread on comp.arch discusses its use in cryptography. Rumor has it that it was originally added to the CPU instruction set at the request of the NSA. As stated in this archived mail thread :



It was almost a tradition to send one of each batch of faster CDC cars to a “good customer” - an unknown truck came and they never heard of it again.


A great legend, but why did they use it?



One measure of content is Hamming's weight , which is the number of non-zero characters in a string. For a binary string, this is popcount



!



As explained here , the NSA required cryptanalysis of intercepted messages, and since the CDC 6000 worked with 60-bit words, one word was enough to store most of the alphabets that interested them. They were able to:



  1. Split message into lines

  2. Set a bit for each unique character in a string

  3. Use popcount



    to count the number of different characters

  4. Use the counter as a hash for further cryptanalysis


Curiously, popcount



seems to have disappeared from the instruction sets between the mid-1970s and mid-2000s, so the return should be explained by something other than cryptographic applications. What else can it be used for?



Error correction



The concept of Hamming weight is associated with the Hamming distance , which is the number of different positions between two lines of the same length. For two binary strings x



and y



, this is just popcount



after XOR. For example:



  00100110
 01100000 ^
 --------
 01000110

 popcount (01000110) = 3 


In telecommunication applications, this helps to calculate the signal distance, where a known word is transmitted along the wire and the number of changed bits is counted in order to estimate the transmission error.



Then we can design the appropriate error correction code . For example, if a transmission must withstand up to two changed bits, then the code words should differ by at least 5 by the Hamming distance.



Binary convolutional neural networks



And now something completely different: binary convolutional neural networks! But first, what is it?





Thus, we must perform the multiplication of binary matrices. But what is special about binary matrices?



Conventional matrix multiplication by 32-bit values ​​is well suited for desktop computers with powerful CPUs and GPUs, but increasingly we want to do useful work on small and simple devices such as smartphones, routers, smart watches, etc. We can decompose these more complex matrices for layers of binary matrices, and it’s so easier to work with them and store them that we benefit even despite an increase in the number of layers.



This is popcount



comes into play. It is used to calculate the scalar product of two binary matrices:



  a = xnor (x, y)
 b = popcount (a)
 c = len (a)
 dot (x, y) = 2 × b - c 


See here and here for more details.



Chess programming



Many chess programs store data in a bitboard representation, which conveniently fits into a 64-bit word. The Population Count



operation was used for meaningful operations with this view, such as calculating the mobility of a figure.



Molecular fingerprinting



This is also related to the Hamming distance: the molecules are somehow hashed and compared (using popcount



) to determine how similar they are. See here for more details.



Hash array mapped tries (HAMT)



This is where I first found out about popcount



! HAMT is a data structure ( first created by Phil Bagwell ) that can store a very large number of values ​​(usually 32 or 64) in an array on each trie node. However, allocating memory for a 32 or 64 element array can be incredibly wasteful every time, especially if the array actually contains only a few elements. The solution is to add a bitmask in which the number of bits set corresponds to the number of elements in the array, which allows the array to grow and contract as needed. The index calculation for a given element can effectively be done using popcount



. In my blog post on implementing HAMT structures, you can learn more about how they work.



Compressed Data Structures



This is an exciting new area of ​​research that focuses on how to store data in minimal space without unpacking it to do useful work. One of the methods is to think in terms of arrays of bits (bit vectors) that can be requested in two operations:





To make these operations efficient on large bit vectors, you need to build an index and use it effectively, in both cases involving popcount



. Here is a good overview of the RRR index. And, as far as I can tell, the most advanced modern approach is described in the article Space-Efficient, High-Performance Rank & Select Structures on Uncompressed Bit Sequences .



Compiler optimizations



popcount



has become so widespread that both GCC and Clang are able to detect it and replace it with a built-in instruction. Imagine this Clippy: “Oh, I see that you are trying to implement popcount



, let me go out and fix it for you!” The corresponding LLVM code is here . Daniel Lemyr cites it as an example of the amazing mind of modern compilers.



Output



Shrouded in mystery at the beginning of its history, the popcount



instruction popcount



be used everywhere, although it remained a bit unusual CPU instruction. I like the way it connects such different areas of computer science, and I wonder how many more other such strange instructions exist. If you have your own favorite, I would like to hear about her!



All Articles