This is the pseudo decoding of my presentation at !! Con 2019 .
Most processor architectures in use today have instructions called
popcount
, short for 'population count'. She does the following: counts the number of bits set in a machine word. For example (take 8-bit words for simplicity),
popcount(00100110)
is 3, and
popcount(01100000)
is 2.
It can surprise you a lot, like me, but that's all she does! Seems not very helpful, right?
I thought this was some recent addition for some hyperspecialized use cases, but it has actually been present in processor architectures since at least 1961:
So what is going on?
NSA instruction
popcount
also known as the “NSA instruction,” and a
very interesting thread on comp.arch discusses its use in cryptography. Rumor has it that it was originally added to the CPU instruction set at the request of the NSA. As stated in
this archived mail thread :
It was almost a tradition to send one of each batch of faster CDC cars to a “good customer” - an unknown truck came and they never heard of it again.
A great legend, but why did they use it?
One measure of content is
Hamming's weight , which is the number of non-zero characters in a string. For a binary string, this is
popcount
!
As
explained here , the NSA required cryptanalysis of intercepted messages, and since the CDC 6000 worked with 60-bit words, one word was enough to store most of the alphabets that interested them. They were able to:
- Split message into lines
- Set a bit for each unique character in a string
- Use
popcount
to count the number of different characters
- Use the counter as a hash for further cryptanalysis
Curiously,
popcount
seems to have disappeared from the instruction sets between the mid-1970s and mid-2000s, so the return should be explained by something other than cryptographic applications. What else can it be used for?
Error correction
The concept of Hamming weight is associated with the
Hamming distance , which is the number of different positions between two lines of the same length. For two binary strings
x
and
y
, this is just
popcount
after XOR. For example:
00100110
01100000 ^
--------
01000110
popcount (01000110) = 3
In telecommunication applications, this helps to calculate the signal distance, where a known word is transmitted along the wire and the number of changed bits is counted in order to estimate the transmission error.
Then we can design the appropriate
error correction code . For example, if a transmission must withstand up to two changed bits, then the code words should differ by at least 5 by the Hamming distance.
Binary convolutional neural networks
And now something completely different: binary convolutional neural networks! But first, what is it?
- Binary means that we only use matrices of the values +1 (encoded as 1) and -1 (encoded as 0), unlike 32-bit floating point values.
- Does convolution mean matrix multiplication?
- Neural networks are systems inspired by the brains of animals (here I am swimming a little).
Thus, we must perform the multiplication of binary matrices. But what is special about binary matrices?
Conventional matrix multiplication by 32-bit values is well suited for desktop computers with powerful CPUs and GPUs, but increasingly we want to do useful work on small and simple devices such as smartphones, routers, smart watches, etc. We can decompose these more complex matrices for layers of binary matrices, and it’s so easier to work with them and store them that we benefit even despite an increase in the number of layers.
This is
popcount
comes into play. It is used to calculate the scalar product of two binary matrices:
a = xnor (x, y)
b = popcount (a)
c = len (a)
dot (x, y) = 2 × b - c
See
here and
here for more details.
Chess programming
Many chess programs store data in a
bitboard representation, which conveniently fits into a 64-bit word. The
Population Count
operation was used for meaningful operations with this view, such as calculating the
mobility of a figure.
Molecular fingerprinting
This is also related to the Hamming distance: the molecules are somehow hashed and compared (using
popcount
) to determine how similar they are. See
here for more details.
Hash array mapped tries (HAMT)
This is where I first found out about
popcount
! HAMT is a data structure (
first created by Phil Bagwell ) that can store a very large number of values (usually 32 or 64) in an array on each trie node. However, allocating memory for a 32 or 64 element array can be incredibly wasteful every time, especially if the array actually contains only a few elements. The solution is to add a bitmask in which the number of bits set corresponds to the number of elements in the array, which allows the array to grow and contract as needed. The index calculation for a given element can effectively be done using
popcount
. In
my blog post on implementing HAMT structures, you can learn more about how they work.
Compressed Data Structures
This is an exciting new area of research that focuses on how to store data in minimal space without unpacking it to do useful work. One of the methods is to think in terms of arrays of bits (bit vectors) that can be requested in two operations:
-
rank(i)
counts the number of bits given up to the i-th index in the bit vector
-
select(i)
finds the index at which the i-th bit is set
To make these operations efficient on large bit vectors, you need to build an index and use it effectively, in both cases involving
popcount
. Here is a good overview of the RRR index. And, as far as I can tell, the most advanced modern approach is described in the article
Space-Efficient, High-Performance Rank & Select Structures on Uncompressed Bit Sequences .
Compiler optimizations
popcount
has become so widespread that both
GCC and
Clang are able to detect it and replace it with a built-in instruction. Imagine this Clippy: “Oh, I see that you are trying to implement
popcount
, let me go out and fix it for you!” The corresponding LLVM code is
here . Daniel Lemyr cites it as an example of the amazing mind of modern compilers.
Output
Shrouded in mystery at the beginning of its history, the
popcount
instruction
popcount
be used everywhere, although it remained a bit unusual CPU instruction. I like the way it connects such different areas of computer science, and I wonder how many more other such strange instructions exist. If you have your own favorite, I would like to hear about her!