When is it harmful to hash

Foreword
This text will be one of the rewritten chapters for the manual on information protection of the Department of Radio Engineering and Control Systems, as well as, from this training code, the Department of Information Protection of MIPT (GU). The full tutorial is available on github (see also draft releases ). On Habrir I plan to upload new "big" pieces, firstly, to collect useful comments and observations, and secondly, to give the community more overview material on useful and interesting topics.


A reliable cryptographic hash function converts plaintext to text of a given length. This ensures β€œstability”: the difficulty in restoring the first and second prototypes. Or, in plain language about the first property, the difficulty of obtaining such a text, the value of the hash function for which will be equal to the specified one.



The complexity of recovery is understood as the fact that in order to find the first prototype [a reliable cryptographic hash function], it is required to complete an average of at least 2nβˆ’1 hash operations where n - the number of bits in the output of the cryptographic hash function. Taking a modern hash function with a large output size (starting from 256 bits), the information system designer is sure that it is impossible to restore the original data from the hash function value. Most often he is right.



But there is an important set of cases when, despite the reliability of the hash function, restoring the inverse image or even the source text is not a problem. This is the case when using a hash function is pointless. This is the case when the number of variants of the source text is searchable .



Example: phone number . Different phone numbers with the prefix "+7" and 10 digits is 1010 approx233 . Modern devices that are optimized to iterate over hash values sort through millions of hashes per second. This means that the calculation of the values ​​of the hash functions for all possible phone numbers with a prefix is ​​no more than a few seconds.



Example: credit card number (PAN, payment card number ). Often the card number is masked, revealing the first 4 (6) and / or last 4 digits, while hiding the rest. There are 16 digits on the card. Is it possible to hash card numbers in order to hide them from an attacker? Not. If the attacker received 8 digits from 16 (the first 4 and last 4), as well as the value of the hash function from the full card number, then he can recover the full number in less than a second. To do this, he will need to sort out everything 108 approx226 number options.



An interesting example with an email address . It would seem that by the value of a reliable hash function, it is impossible to restore the original address. The number of different variants of 8 Latin letters and 10 digits already gives 368 approx241 options for the names of mailboxes without taking into account different domains of mail services ( @mail.ru



, @gmail.com



, etc.). If you take other allowed characters, as well as lengthen the address, there are even more options. But ... Until 2006, the Blue Frog project (Blue Frog) worked, which offered its users protection against spam. He used automatic notification of providers about spam sent from their servers, which forced advertisers to refuse to send spam to at least those addresses that were participants in the project. To understand whether the box belongs to the participant or not, a file was distributed with a list of values ​​of the cryptographic hash function from each member's mailbox address.



It was assumed that spammers will check each of their addresses to send advertisements on this list and exclude matches found. However, the presence of a file with the values ​​of the hash function allowed attackers to do exactly the opposite: identify the project participants and send reinforced flows of meaningless messages to them in order to refuse to use the Blue Frog project. Some time after this attack (as well as others, including DDoS attacks on servers), the project stopped its work.



As before, the use of any salt that comes with the value of the hash function does not affect enumeration time (but still protects against dictionary attacks).



Possible solutions for the described cases.



  1. Hash not the values ​​themselves, but the concatenation of the original value and some secret that is stored separately. For example, not in the database table (together with the values ​​of the hash functions), but in the application server configuration. With similar success, instead of hashing, you can use the block encryption function on some secret key.
  2. Use hash functions that are not only reliable, but also slow in computing. Both for the cryptanalyst and for the legal user. An example of such functions is PBKDF2, bcrypt, scrypt, Argon2, for which, when calling a function, we additionally indicate the number of hash iterations. However, if increasing the output length of the hash function by only one bit (out of 256 or 512) increases the complexity of the attack of the cryptanalyst by a binary order (by a factor of two), then increasing the number of iterations for the PBKDF2 hash function will double the attack complexity by only two times. That is, obtaining the value of the hash function even by a legal user becomes expensive in terms of computational resources and energy expended.


Afterword
The author will be grateful for factual and other comments on the text.



All Articles