Prime Number Search Algorithms

"The largest known prime number is $2^{32582657} - 1$. And I am proud to say that I have memorized all of its digits... in binary."

Carl Pomerance



A natural number is called prime if it has exactly two distinct divisors: one and itself. The problem of finding prime numbers has occupied mathematicians for a very long time. For centuries it had no direct practical application, but everything changed with the advent of public-key cryptography. This article covers several ways to search for prime numbers, some of purely academic interest and some used today in cryptography.



Sieve of Eratosthenes



The sieve of Eratosthenes is an algorithm proposed by the ancient Greek mathematician Eratosthenes. It finds all prime numbers less than a given number $n$. The essence of the method is as follows. Take the set of numbers from 2 to $n$ and cross out all numbers divisible by 2, except 2 itself. Then move on to the next number that has not been crossed out, 3, and again cross out everything divisible by 3. Proceed to the next remaining number, 5, and so on, until we reach $n$. After performing these steps, only prime numbers remain in the original list.



The algorithm can be slightly optimized. Since every composite number $n$ necessarily has a divisor $\leqslant \sqrt{n}$, the algorithm can be stopped once the multiples of all numbers up to $\sqrt{n}$ have been crossed out.
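To make the description concrete, here is a minimal sketch of the sieve in Go (the function name is illustrative; this is not the implementation from the repository linked at the end of the article). Crossing out starts at $i^2$, which already incorporates the $\sqrt{n}$ optimization:

```go
package main

import "fmt"

// sieveOfEratosthenes returns all primes not exceeding n.
// Crossing out starts at i*i because smaller multiples of i
// have already been removed by smaller prime factors.
func sieveOfEratosthenes(n int) []int {
	composite := make([]bool, n+1)
	var primes []int
	for i := 2; i <= n; i++ {
		if composite[i] {
			continue
		}
		primes = append(primes, i)
		for j := i * i; j <= n; j += i {
			composite[j] = true
		}
	}
	return primes
}

func main() {
	fmt.Println(sieveOfEratosthenes(30)) // [2 3 5 7 11 13 17 19 23 29]
}
```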



An animated illustration of the algorithm's operation can be found on Wikipedia.









The complexity of the algorithm is $O(n \log \log n)$; at the same time, storing the information about which numbers have been crossed out requires $O(n)$ memory.



There are a number of optimizations that improve these bounds. A technique called wheel factorization includes in the initial list only numbers coprime to the first few primes (for example, to the primes less than 30). In theory, it is suggested to take the primes up to about $\sqrt{\log n}$. This reduces the complexity of the algorithm by a factor of $\log \log n$. In addition, so-called segmentation is used to reduce memory consumption: the initial set of numbers is split into segments of size $\leqslant \sqrt{n}$, and the sieve of Eratosthenes is applied to each segment separately. Memory consumption drops to $O(\sqrt{n})$.
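Below is a rough sketch of the segmentation idea in Go, without wheel factorization; all names are illustrative. Base primes up to $\sqrt{n}$ are found with the ordinary sieve, and each segment of size about $\sqrt{n}$ is then sieved with them, so only $O(\sqrt{n})$ booleans are held in memory at a time:

```go
package main

import (
	"fmt"
	"math"
)

// segmentedSieve lists all primes up to n while keeping only about sqrt(n)
// booleans in memory at any moment.
func segmentedSieve(n int) []int {
	if n < 2 {
		return nil
	}
	segSize := int(math.Sqrt(float64(n))) + 1

	// Base primes up to sqrt(n), found with the ordinary sieve.
	baseComposite := make([]bool, segSize+1)
	var basePrimes []int
	for i := 2; i <= segSize; i++ {
		if !baseComposite[i] {
			basePrimes = append(basePrimes, i)
			for j := i * i; j <= segSize; j += i {
				baseComposite[j] = true
			}
		}
	}
	primes := append([]int(nil), basePrimes...)

	// Sieve each segment [low, high) with the base primes.
	for low := segSize + 1; low <= n; low += segSize {
		high := low + segSize
		if high > n+1 {
			high = n + 1
		}
		composite := make([]bool, high-low)
		for _, p := range basePrimes {
			start := (low + p - 1) / p * p // first multiple of p in the segment
			if start < p*p {
				start = p * p // smaller multiples have a smaller prime factor
			}
			for j := start; j < high; j += p {
				composite[j-low] = true
			}
		}
		for i := low; i < high; i++ {
			if !composite[i-low] {
				primes = append(primes, i)
			}
		}
	}
	return primes
}

func main() {
	fmt.Println(segmentedSieve(100)) // primes up to 100
}
```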



Sieve of Atkin



A better algorithm for sifting out composite numbers was proposed by Atkin and Bernstein and is called the sieve of Atkin. The method is based on the following three properties of prime numbers.



Property 1



If $n$ is a positive number that is not a multiple of the square of a prime and such that $n \equiv 1 \pmod 4$, then $n$ is prime if and only if the number of solutions of the equation $4x^2 + y^2 = n$ is odd.



Property 2



If $n$ is a positive number that is not a multiple of the square of a prime and such that $n \equiv 1 \pmod 6$, then $n$ is prime if and only if the number of solutions of the equation $3x^2 + y^2 = n$ is odd.



Property 3



If $n$ is a positive number that is not a multiple of the square of a prime and such that $n \equiv 11 \pmod{12}$, then $n$ is prime if and only if the number of solutions of the equation $3x^2 - y^2 = n$ with $x > y$ is odd.



Proofs of these properties are given in the original article.



At the initial stage of the algorithm, the sieve of Atkin is an array $A$ of size $n$ filled with zeros. To determine the primes, all pairs $x, y < \sqrt{n}$ are enumerated. For each such pair, the values $4x^2 + y^2$, $3x^2 + y^2$ and $3x^2 - y^2$ are computed, and the corresponding array elements $A[4x^2 + y^2]$, $A[3x^2 + y^2]$, $A[3x^2 - y^2]$ are incremented by one (provided the candidate satisfies the corresponding congruence from the properties above). At the end of this stage, the indices of all elements with odd values are either primes or squares of primes. In the last step of the algorithm, all numbers divisible by the square of some remaining number are crossed out.
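Here is a Go sketch of the standard formulation of the sieve of Atkin, with the congruence classes modulo 12 written out explicitly and without wheel factorization or segmentation; a boolean flag is toggled instead of keeping an integer counter, which tracks the same parity:

```go
package main

import (
	"fmt"
	"math"
)

// sieveOfAtkin returns all primes not exceeding n.
func sieveOfAtkin(n int) []int {
	if n < 2 {
		return nil
	}
	isPrime := make([]bool, n+1)
	limit := int(math.Sqrt(float64(n))) + 1

	for x := 1; x <= limit; x++ {
		for y := 1; y <= limit; y++ {
			// Property 1: n ≡ 1 (mod 4) and not divisible by 3, i.e. remainders 1 and 5 mod 12.
			m := 4*x*x + y*y
			if m <= n && (m%12 == 1 || m%12 == 5) {
				isPrime[m] = !isPrime[m]
			}
			// Property 2: n ≡ 1 (mod 6); only remainder 7 mod 12, remainder 1 is covered above.
			m = 3*x*x + y*y
			if m <= n && m%12 == 7 {
				isPrime[m] = !isPrime[m]
			}
			// Property 3: n ≡ 11 (mod 12), counted only for x > y.
			m = 3*x*x - y*y
			if x > y && m <= n && m%12 == 11 {
				isPrime[m] = !isPrime[m]
			}
		}
	}

	// Cross out multiples of squares of the remaining numbers.
	for r := 5; r*r <= n; r++ {
		if isPrime[r] {
			for k := r * r; k <= n; k += r * r {
				isPrime[k] = false
			}
		}
	}

	primes := []int{}
	if n >= 2 {
		primes = append(primes, 2)
	}
	if n >= 3 {
		primes = append(primes, 3)
	}
	for i := 5; i <= n; i++ {
		if isPrime[i] {
			primes = append(primes, i)
		}
	}
	return primes
}

func main() {
	fmt.Println(sieveOfAtkin(50)) // [2 3 5 7 11 13 17 19 23 29 31 37 41 43 47]
}
```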



It follows from the description that both the computational complexity of the sieve of Atkin and its memory consumption are $O(n)$. When wheel factorization and segmentation are used, the complexity estimate is reduced to $O(n / \log \log n)$ and the memory consumption to $O(\sqrt{n})$.



Mersenne numbers and the Lucas-Lehmer test



Of course, with such complexity, even the optimized sieve of Atkin cannot be used to search for truly large primes. Fortunately, there are fast tests that check whether a given number is prime. Unlike sieve algorithms, such tests are not designed to find all primes; they can only tell, with some probability, whether a particular number is prime.



One such method is the Lucas-Lehmer test. It is a deterministic and unconditional primality test, which means that passing the test guarantees that the number is prime. Unfortunately, the test applies only to numbers of a special form, $2^p - 1$, where $p$ is a natural number. Such numbers are called Mersenne numbers.



The Lucas-Lehmer test states that the Mersenne number $M_p = 2^p - 1$ is prime if and only if $p$ is prime and $M_p$ divides the $(p-1)$-th member of the sequence $S_k$ defined recursively as $S_1 = 4$, $S_k = S_{k-1}^2 - 2$ for $k > 1$.



For a number $M_p$ of bit length $p$, the computational complexity of the algorithm is $O(p^3)$.
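A straightforward sketch of the Lucas-Lehmer test in Go using math/big; the value of $S_k$ is reduced modulo $M_p$ at every step so the numbers never grow beyond $p$ bits (the function name is illustrative):

```go
package main

import (
	"fmt"
	"math/big"
)

// lucasLehmer reports whether M_p = 2^p - 1 is prime for a prime exponent p > 2.
func lucasLehmer(p uint) bool {
	one := big.NewInt(1)
	two := big.NewInt(2)

	// M_p = 2^p - 1
	m := new(big.Int).Lsh(one, p)
	m.Sub(m, one)

	// S_1 = 4, S_k = S_{k-1}^2 - 2; M_p is prime iff M_p divides S_{p-1}.
	s := big.NewInt(4)
	for i := uint(1); i < p-1; i++ {
		s.Mul(s, s)
		s.Sub(s, two)
		s.Mod(s, m)
	}
	return s.Sign() == 0
}

func main() {
	// 2^11 - 1 = 2047 = 23 * 89 is composite; the other exponents give Mersenne primes.
	for _, p := range []uint{3, 5, 7, 11, 13, 17, 19, 31} {
		fmt.Println(p, lucasLehmer(p))
	}
}
```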



Thanks to the simplicity and determinism of the test, the largest known primes are Mersenne numbers. The largest known prime today is $2^{82589933} - 1$; its decimal notation consists of 24,862,048 digits. You can admire this beauty here.



Fermat's little theorem and the Miller-Rabin test



Not many Mersenne primes are known, so public-key cryptography requires a different way of finding primes. One such method is the Fermat primality test. It is based on Fermat's little theorem, which states that if $n$ is prime, then for any $a$ not divisible by $n$, the equality $a^{n-1} \equiv 1 \pmod n$ holds. The proof of the theorem can be found on Wikipedia.



The Fermat primality test is a probabilistic test that tries several values of $a$: if at least one of them gives $a^{n-1} \not\equiv 1 \pmod n$, then the number $n$ is composite. Otherwise, $n$ is probably prime. The more values of $a$ used in the test, the greater the confidence that $n$ is prime.



Unfortunately, there are composite numbers $n$ for which the congruence $a^{n-1} \equiv 1 \pmod n$ holds for every $a$ coprime to $n$. Such numbers are called Carmichael numbers. Composite numbers that pass the Fermat test are called Fermat pseudoprimes. The number of Fermat pseudoprimes is infinite, so the Fermat test is not the most reliable way to identify primes.
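A sketch of the Fermat test in Go with random bases drawn via crypto/rand (illustrative only; it assumes an odd $n > 4$). Note how the Carmichael number 561 can only be exposed by a base that shares a factor with it:

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// fermatTest runs k rounds of the Fermat test on n (assumed odd, n > 4).
// false means n is certainly composite, true means n is probably prime.
func fermatTest(n *big.Int, k int) bool {
	one := big.NewInt(1)
	nMinusOne := new(big.Int).Sub(n, one)
	span := new(big.Int).Sub(n, big.NewInt(3)) // size of the range [2, n-2]

	for i := 0; i < k; i++ {
		// Random base a in [2, n-2].
		a, err := rand.Int(rand.Reader, span)
		if err != nil {
			panic(err)
		}
		a.Add(a, big.NewInt(2))

		// Check a^(n-1) ≡ 1 (mod n).
		if new(big.Int).Exp(a, nMinusOne, n).Cmp(one) != 0 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(fermatTest(big.NewInt(104729), 10)) // prime: true
	// 561 = 3*11*17 is a Carmichael number: every base coprime to it passes
	// the check, so only a base sharing a factor reveals compositeness.
	fmt.Println(fermatTest(big.NewInt(561), 10))
}
```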



Miller-Rabin Test



More reliable results can be achieved by combining Fermat's little theorem with the fact that for a prime $p$ the equation $x^2 \equiv 1 \pmod p$ has no roots other than $1$ and $-1$. The Miller-Rabin test tries several values of $a$ and checks that the following conditions hold.



Let $p$ be a prime and $p - 1 = 2^s d$ with $d$ odd; then for any $a$ not divisible by $p$, at least one of the conditions is true:



  1. $a^d \equiv \pm 1 \pmod p$
  2. there is an integer $r < s$ such that $a^{2^r d} \equiv -1 \pmod p$


By Fermat's theorem $a^{p-1} \equiv 1 \pmod p$, and since $p - 1 = 2^s d$, the property of the roots of $x^2 \equiv 1 \pmod p$ implies that if we find an $a$ for which neither condition is satisfied, then $p$ is composite. If one of the conditions holds, the number $a$ is called a witness of the primality of $n$ according to Miller, and $n$ itself is probably prime.
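A sketch of the Miller-Rabin test in Go following these two conditions directly (illustrative; it assumes an odd $n > 3$ and does not handle small inputs):

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// millerRabin runs k rounds of the Miller-Rabin test on an odd n > 3.
// false means n is certainly composite, true means n is probably prime.
func millerRabin(n *big.Int, k int) bool {
	one := big.NewInt(1)
	two := big.NewInt(2)
	nMinusOne := new(big.Int).Sub(n, one)

	// Write n-1 = 2^s * d with d odd.
	d := new(big.Int).Set(nMinusOne)
	s := 0
	for d.Bit(0) == 0 {
		d.Rsh(d, 1)
		s++
	}

	for i := 0; i < k; i++ {
		// Random base a in [2, n-2].
		a, err := rand.Int(rand.Reader, new(big.Int).Sub(n, big.NewInt(3)))
		if err != nil {
			panic(err)
		}
		a.Add(a, two)

		x := new(big.Int).Exp(a, d, n)
		if x.Cmp(one) == 0 || x.Cmp(nMinusOne) == 0 {
			continue // condition 1: a^d ≡ ±1 (mod n)
		}
		witness := false
		for r := 1; r < s; r++ {
			x.Exp(x, two, n) // x = x^2 mod n
			if x.Cmp(nMinusOne) == 0 {
				witness = true // condition 2: a^(2^r d) ≡ -1 (mod n)
				break
			}
		}
		if !witness {
			return false // a proves that n is composite
		}
	}
	return true
}

func main() {
	fmt.Println(millerRabin(big.NewInt(2147483647), 20)) // 2^31 - 1, prime: true
	fmt.Println(millerRabin(big.NewInt(2147483649), 20)) // 2^31 + 1, composite: false with overwhelming probability
}
```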



The more witnesses of primality are found, the greater the confidence that $n$ is prime. According to Rabin's theorem, the probability that a randomly chosen number $a$ turns out to be a witness of primality for a composite number does not exceed $1/4$.



Therefore, if we check $k$ random numbers $a$, the probability of accepting a composite number as prime is at most $(1/4)^k$.



The complexity of the algorithm is $O(k \log^3 p)$, where $k$ is the number of checks.



Thanks to its speed and high accuracy, the Miller-Rabin test is widely used when searching for prime numbers. Many modern cryptographic libraries rely on this test alone when checking large numbers for primality, and, as Martin Albrecht showed in his work, this is not always enough.



He was able to generate composite numbers that pass the primality checks in OpenSSL, CryptLib, JavaScript Big Number and many other libraries.



Lucas test and the Baillie-PSW test



To avoid vulnerabilities in situations where a composite number generated by an attacker is passed off as prime, Martin Albrecht suggests using the Baillie-PSW test. Although the Baillie-PSW test is probabilistic, to date no composite number has been found that passes it. In 1980 the authors of the algorithm offered a $30 reward for finding such a number. The prize has not been claimed yet.



Researchers have checked all numbers up to $2^{64}$, and not a single composite number passed the Baillie-PSW test. Therefore, for numbers less than $2^{64}$ the test is considered deterministic.



The essence of the test is to check the number for primality with two different methods, one after the other. The first is the Miller-Rabin test already described above; the second is the Lucas test for strong pseudoprimes.



Lucas test for strong pseudoprimes



Lucas sequences are pairs of recurrent sequences $\{U_n(P, Q)\}$, $\{V_n(P, Q)\}$ defined by the expressions:





$U_0(P,Q)=0, \quad U_1(P,Q)=1, \quad U_{n+2}(P,Q)=P \cdot U_{n+1}(P,Q) - Q \cdot U_n(P,Q), \quad n \geq 0$

$V_0(P,Q)=2, \quad V_1(P,Q)=P, \quad V_{n+2}(P,Q)=P \cdot V_{n+1}(P,Q) - Q \cdot V_n(P,Q), \quad n \geq 0$







Let $U_n(P, Q)$ and $V_n(P, Q)$ be Lucas sequences whose integer parameters $P$ and $Q$ satisfy the condition $D = P^2 - 4Q \neq 0$.



We compute the Jacobi symbol: $\left(\frac{D}{n}\right) = \varepsilon$.



Find $r$ and $s$ such that $n - \varepsilon = 2^r s$ with $s$ odd.



For a prime $n$, one of the following conditions holds:



  1. $n$ divides $U_s$
  2. $n$ divides $V_{2^j s}$ for some $j < r$


Otherwise, n is composite.
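The Go sketch below follows this definition literally: $U_k$ and $V_k$ are computed modulo $n$ straight from the recurrences, which takes $O(k)$ steps and is therefore suitable only for small $n$ (real implementations use logarithmic doubling formulas, and Baillie-PSW chooses $P$ and $Q$ by Selfridge's method instead of taking them as parameters). It also assumes $\gcd(n, 2QD) = 1$:

```go
package main

import (
	"fmt"
	"math/big"
)

// lucasUV returns U_k(P,Q) mod n and V_k(P,Q) mod n, computed naively from
// the recurrences in O(k) steps. For illustration only.
func lucasUV(k, P, Q, n int64) (int64, int64) {
	u, uNext := int64(0), int64(1)%n // U_0, U_1
	v, vNext := int64(2)%n, P%n      // V_0, V_1
	for i := int64(0); i < k; i++ {
		u, uNext = uNext, (((P*uNext-Q*u)%n)+n)%n
		v, vNext = vNext, (((P*vNext-Q*v)%n)+n)%n
	}
	return u, v
}

// strongLucasTest checks the strong Lucas conditions for an odd n > 2 and
// parameters P, Q with D = P^2 - 4Q, assuming gcd(n, 2QD) = 1.
// false means n is certainly composite.
func strongLucasTest(n, P, Q int64) bool {
	D := P*P - 4*Q
	eps := int64(big.Jacobi(big.NewInt(D), big.NewInt(n)))

	// Write n - eps = 2^r * s with s odd.
	s := n - eps
	r := 0
	for s%2 == 0 {
		s /= 2
		r++
	}

	if u, _ := lucasUV(s, P, Q, n); u == 0 {
		return true // n divides U_s
	}
	k := s
	for j := 0; j < r; j++ {
		if _, v := lucasUV(k, P, Q, n); v == 0 {
			return true // n divides V_{2^j * s}
		}
		k *= 2
	}
	return false
}

func main() {
	fmt.Println(strongLucasTest(101, 3, -1)) // prime: true
	fmt.Println(strongLucasTest(15, 3, -1))  // composite: false
}
```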



The probability that a composite number $n$ passes the Lucas test for a given pair of parameters $P, Q$ does not exceed $4/15$. Consequently, after applying the test $k$ times this probability is at most $(4/15)^k$.



The Miller-Rabin test and the Lucas test produce sets of pseudoprimes that are believed to be disjoint; accordingly, if a number $p$ passes both tests, it is considered prime. It is on this property that the Baillie-PSW test is based.
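In Go such a combined check is available out of the box: since Go 1.8, big.Int.ProbablyPrime(n) performs n rounds of Miller-Rabin with pseudorandomly chosen bases plus a Baillie-PSW test, and it is guaranteed to be correct for inputs below $2^{64}$:

```go
package main

import (
	"fmt"
	"math/big"
)

func main() {
	// 2^31 - 1 is a Mersenne prime.
	m31 := new(big.Int).Sub(new(big.Int).Lsh(big.NewInt(1), 31), big.NewInt(1))
	fmt.Println(m31, m31.ProbablyPrime(20)) // 2147483647 true

	carmichael := big.NewInt(561) // smallest Carmichael number
	fmt.Println(carmichael, carmichael.ProbablyPrime(20)) // 561 false
}
```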



Conclusion



Depending on the task, different methods of finding primes can be used. For example, when searching for large Mersenne primes, a list of prime exponents up to a certain bound, say $10^8$, is first built with the sieve of Eratosthenes or the sieve of Atkin. Then, for each number $p$ from the list, $M_p = 2^p - 1$ is checked for primality with the Lucas-Lehmer test.



To generate a large prime for cryptographic purposes, a random number $a$ is chosen and checked with the Miller-Rabin test or the more reliable Baillie-PSW test. According to the prime number theorem, a randomly chosen number between 1 and $n$ is prime with probability approximately $\frac{1}{\ln n}$. Therefore, to find a 1024-bit prime, it is enough to try about a thousand candidates.
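For instance, the standard library's crypto/rand.Prime performs this kind of search, drawing random candidates of the requested bit length until one passes a probabilistic primality check:

```go
package main

import (
	"crypto/rand"
	"fmt"
)

func main() {
	// Generate a 1024-bit probable prime using a cryptographically secure
	// source of randomness.
	p, err := rand.Prime(rand.Reader, 1024)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d-bit probable prime:\n%v\n", p.BitLen(), p)
}
```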



PS Sources



Implementations of all the described algorithms in Go can be found on GitHub.


