7.3. 7.3 RSA and factoring

7.3.1. 7.3.1 RSA

The RSA cryptosystem, which is named after its inventors Ron Rivest, Adi Shamir, and Leonard Adleman [ 214 ], is the first public-key cryptosystem. It is very popular still today and is used by various cryptographic applications. Figure 7.8 shows the single steps of the RSA protocol, which we will now describe in more detail, see also Example 7.7.

Figure 7.8.  The RSA protocol.

The RSA protocol.


1. Key generation: Bob chooses two large prime numbers at random, and with , and computes their product . He then chooses an exponent satisfying

and computes the inverse of , i.e., the unique number such that

The pair is Bob's public key, and is Bob's private key.

2. Encryption: As in Section 7.1, messages are strings over an alphabet . Any message is subdivided into blocks of a fixed length, which are encoded as positive integers in -adic representation. These integers are then encrypted. Let be the number encoding some block of the message Alice wishes to send to Bob. Alice knows Bob's public key and encrypts as the number , where the encryption function is defined by

3. Decryption: Let with be the number encoding one block of the ciphertext, which is received by Bob and also by the eavesdropper Erich. Bob decrypts by using his private key and the following decryption function

Theorem 7.13 states that the RSA protocol described above indeed is a cryptosystems in the sense of Definition 7.1. The proof of Theorem 7.13 is left to the reader as Exercise 7.3-1.

Theorem 7.13 Let be the public key and be the private key in the RSA protocol. Then, for each message with ,

Hence, RSA is a public-key cryptosystem.

To make RSA encryption and (authorised) decryption efficient, the algorithm Square-and-Multiply algorithm is again employed for fast exponentiation.

How should one choose the prime numbers and in the RSA protocol? First of all, they must be large enough, since otherwise Erich would be able to factor the number in Bob's public key using the extended Euclidean algorithm. Knowing the prime factors and of , he could then easily determine Bob's private key , which is the unique inverse of , where . To keep the prime numbers and secret, they thus must be sufficiently large. In practice, and should have at least digits each. To this end, one generates numbers of this size randomly and then checks using one of the known randomised primality tests whether the chosen numbers are primes indeed. By the Prime Number Theorem, there are about prime numbers not exceeding . Thus, the odds are good to hit a prime after reasonably few trials.

In theory, the primality of and can be decided even in deterministic polynomial time. Agrawal et al. [ 2 ], [ 3 ] recently showed that the primality problem, which is defined by

is a member of the complexity class P. Their breakthrough solved a longstanding open problem in complexity theory: Among a few other natural problems such as the graph isomorphism problem, the primality problem was considered to be one of the rare candidates of problems that are neither in P nor NP-complete.

Footnote. The complexity classes P and NP will be defined in Section 8.1 and the notion of NP-completeness will be defined in Section 8.2.

For practical purposes, however, the known randomised algorithms are more useful than the deterministic algorithm by Agrawal et al. The running time of obtained in their original paper [ 2 ], [ 3 ] could be improved to meanwhile, applying a more sophisticated analysis, but this running time is still not as good as that of the randomised algorithms.

Miller-Rabin( )

  1  determine the representation , where  and  are odd   2  choose a number  at random under the uniform distribution   3  compute    4  
                        IF
                      
                        5    
                        THEN
                      
                     
                        RETURN
                      is a prime number”   6    
                        ELSE
                      
                     
                        FOR
                      
                      
                     
                        TO
                      
                        7       
                        DO
                      
                     
                        IF
                      
                        8          
                        THEN
                      
                     
                        RETURN
                      is a prime number”   9          
                        ELSE
                      
                       10             
                        RETURN
                      is not a prime number” 

One of the most popular randomised primality tests is the algorithm Miller-Rabin developed by Rabin [ 209 ], which is based on the ideas underlying the deterministic algorithm of Miller [ 182 ]. The Miller-Rabin test is a so-called Monte Carlo algorithm, since the “no” answers of the algorithm are always reliable, whereas its “yes” answers have a certain error probability. An alternative to the Miller-Rabin test is the primality test of Solovay and Strassen [ 246 ]. Both primality tests run in time . However, the Solovay-Strassen test is less popular because it is not as efficient in practice and also less accurate than the Miller-Rabin test.

The class of problems solvable via Monte Carlo algorithms with always reliable “yes” answers is named RP, which stands for Randomised Polynomial Time. The complementary class, , contains all those problems solvable via Monte Carlo algorithms with always reliable “no” answers. Formally, RP is defined via nondeterministic polynomial-time Turing machines (NPTMs, for short; see Section 8.1 and in particular Definitions 8.1, 8.2, and 8.3) whose computations are viewed as random processes: For each nondeterministic guess, the machine flips an unbiased coin and follows each of the resulting two next configurations with probability until a final configuration is reached. Depending on the number of accepting computation paths of the given NPTM, one obtains a certain acceptance probability for each input. Errors may occur. The definition of RP requires that the error probability must be below the threshold of for an input to be accepted, and there must occur no error at all for an input to be rejected.

Definition 7.14 (Randomised polynomial time) The class RP consists of exactly those problems for which there exists an NPTM such that for each input , if then accepts with probability at least , and if then accepts with probability .

Theorem 7.15 is in .

Theorem 7.15 follows from the fact that, for example, the Miller-Rabin test is a Monte Carlo algorithm for the primality problem. We present a proof sketch only. We show that the Miller-Rabin test accepts with one-sided error probability as in Definition 7.14: If the given number (represented in binary) is a prime number then the algorithm cannot answer erroneously that is not prime. For a contradiction, suppose is prime but the Miller-Rabin test halts with the output: “ is not a prime number”. Hence, . Since is squared in each iteration of the loop, we sequentially test the values

modulo . By assumption, for none of these values the algorithm says were prime. It follows that for each with ,

Since , Fermat's Little Theorem (see Corollary 7.5) implies . Thus, is a square roots of modulo . Since is prime, there are only two square roots of modulo , namely , see Exercise 7.3-1. Since , we must have . But then, again is a square root of modulo . By the same argument, . Repeating this argument again and again, we eventually obtain , a contradiction. It follows that the Miller-Rabin test works correctly for each prime number. On the other hand, if is not a prime number, it can be shown that the error probability of the Miller-Rabin tests does not exceed the threshold of . Repeating the number of independent trials, the error probability can be made arbitrarily close to zero, at the cost of increasing the running time of course, which still will be polynomially in , where is the size of the input represented in binary.

Example 7.7 [RSA] Suppose Bob chooses the prime numbers and . Thus, , so we have . If Bob now chooses the smallest exponent possible for , namely , then is his public key. Using the extended Euclidean algorithm, Bob determines his private key , and we have ; see Exercise 7.3-2. As in Section 7.1, we identify the alphabet with the set . Messages are strings over and are encoded in blocks of fixed length as natural numbers in -adic representation. In our example, the block length is .

More concretely, any block of length with is represented by the number

Since , we have

The RSA encryption function encrypts the block (i.e., the corresponding number ) as . The ciphertext for block then is with . Thus, RSA maps blocks of length injectively to blocks of length . Figure 7.9 shows how to subdivide a message of length into blocks of length and how to encrypt the single blocks, which are represented by numbers. For example, the first block, “RS”, is turned into a number as follows: “R” corresponds to and “S” to , and we have

Figure 7.9.  Example of an RSA encryption (M = Message).

Example of an RSA encryption (M = Message).


The resulting number is written again in -adic representation and can have the length : , where , see also Exercise 7.3-2. So, the first block, , is encrypted to yield the ciphertext “BAV”.

Decryption is also done blockwise. In order to decrypt the first block with the private key , compute , again using fast exponentiation with Square-and-Multiply . To prevent the numbers from becoming too large, it is recommendable to reduce modulo after each multiplication. The binary expansion of the exponent is , and we obtain

as desired.

7.3.2. 7.3.2 Digital RSA signatures

The public-key cryptosystem RSA from Figure 7.8 can be modified so as to produce digital signatures. This protocol is shown in Figure 7.10. It is easy to see that this protocol works as desired; see Exercise 7.3-2. This digital signature protocol is vulnerable to “chosen-plaintext attacks” in which the attacker can choose a plaintext and obtains the corresponding ciphertext. This attack is described, e.g., in [ 217 ].

Figure 7.10.  Digital RSA signatures.

Digital RSA signatures.

7.3.3. 7.3.3 Security of RSA

As mentioned above, the security of the RSA cryptosystem crucially depends on the assumption that large numbers cannot be factored in a reasonable amount of time. Despite much effort in the past, no efficient factoring algorithm has been found until now. Thus, it is widely believed that there is no such algorithm and the factoring problem is hard. A rigorous proof of this hypothesis, however, has not been found either. And even if one could prove this hypothesis, this would not imply a proof of security of RSA. Breaking RSA is at most as hard as the factoring problem; however, the converse is not known to hold. That is, it is not known whether these two problems are equally hard. It may be possible to break RSA without factoring .

We omit listing potential attacks on the RSA system here. Rather, the interested reader is pointed to the comprehensive literature on this subject; note also Problem 7-4 at the end of this chapter. We merely mention that for each of the currently known attacks on RSA, there are suitable countermeasures, rules of thumb that either completely prevent a certain attack or make its probability of success negligibly small. In particular, it is important to take much care when choosing the prime numbers and , the modulus , the public exponent , and the private key .

Finally, since the factoring attacks on RSA play a particularly central role, we briefly sketch two such attacks. The first one is based on Pollard's method [ 207 ]. This method is effective for composite numbers having a prime factor such that the prime factors of each are small. Under this assumption, a multiple of can be determined without knowing . By Fermat's Little Theorem (see Corollary 7.5), it follows that for all integers coprime with . Hence, divides . If does not divide , then is a nontrivial divisor of . Thus, the number can be factored.

How can the multiple of be determined? Pollard's method uses as candidates for the products of prime powers below a suitably chosen bound :

If all prime powers dividing are less than , then is a multiple of . The algorithm determines for a suitably chosen base . If no nontrivial divisor of is found, the algorithm is restarted with a new bound .

Other factoring methods, such as the quadratic sieve, are described, e.g., in [ 219 ], [ 249 ]. They use the following simple idea. Suppose is the number to be factored. Using the sieve, determine numbers and such that:

Hence, divides but neither nor . Thus, is a nontrivial factor of .

There are also sieve methods other than the quadratic sieve. These methods are distinguished by the particular way of how to determine the numbers and such that (7.10) is satisfied. A prominent example is the “general number field sieve”, see [ 160 ].

Exercises

7.3-1 a. Prove Theorem 7.13.

Hint. Show and using Corollary 7.5, Fermat's Little Theorem. Since and are prime numbers with and , the claim now follows from the Chinese remainder theorem.

b. The proof sketch of Theorem 7.15 uses the fact that any prime number can have only two square roots of modulo , namely . Prove this fact.

Hint. It may be helpful to note that is a square root of 1 modulo if and only if divides .

7.3-2 a. Let and be the values from Example 7.7. Show that the extended Euclidean algorithm indeed provides the private key , the inverse of .

b. Consider the plaintext in Figure 7.9 from Example 7.7 and its RSA encryption. Determine the encoding of this ciphertext by letters of the alphabet for each of the 17 blocks.

c. Decrypt each of the ciphertext blocks in Figure 7.9 and show that the original message is obtained indeed.

d. Prove that the RSA digital signature protocol from Figure 7.10 works.